WO2022095656A1 - Audio processing method, apparatus, device, and medium - Google Patents

Audio processing method, apparatus, device, and medium

Info

Publication number
WO2022095656A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
chord
processed
humming
information
Prior art date
Application number
PCT/CN2021/122559
Other languages
English (en)
French (fr)
Inventor
吴泽斌
芮元庆
蒋义勇
曹硕
Original Assignee
腾讯音乐娱乐科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 腾讯音乐娱乐科技(深圳)有限公司
Priority to US 18/034,032 (published as US20230402026A1)
Publication of WO2022095656A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/0033 Recording/reproducing or transmission of music for electrophonic musical instruments
    • G10H1/0041 Recording/reproducing or transmission of music for electrophonic musical instruments in coded form
    • G10H1/0058 Transmission between separate instruments or between individual components of a musical system
    • G10H1/0066 Transmission between separate instruments or between individual components of a musical system using a MIDI interface
    • G10H1/36 Accompaniment arrangements
    • G10H1/38 Chord
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/066 Musical analysis for pitch analysis as part of wider processing for musical purposes, e.g. transcription, musical performance evaluation; Pitch recognition, e.g. in polyphonic sounds; Estimation or use of missing fundamental
    • G10H2210/076 Musical analysis for extraction of timing, tempo; Beat detection
    • G10H2240/00 Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/121 Musical libraries, i.e. musical databases indexed by musical parameters, wavetables, indexing schemes using musical parameters, musical rule bases or knowledge bases, e.g. for automatic composing methods
    • G10H2240/131 Library retrieval, i.e. searching a database or selecting a specific musical piece, segment, pattern, rule or parameter set
    • G10H2240/141 Library retrieval matching, i.e. any of the steps of matching an inputted segment or phrase with musical database contents, e.g. query by humming, singing or playing; the steps may include, e.g. musical analysis of the input, musical feature extraction, query formulation, or details of the retrieval process

Definitions

  • the present application relates to the field of computer technology, and in particular, to an audio processing method, apparatus, device, and medium.
  • In the prior art, the collected user audio is mainly first converted into a MIDI (Musical Instrument Digital Interface) file, and the MIDI file is then analyzed to generate a MIDI file corresponding to the chord accompaniment.
  • The above prior art relies on MIDI files as both input and output, and requires other methods to convert the input samples into MIDI files. Because a MIDI file carries only a small amount of information and the recognition and conversion are not fully accurate, errors accumulate.
  • In addition, only MIDI files are generated in the end, and the playback of MIDI files depends on the performance of the audio equipment, which is prone to timbre distortion; the expected effect may therefore not be achieved, and the user experience becomes inconsistent as the file is shared.
  • The purpose of this application is to provide an audio processing method, apparatus, device, and medium that can generate the melody, rhythm, and chord accompaniment audio corresponding to the user's humming audio, are not prone to cumulative errors, and give different users a consistent music experience.
  • The specific solution is as follows:
  • an audio processing method including:
  • acquiring the humming audio to be processed, and obtaining the music information corresponding to the humming audio to be processed, wherein the music information includes note information and beats per minute information;
  • determining the chord corresponding to the audio to be processed based on the note information and the beats per minute information;
  • generating a MIDI file corresponding to the humming audio to be processed according to the note information and the beats per minute information;
  • generating chord accompaniment audio corresponding to the humming audio to be processed according to the beats per minute information, the chord, and the chord accompaniment parameters obtained in advance, wherein the chord accompaniment parameters are chord accompaniment generation parameters set by the user;
  • outputting the MIDI file and the chord accompaniment audio.
  • obtaining the humming audio to be processed, and obtaining music information corresponding to the humming audio to be processed includes:
  • the determining the target pitch period of each first audio frame in the to-be-processed humming audio includes:
  • the target pitch period of each first audio frame in the to-be-processed humming audio is determined by using a short-term autocorrelation function and a preset unvoiced sound detection method.
  • determining the target pitch period of each first audio frame in the to-be-processed humming audio using a short-term autocorrelation function and a preset unvoiced sound detection method including:
  • the preselected pitch period corresponding to the first audio frame is determined as the target pitch period corresponding to the first audio frame.
  • determining the musical note information corresponding to each first audio frame based on the target pitch period including:
  • the notes corresponding to each first audio frame and the start and end times corresponding to each first audio frame are determined as note information corresponding to each of the first audio frames.
  • the determining the sound energy of each second audio frame in the humming audio to be processed, and determining the beat per minute information corresponding to the humming audio to be processed based on the sound energy including:
  • constructing a target comparison parameter based on the average sound energy;
  • judging whether the sound energy of the current second audio frame is greater than the target comparison parameter;
  • if the sound energy of the current second audio frame is greater than the target comparison parameter, determining that the current second audio frame is a beat, until the detection of each second audio frame in the to-be-processed humming audio is completed and the total number of beats included in the to-be-processed humming audio is obtained, and determining the beats per minute information corresponding to the humming audio to be processed based on the total number of beats.
  • the construction of target comparison parameters based on the average acoustic energy includes:
  • the average acoustic energy is calibrated based on the calibration factor to obtain the target comparison parameter.
  • the determining the chord corresponding to the audio to be processed based on the note information and the beat information per minute includes:
  • a chord corresponding to the audio to be processed is determined from the preselected chords based on the note information and the beat information.
  • the determining the key of the to-be-processed humming audio based on the musical note information includes:
  • determining the chord corresponding to the audio to be processed from the preselected chord based on the note information and the beat per minute information including:
  • each measure is matched with each of the preselected chords respectively, and the chord corresponding to each measure is determined, so as to determine the chord corresponding to the audio to be processed.
  • generating the chord accompaniment audio corresponding to the to-be-processed humming audio according to the beat information per minute, the chords, and the chord accompaniment parameters acquired in advance including:
  • judging whether the chord parameter in the chord accompaniment parameters represents a common chord;
  • if the chord parameter represents a common chord, optimizing the chord according to the common chord groups in the preset common chord library to obtain an optimized chord;
  • determining the audio material information corresponding to each note in the optimized notes according to the instrument type parameter and the instrument pitch parameter in the chord accompaniment parameters, and mixing the audio material corresponding to the audio material information according to the preset mixing rules;
  • chords are optimized according to the common chord groups in the preset common chord library to obtain optimized chords, including:
  • the determining of the audio material information corresponding to each note in the optimized notes according to the instrument type parameter and the instrument pitch parameter in the chord accompaniment parameters, and the mixing of the audio material corresponding to the audio material information according to the preset mixing rules, includes:
  • determining the audio material information corresponding to each note in the optimized notes according to the instrument type parameter and the instrument pitch parameter in the chord accompaniment parameters, wherein the audio material information includes a material identifier, a pitch, a starting playback position, and the duration of the material;
  • putting the audio material information into the preset sounding array according to the preset mixing rules, and mixing, for the current beat, the audio material in the preset audio material library pointed to by the audio material information in the preset sounding array, wherein the beat is determined according to the beats per minute information.
  • an audio processing device including:
  • an audio acquisition module configured to acquire the humming audio to be processed, and obtain music information corresponding to the humming audio to be processed, wherein the music information includes note information and beat information per minute;
  • a chord determination module configured to determine a chord corresponding to the audio to be processed based on the note information and the beat information per minute;
  • a MIDI file generation module configured to generate the MIDI file corresponding to the humming audio to be processed according to the note information and the beats per minute information;
  • a chord accompaniment generation module configured to generate the chord accompaniment audio corresponding to the humming audio to be processed according to the beats per minute information, the chord, and the chord accompaniment parameters obtained in advance, wherein the chord accompaniment parameters are chord accompaniment generation parameters set by the user;
  • an output module configured to output the MIDI file and the chord accompaniment audio.
  • an electronic device comprising:
  • the memory is used to store computer programs
  • the processor is configured to execute the computer program to implement the audio processing method disclosed above.
  • the present application discloses a computer-readable storage medium for storing a computer program, wherein when the computer program is executed by a processor, the audio processing method disclosed above is implemented.
  • The application first obtains the humming audio to be processed and obtains the music information corresponding to the humming audio to be processed, wherein the music information includes note information and beats per minute information, then determines the chord corresponding to the to-be-processed audio based on the note information and the beats per minute information, then generates a MIDI file corresponding to the to-be-processed humming audio according to the note information and the beats per minute information, and generates chord accompaniment audio corresponding to the to-be-processed humming audio according to the beats per minute information, the chords, and the pre-acquired chord accompaniment parameters, after which the MIDI file and the chord accompaniment audio can be output.
  • the application can obtain the corresponding music information after obtaining the humming audio to be processed.
  • That is, the music information is retrieved directly from the humming audio rather than by first converting the audio into a MIDI file and then analyzing the resulting MIDI file, so the error accumulation problem caused by converting the audio into a MIDI file first is not likely to occur.
  • In addition, the present application generates the MIDI file corresponding to the main melody of the humming audio to be processed and directly generates the chord accompaniment audio corresponding to the humming audio to be processed; since the chord accompaniment audio depends only slightly on the performance of the audio equipment, the experience of different users is consistent and the expected user experience effect is obtained.
  • FIG. 1 is a schematic diagram of a system framework to which the audio processing solution provided by the present application is applicable;
  • FIG. 4 is a note comparison diagram disclosed in the present application;
  • FIG. 5 is a note detection result diagram disclosed in the present application;
  • FIG. 6 is a tonic table disclosed in the present application;
  • FIG. 8 is a chord-to-note comparison table disclosed in the present application;
  • FIG. 9 is an arpeggio-to-note comparison table disclosed in the present application;
  • FIG. 10 is a specific audio material mixing flowchart disclosed in the present application;
  • FIG. 11a is an APP application interface disclosed in the present application;
  • FIG. 11b is an APP application interface disclosed in the present application;
  • FIG. 11c is an APP application interface disclosed in the present application;
  • FIG. 12 is a schematic structural diagram of an audio processing apparatus disclosed in the present application.
  • FIG. 13 is a schematic structural diagram of an electronic device disclosed in this application.
  • the hardware composition framework may include: a first computer device 101 and a second computer device 102 .
  • a communication connection is implemented between the first computer device 101 and the second computer device 102 through the network 103 .
  • the hardware structures of the first computer device 101 and the second computer device 102 are not specifically limited here, and the first computer device 101 and the second computer device 102 perform data interaction to realize the audio processing function.
  • the embodiment of the present application does not limit the form of the network 103, for example, the network 103 may be a wireless network (such as WIFI, Bluetooth, etc.) or a wired network.
  • the first computer device 101 and the second computer device 102 may be the same type of computer device, for example, both may be servers; they may also be different types of computer devices, for example, the first computer device 101 may be a terminal or an intelligent electronic device, and the second computer device 102 may be a server.
  • a server with strong computing power may be used as the second computer device 102 to improve data processing efficiency and reliability, thereby improving audio processing efficiency.
  • a terminal or intelligent electronic device with low cost and wide application range is used as the first computer device 101 to realize the interaction between the second computer device 102 and the user.
  • After acquiring the humming audio to be processed, the terminal sends the humming audio to be processed to the server corresponding to the terminal; after receiving the humming audio to be processed, the server obtains the music information corresponding to the humming audio to be processed, wherein the music information includes note information and beats per minute information, then determines the chord corresponding to the audio to be processed based on the note information and the beats per minute information, then generates a MIDI file corresponding to the humming audio to be processed according to the note information and the beats per minute information, and generates the chord accompaniment audio corresponding to the humming audio to be processed according to the beats per minute information, the chords, and the pre-acquired chord accompaniment parameters.
  • the generated MIDI file and the chord accompaniment audio can be output to the terminal.
  • when the terminal receives a first play instruction triggered by the user, the terminal can read the acquired MIDI file and play the corresponding audio; when a second play instruction is triggered, the acquired chord accompaniment audio can be played.
  • the entire aforementioned audio processing process can also be completed by the terminal; that is, the humming audio to be processed is acquired through the voice acquisition module of the terminal, the music information corresponding to the humming audio to be processed is obtained, wherein the music information includes note information and beats per minute information, the chord corresponding to the audio to be processed is then determined based on the note information and the beats per minute information, the MIDI file corresponding to the humming audio to be processed is generated according to the note information and the beats per minute information, and the chord accompaniment audio corresponding to the humming audio to be processed is generated according to the beats per minute information, the chords, and the chord accompaniment parameters acquired in advance.
  • the generated MIDI file and the chord accompaniment audio can be output to the corresponding path for saving.
  • the obtained MIDI file can be read and the corresponding audio can be played.
  • the acquired chord accompaniment audio can be played.
  • an embodiment of the present application discloses an audio processing method, which includes:
  • Step S11 Acquire the humming audio to be processed, and obtain music information corresponding to the humming audio to be processed, wherein the music information includes note information and beat information per minute.
  • the humming audio to be processed may be audio of the user's humming collected by a voice collection device.
  • the humming audio to be processed may be acquired first, and music information retrieval may then be performed on it to obtain the music information corresponding to the humming audio to be processed, wherein the music information includes note information and beats per minute information.
  • Music Information Retrieval includes pitch/melody extraction, automatic notation, rhythm analysis, harmony analysis, singing information processing, music search, music structure analysis, music emotion computation, music recommendation, music classification, automatic composition in music generation, singing voice synthesis, digital instrument sound synthesis, and the like.
  • the current computer device acquiring the humming audio to be processed includes acquiring the humming audio to be processed through its own input unit, for example, collecting the humming audio to be processed through a voice acquisition module, or acquiring the to-be-processed humming audio from an a cappella audio library, where the a cappella audio library may include pre-acquired a cappella audio of different users.
  • the current computer device can also obtain the to-be-processed humming audio sent by other devices through the network (which may be a wired network or a wireless network); for example, another device (such as a terminal) may receive the humming audio to be processed input by the user through its voice input module and forward it.
  • acquiring the to-be-processed humming audio and obtaining the music information corresponding to the to-be-processed humming audio includes: acquiring the to-be-processed humming audio; determining the target pitch period of each first audio frame in the to-be-processed humming audio, and determining the note information corresponding to each first audio frame based on the target pitch period, wherein the first audio frame is an audio frame with a duration equal to the first preset duration; and determining the sound energy of each second audio frame in the to-be-processed humming audio, and determining the beats per minute information corresponding to the humming audio to be processed based on the sound energy, wherein the second audio frame is an audio frame including a preset number of sampling points.
  • the target pitch period corresponding to each first audio frame in the to-be-processed humming audio can be determined first, and then the note information corresponding to each first audio frame can be determined based on the target pitch period.
  • the audio framing method is to divide the audio of the first preset duration into a first audio frame. For pitch detection, it is generally required that a frame contains at least 2 cycles, and generally the minimum pitch is 50Hz, that is, the longest cycle is 20ms. Therefore, the frame length of one of the first audio frames is generally required to be greater than 40ms.
  • determining the target pitch period of each first audio frame in the to-be-processed humming audio includes: determining each first audio frame in the to-be-processed humming audio by using a short-term autocorrelation function and a preset voiceless sound detection method target pitch period.
  • the speech signal When people pronounce, according to the vibration of the vocal cords, the speech signal can be divided into two types: unvoiced and voiced. Among them, the voiced sound shows obvious periodicity in the time domain.
  • the speech signal is a non-stationary signal, and its characteristics change with time, but it can be considered to have relatively stable characteristics in a short period of time, that is, short-term stationarity. Therefore, the target pitch period of each first audio frame in the to-be-processed humming audio can be determined by using the short-term autocorrelation function and the preset unvoiced sound detection method.
  • a short-term autocorrelation function can be used to determine the preselected pitch period of each first audio frame in the humming audio to be processed;
  • a preset unvoiced sound detection method can then be used to determine whether each of the first audio frames is a voiced frame; if the first audio frame is a voiced frame, the preselected pitch period corresponding to the first audio frame is determined as the target pitch period corresponding to the first audio frame. That is, for the current first audio frame, the preselected pitch period can be determined first through the short-term autocorrelation function, and the preset unvoiced sound detection method is then used to determine whether the current first audio frame is a voiced frame; if the current first audio frame is a voiced frame, the preselected pitch period of the current first audio frame is used as the target pitch period of the current first audio frame, and if the current first audio frame is an unvoiced frame, the preselected pitch period of the current first audio frame is determined as an invalid pitch period.
  • determining whether the current first audio frame is a voiced frame by using the preset unvoiced sound detection method may be done by judging whether the ratio of the energy in the voiced frequency band to the energy in the unvoiced frequency band of the current first audio frame is greater than or equal to a preset energy ratio threshold.
  • the voiced frequency band is usually 100 Hz to 4000 Hz, the unvoiced frequency band is usually 4000 Hz to 8000 Hz, and the full frequency band considered is usually 100 Hz to 8000 Hz.
  • other unvoiced and voiced sound detection methods may also be used, which are not specifically limited here.
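  • As an illustration of the pitch detection described above, the following minimal sketch (in Python with numpy; the 16 kHz sample rate, band edges, and energy ratio threshold of 1.0 are assumptions for illustration and are not fixed by the application) estimates the preselected pitch period of one first audio frame with a short-term autocorrelation function and discards unvoiced frames by comparing the energy in the 100 Hz to 4000 Hz band with the energy in the 4000 Hz to 8000 Hz band:

```python
import numpy as np

def frame_pitch_period(frame, sample_rate=16000, f_min=50.0, f_max=500.0,
                       voiced_band=(100, 4000), unvoiced_band=(4000, 8000),
                       energy_ratio_threshold=1.0):
    """Estimate the pitch period (in samples) of one first audio frame.

    Returns None for an unvoiced frame, i.e. an invalid pitch period.
    """
    frame = np.asarray(frame, dtype=np.float64)
    frame = frame - np.mean(frame)

    # Short-term autocorrelation: search for the strongest peak inside the
    # plausible pitch range (minimum pitch ~50 Hz, i.e. longest period ~20 ms).
    ac = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
    if ac[0] <= 0:
        return None
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    preselected_period = min_lag + int(np.argmax(ac[min_lag:max_lag]))

    # Preset unvoiced-sound detection: ratio of voiced-band energy to
    # unvoiced-band energy must reach the (assumed) threshold.
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    voiced_energy = spectrum[(freqs >= voiced_band[0]) & (freqs < voiced_band[1])].sum()
    unvoiced_energy = spectrum[(freqs >= unvoiced_band[0]) & (freqs < unvoiced_band[1])].sum()
    if unvoiced_energy > 0 and voiced_energy / unvoiced_energy < energy_ratio_threshold:
        return None  # unvoiced frame

    return preselected_period
```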
  • the note information corresponding to each first audio frame may be determined based on the target pitch period. Specifically, the pitch of each of the first audio frames is determined based on each of the target pitch periods; the note corresponding to each of the first audio frames is determined based on the pitch of each of the first audio frames; The notes corresponding to the audio frames and the start and end times corresponding to each of the first audio frames are determined as note information corresponding to each of the first audio frames.
  • the note corresponding to each first audio frame can be determined from the target pitch period through the first operation formula, namely pitch = 1/T (the fundamental frequency is the reciprocal of the pitch period) and note = 69 + 12 × log2(pitch / 440), where note represents the note corresponding to the current first audio frame, pitch represents the pitch corresponding to the current first audio frame, and T represents the target pitch period corresponding to the current first audio frame.
  • In FIG. 4, the corresponding relationship between notes, frequencies, and periods on the piano is shown. It can be seen from FIG. 4 that, for example, when the pitch is 220 Hz, the note is the 57th note, which corresponds to the A3 key on the piano.
  • if the calculated note is a decimal, the nearest integer is taken, and the start and end times of the current note are recorded at the same time. When no voiced sound is detected, the frame is considered to be interference or a pause rather than effective humming. In this way, a discretely distributed note sequence can be obtained, which can be expressed in the form of a piano roll, as shown in FIG. 5.
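  • A minimal sketch of this conversion (assuming the target pitch period is given in samples at an assumed 16 kHz sample rate): the period is turned into a fundamental frequency and then into the nearest integer MIDI note number, reproducing the 220 Hz to note 57 (A3) example above.

```python
import numpy as np

SAMPLE_RATE = 16000  # assumed sample rate for illustration

def period_to_note(period_samples):
    if period_samples is None:                       # unvoiced frame: no valid note
        return None
    pitch = SAMPLE_RATE / period_samples             # fundamental frequency in Hz (pitch = 1/T)
    note = 69 + 12 * np.log2(pitch / 440.0)          # first operation formula
    return int(round(note))                          # take the nearest integer

assert period_to_note(SAMPLE_RATE / 220) == 57       # 220 Hz maps to note 57 (A3)
```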
  • the determining of the sound energy of each second audio frame in the humming audio to be processed and the determining of the beats per minute information corresponding to the humming audio to be processed based on the sound energy may specifically include: determining the sound energy of the current second audio frame in the humming audio to be processed and the average sound energy corresponding to the current second audio frame, wherein the average sound energy is the average value of the sound energies of the second audio frames within the past consecutive second preset duration before the termination time of the current second audio frame; constructing a target comparison parameter based on the average sound energy; judging whether the sound energy of the current second audio frame is greater than the target comparison parameter; and if the sound energy of the current second audio frame is greater than the target comparison parameter, determining that the current second audio frame is a beat, until the detection of each second audio frame in the to-be-processed humming audio is completed and the total number of beats included in the to-be-processed humming audio is obtained, and then determining the beats per minute information corresponding to the humming audio to be processed based on the total number of beats.
  • constructing the target comparison parameter based on the average sound energy may specifically include: determining the sum of the offsets of the sound energy of each second audio frame within the past consecutive second preset duration before the termination time of the current second audio frame relative to the average sound energy; determining a calibration factor for the average sound energy based on the offset sum; and calibrating the average sound energy based on the calibration factor to obtain the target comparison parameter.
  • in the second operation formula, P represents the target comparison parameter of the current second audio frame, C represents the calibration factor of the current second audio frame, E_j represents the sound energy of the current second audio frame, var(E) represents the offset sum, relative to the average sound energy, of the sound energies of the second audio frames within the past consecutive second preset duration before the termination time of the current second audio frame, N represents the total number of second audio frames within that past consecutive second preset duration, M represents the total number of sampling points in the current second audio frame, and input_i represents the value of the ith sampling point in the current second audio frame.
  • after the total number of beats included in the humming audio to be processed is obtained, the total number of beats is divided by the duration (in minutes) corresponding to the humming audio to be processed, and the resulting number is the beats per minute (BPM). After obtaining the BPM, taking a 4/4 time signature as an example, the duration of each measure can be calculated as 4*60/BPM seconds.
  • the beat is usually detected starting from the 1 s mark, that is, starting from 1 s, every 1024 sampling points are taken as one second audio frame; for example, the 1024 consecutive sampling points starting at 1 s form the first second audio frame, the sound energy of this second audio frame and the average sound energy of the second audio frames within the past 1 s before the end of this frame are then calculated, and the operations described above are performed.
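  • The following minimal sketch of this beat counting assumes a 44.1 kHz sample rate, second audio frames of 1024 samples, and an average over roughly the past one second of frames; the concrete form of the calibration factor C is an assumption for illustration, since the application only states that C is derived from how far the recent frame energies deviate from their average.

```python
import numpy as np

SAMPLE_RATE = 44100
FRAME_SIZE = 1024

def estimate_bpm(samples, duration_seconds):
    frames_per_second = SAMPLE_RATE // FRAME_SIZE           # ~1 s of history
    energies, total_beats = [], 0
    previous_was_beat = False
    for j in range(len(samples) // FRAME_SIZE):
        frame = np.asarray(samples[j * FRAME_SIZE:(j + 1) * FRAME_SIZE], dtype=np.float64)
        energy = float(np.sum(frame ** 2))                   # sound energy of this second audio frame
        history = energies[-frames_per_second:]
        if len(history) == frames_per_second:
            average = float(np.mean(history))                # average sound energy
            spread = float(np.mean((np.asarray(history) - average) ** 2))
            # Assumed calibration: lower the factor slightly when recent energies vary a lot.
            c = 1.4 - 0.3 * min(1.0, spread / (average ** 2 + 1e-12))
            if energy > c * average and not previous_was_beat:
                total_beats += 1                              # this frame is counted as a beat
                previous_was_beat = True
            else:
                previous_was_beat = False
        energies.append(energy)
    bpm = total_beats / duration_seconds * 60.0               # total beats / duration in minutes
    measure_seconds = 4 * 60.0 / bpm if bpm else None         # 4/4 measure duration, as above
    return bpm, measure_seconds
```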
  • Step S12 Determine the chord corresponding to the audio to be processed based on the note information and the beat information per minute.
  • the chord corresponding to the to-be-processed audio may be determined based on the musical note information and the beat per minute information.
  • the preset chords are set in advance; there are corresponding preset chords for different keys, and the preset chords support expansion, that is, chords can be added to the preset chords.
  • determining the key of the to-be-processed humming audio based on the note information may specifically include: determining, for each value of the preset adjustment parameter, the real-time key feature corresponding to the note sequence in the note information; matching each real-time key feature with the preset key features and determining the real-time key feature with the highest matching degree as the target real-time key feature; and then determining the key of the humming audio to be processed based on the value of the preset adjustment parameter corresponding to the target real-time key feature, the preset key feature that best matches the target real-time key feature, and the correspondence between preset adjustment parameter values and keys.
  • in a major key, the interval relationship between adjacent tones starting from the tonic is, in order: whole tone, whole tone, semitone, whole tone, whole tone, whole tone, semitone; in a minor key, the interval relationship between adjacent tones starting from the tonic is, in order: whole tone, semitone, whole tone, whole tone, semitone, whole tone, whole tone.
  • the left column (Major Key) shown in FIG. 6 is a major key
  • the right column (Minor Key) is a minor key, wherein, "#" in the table represents one semitone sharp, and "b" represents one semitone flat. That is, there are a total of 12 major keys, namely C major, C# major, D major, D# major, E major, F major, F# major, G major, G# major, A major , A# major, B major.
  • Shift can be used to represent the preset adjustment parameter, and shift can be 0-11.
  • when the preset adjustment parameter takes different values, the modulo value of each note in the note sequence in the note information is determined by the third operation formula, and the set of modulo values corresponding to the notes when the preset adjustment parameter takes the current value is used as the real-time key feature corresponding to the note sequence in the note information, wherein the third operation formula is: M_i = (note_array[i] + shift) % 12, where M_i represents the modulo value corresponding to the ith note in the note sequence, note_array[i] represents the MIDI value of the ith note in the note sequence, % represents the modulo operation, and shift represents the preset adjustment parameter, which takes values from 0 to 11.
  • the preset key features are the key feature of C major (0 2 4 5 7 9 11 12) and the key feature of C minor (0 2 3 5 7 8 10 12). Specifically, each real-time key feature is matched with the above two key features, and the real-time key feature with the largest number of modulo values falling into the two preset key features is determined as the target real-time key feature.
  • for example, assume that the real-time key features S, H, and X each include 10 modulo values: for S, 10 modulo values fall into the key feature of C major and 5 fall into the key feature of C minor; for H, 7 fall into the key feature of C major and 4 into the key feature of C minor; for X, 6 fall into the key feature of C major and 8 into the key feature of C minor. Then the real-time key feature S has the highest matching degree, with the key of C major, and the real-time key feature S is determined as the target real-time key feature.
  • the corresponding relationship between the preset adjustment parameters and the key of C major is: when shift is 0, it corresponds to C major; when shift is 1, it corresponds to B major; when shift is 2, it corresponds to A# major; when shift is 3, it corresponds to A major; when shift is 4, it corresponds to G# major; when shift is 5, it corresponds to G major; when shift is 6, it corresponds to F# major When shift takes 7, it corresponds to F major; when shift takes 8, it corresponds to E major; when shift takes 9, it corresponds to D# major; when shift takes 10, it corresponds to D major; When shift takes 11, it corresponds to C# major.
  • the corresponding relationship between the preset adjustment parameter value and the key of C minor is: when shift is 0, it corresponds to C minor; when shift is 1, it corresponds to B minor; when shift is 2, it corresponds to A# minor; when shift is 3, it corresponds to A minor; when shift is 4, it corresponds to G# minor; when shift is 5, it corresponds to G minor; when shift is 6, it corresponds to F# minor; when shift is 7, it corresponds to F minor; when shift is 8, it corresponds to E minor; when shift is 9, it corresponds to D# minor; when shift is 10, it corresponds to D minor; when shift is 11, it corresponds to C# minor.
  • the key of the humming audio to be processed can then be determined based on the value of the preset adjustment parameter corresponding to the target real-time key feature, the preset key feature that best matches the target real-time key feature, and the correspondence between preset adjustment parameter values and keys. For example, after the real-time key feature S is determined as the target real-time key feature, since the key feature that best matches the real-time key feature S is that of C major, if the shift corresponding to the real-time key feature S is 2, the humming audio to be processed corresponds to the key of A# major.
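  • A minimal sketch of this key detection: for each candidate shift, the hummed MIDI notes are reduced with the third operation formula and counted against the C major and C minor templates, and the best shift is mapped back to a key name using the correspondence listed above. The example melody and the tie-breaking rule (earlier match wins) are illustrative assumptions.

```python
MAJOR_TEMPLATE = {0, 2, 4, 5, 7, 9, 11}
MINOR_TEMPLATE = {0, 2, 3, 5, 7, 8, 10}
KEY_ROOTS = ['C', 'B', 'A#', 'A', 'G#', 'G', 'F#', 'F', 'E', 'D#', 'D', 'C#']  # shift 0..11

def detect_key(note_array):
    best_hits, best_key = -1, None
    for shift in range(12):
        pitch_classes = [(n + shift) % 12 for n in note_array]   # third operation formula
        for template, mode in ((MAJOR_TEMPLATE, 'major'), (MINOR_TEMPLATE, 'minor')):
            hits = sum(1 for m in pitch_classes if m in template)
            if hits > best_hits:                                  # ties keep the earlier match
                best_hits, best_key = hits, (shift, mode)
    shift, mode = best_key
    return f"{KEY_ROOTS[shift]} {mode}"

# A melody built from A#-major scale degrees is recognised as A# major (shift 2);
# its relative minor ties on the hit count and the earlier match wins.
print(detect_key([70, 72, 74, 75, 77, 79, 81, 82]))
```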
  • after that, preselected chords can be determined from the preset chords based on the key of the humming audio to be processed; that is, preset chords corresponding to each key are set in advance, different keys can correspond to different preset chords, and after the key corresponding to the humming audio to be processed is determined, the preselected chords can be determined from the preset chords according to that key.
  • C major is a scale made up of 7 notes, so the key of C major has 7 chords. Details are as follows:
  • the tonic chord is the 1 3 5 major triad.
  • the supertonic chord is the 2 4 6 minor triad.
  • the mediant chord is the 3 5 7 minor triad.
  • the subdominant chord is the 4 6 1 major triad.
  • the dominant chord is the 5 7 2 major triad.
  • the submediant chord is the 6 1 3 minor triad.
  • the leading-tone chord is the 7 2 4 diminished triad.
  • C major therefore has three major triads: C (1), F (4), and G (5); three minor triads: Dm (2), Em (3), and Am (6); and one diminished triad: Bdmin (7), where "m" denotes a minor triad and "dmin" denotes a diminished triad.
  • the C minor chords include: Cm (1-b3-5), Ddim (2-4-b6), bE (b3-5-b7), Fm (4-b6-1), G7 (5-7-2-4), bA (b6-1-b3), bB (b7-2-4).
  • for example, the preselected chords may include the minor triads C#, E, G# with C# as the root, F#, A, C# with F# as the root, and G#, B, D# with G# as the root, the major triads rooted at E, A, and B, respectively, and the major and minor seventh chords rooted at E, A, and B, respectively.
  • the 9 chords in the above table are determined as the preselected chords corresponding to the humming audio to be processed, and the chords corresponding to the audio to be processed are then determined from the preselected chords based on the note information and the beats per minute information. Specifically, based on the beats per minute information, the notes in the note information are divided into different measures in time order; each measure is then matched with each of the preselected chords, and the chord corresponding to each measure is determined, so as to determine the chord corresponding to the audio to be processed.
  • for example, the notes in the first measure are E, F, G#, D#; for a major triad, the interval relationship above the root is 0, 4, and 7 semitones.
  • after the chord corresponding to each measure in the humming audio to be processed is determined, the chords corresponding to the humming audio to be processed are obtained.
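  • One plausible matching criterion for a measure, sketched below, is to count how many of its notes fall on each candidate chord's pitch classes and keep the best-scoring chord; the application only states that each measure is matched against each preselected chord, so this scoring rule and the chord set (preselected chords for E major plus an Emaj7 seventh chord) are assumptions for illustration.

```python
PRESELECTED_CHORDS = {            # pitch-class sets (0 = C, 1 = C#, ...)
    'C#m':   {1, 4, 8},           # C#, E, G#
    'F#m':   {6, 9, 1},           # F#, A, C#
    'G#m':   {8, 11, 3},          # G#, B, D#
    'E':     {4, 8, 11},          # E, G#, B
    'A':     {9, 1, 4},           # A, C#, E
    'B':     {11, 3, 6},          # B, D#, F#
    'Emaj7': {4, 8, 11, 3},       # E, G#, B, D#
}

def match_measure(measure_notes):
    pitch_classes = [n % 12 for n in measure_notes]
    scores = {name: sum(pc in chord for pc in pitch_classes)
              for name, chord in PRESELECTED_CHORDS.items()}
    return max(scores, key=scores.get)

# Notes E, F, G#, D# (MIDI 64, 65, 68, 63) score highest on the seventh chord rooted at E.
print(match_measure([64, 65, 68, 63]))
```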
  • Step S13 Generate a MIDI file corresponding to the to-be-processed humming audio according to the note information and the beat information per minute.
  • a MIDI file corresponding to the humming audio to be processed may be generated according to the note information and the beat per minute information.
  • MIDI: Musical Instrument Digital Interface.
  • MIDI files do not sample the audio, but instead record each note of the music as a number, so are much smaller compared to wave files.
  • the MIDI standard specifies the mixing and articulation of various tones, instruments, and the output device can re-synthesize these numbers into music.
  • the BPM corresponding to the to-be-processed humming audio obtained by the above calculation provides the rhythm information, and together with the note sequence and its start and end times, it can be encoded into a MIDI file according to the MIDI format.
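  • A minimal sketch of this encoding using the third-party mido library (the application only requires that the note sequence, its start and end times, and the tempo be written in MIDI format; the library, velocities, and tick resolution are assumptions, and the notes are assumed monophonic as in humming).

```python
from mido import MidiFile, MidiTrack, Message, MetaMessage, bpm2tempo

def write_midi(notes, bpm, path='humming.mid', ticks_per_beat=480):
    """notes: list of (midi_note, start_seconds, end_seconds), sorted by start time."""
    mid = MidiFile(ticks_per_beat=ticks_per_beat)
    track = MidiTrack()
    mid.tracks.append(track)
    track.append(MetaMessage('set_tempo', tempo=bpm2tempo(bpm)))   # rhythm information

    ticks_per_second = ticks_per_beat * bpm / 60.0
    cursor = 0                      # MIDI messages carry delta times in ticks
    for note, start, end in notes:
        on_tick = int(start * ticks_per_second)
        off_tick = int(end * ticks_per_second)
        track.append(Message('note_on', note=note, velocity=80, time=on_tick - cursor))
        track.append(Message('note_off', note=note, velocity=0, time=off_tick - on_tick))
        cursor = off_tick
    mid.save(path)

write_midi([(57, 0.0, 0.5), (59, 0.5, 1.0), (61, 1.0, 2.0)], bpm=120)
```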
  • Step S14 Generate chord accompaniment audio corresponding to the to-be-processed humming audio according to the beat information per minute, the chords, and the acquired chord accompaniment parameters.
  • the chord accompaniment audio corresponding to the humming audio to be processed can be generated according to the beat information per minute, the chord and the pre-acquired chord accompaniment parameters, wherein , the chord accompaniment parameters are chord accompaniment generation parameters set by the user.
  • the chord accompaniment parameters may be default chord accompaniment generation parameters selected by the user, or may be chord accompaniment generation parameters specifically set by the user.
  • Step S15 Output the MIDI file and the chord accompaniment audio.
  • the MIDI file and the chord accompaniment audio can be output.
  • outputting the MIDI file and the chord accompaniment audio may be transmitting the MIDI file and the chord accompaniment audio from one device to another device, storing the MIDI file and the chord accompaniment audio in a specific path, playing the MIDI file and the chord accompaniment audio externally, and so on, which is not specifically limited here and can be determined according to the specific circumstances.
  • in this way, the application first obtains the humming audio to be processed and obtains the music information corresponding to the humming audio to be processed, wherein the music information includes note information and beats per minute information, then determines the chord corresponding to the to-be-processed audio based on the note information and the beats per minute information, then generates a MIDI file corresponding to the to-be-processed humming audio according to the note information and the beats per minute information, and generates chord accompaniment audio corresponding to the to-be-processed humming audio according to the beats per minute information, the chords, and the pre-acquired chord accompaniment parameters, after which the MIDI file and the chord accompaniment audio can be output.
  • it can be seen that the application obtains the corresponding music information directly after obtaining the humming audio to be processed, rather than by first converting the audio into a MIDI file and then analyzing the resulting MIDI file, so the error accumulation problem caused by converting the audio into a MIDI file first is not likely to occur.
  • in addition, the present application not only generates the MIDI file corresponding to the main melody of the humming audio to be processed, but also directly generates the chord accompaniment audio corresponding to the humming audio to be processed; since the chord accompaniment audio depends only slightly on the performance of the audio equipment, the experience of different users is consistent and the expected user experience effect is obtained.
  • the chord accompaniment audio corresponding to the to-be-processed humming audio is generated according to the beat information per minute, the chords, and the chord accompaniment parameters obtained in advance, which may specifically include:
  • Step S21 Determine whether the chord parameters in the chord accompaniment parameters represent common chords.
  • it is judged whether the chord parameters in the obtained chord accompaniment generation parameters represent common chords. If so, it means that the chords determined above need to be optimized, so as to solve the problem of chord dissonance caused by errors in the user's humming. If the chord parameter represents a free chord, the chords can be used directly as the optimized chords.
  • Step S22 If the chord parameters in the chord accompaniment parameters represent common chords, optimize the chords according to the common chord groups in the preset common chord library to obtain optimized chords.
  • the chord parameter represents a common chord
  • the chord needs to be optimized according to the common chord group in the preset common chord library to obtain an optimized chord.
  • optimizing the chords with the common chord groups in the preset common chord library makes the obtained optimized chords less likely to contain dissonant chords caused by out-of-tune singing in the humming audio to be processed, so that the finally generated chord accompaniment audio is more in line with the user's listening expectations.
  • specifically, the chords are grouped to obtain different chord groups; the current chord group is matched with each common chord group corresponding to the key in the preset common chord library to obtain the matching degree between the current chord group and each common chord group, and the common chord group with the highest matching degree is determined as the optimized chord group corresponding to the current chord group, until the optimized chord group corresponding to each chord group is determined and the optimized chords are obtained.
  • the chords are grouped to obtain different chord groups; specifically, every four chords are divided into one chord group, and if an empty chord appears before four consecutive chords have been collected, the consecutive chords collected so far are directly divided into a chord group.
  • for example, if the chords are C, E, F, A, C, A, B, W, G, D, C, where W represents an empty chord, then C, E, F, A are first divided into one chord group, C, A, B are divided into the next chord group, and G, D, C are divided into another chord group.
  • the common chord groups in the common chord library include, for example, 9 chord groups corresponding to major keys and 3 chord groups corresponding to minor keys; of course, more or fewer common chord groups, or other common chord group styles, can be included, and the specific common chord groups are not limited here and can be set according to the actual situation.
  • specifically, each chord of the current chord group is matched with the chord at the corresponding position in the first common chord group and the corresponding distance difference is determined, wherein the distance difference is the absolute value of the actual distance difference, and the distance differences between the current chord group and the chords of the first common chord group are summed; this is repeated until the current chord group has been matched with each common chord group corresponding to the key of the humming audio to be processed, and the common chord group with the minimum summed distance difference is determined as the common chord group with the highest matching degree, that is, the optimized chord group corresponding to the current chord group.
  • a common chord group consists of 4 chords (ie, 4 bars, 16 beats).
  • for example, assume the originally recognized chords are (W, F, G, E, B, W, F, G, C, W), where W is an empty chord without sound; C, D, E, F, G, A, B correspond to 1, 2, 3, 4, 5, 6, 7 respectively, and chords with an added "m" have the same values as their base chords, for example, C and Cm both correspond to 1.
  • the chord group (F, G, E, B) best matches the first common chord group (F, G, Em, Am), and the distance difference between F, G, C and the first three chords of the second common chord group (F, G, C, Am) is 0, the smallest, so the final result is (W, F, G, Em, Am, W, F, G, C, W). When the summed distance differences are equally small, the common chord group with the smaller serial number is taken first; for example, if the sums of distance differences between the current chord group and the 2nd common chord group (F, G, C, Am) and the 1st common chord group (F, G, Em, Am) are both 2, the 1st common chord group (F, G, Em, Am) is taken as the optimized chord group corresponding to the current chord group.
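  • The following sketch reproduces this optimization on the example above; the common chord library is reduced to a small assumed excerpt, and empty chords (which stay in place in the full flow) are simply used as group separators here.

```python
ROOT_VALUE = {'C': 1, 'D': 2, 'E': 3, 'F': 4, 'G': 5, 'A': 6, 'B': 7}

COMMON_CHORD_GROUPS = [              # assumed excerpt of the major-key common chord library
    ['F', 'G', 'Em', 'Am'],
    ['F', 'G', 'C', 'Am'],
    ['C', 'G', 'Am', 'F'],
]

def chord_value(chord):
    return ROOT_VALUE[chord.rstrip('m')]          # "C" and "Cm" share the value 1

def split_groups(chords):
    groups, current = [], []
    for c in chords:
        if c == 'W':                              # an empty chord closes the current group
            if current:
                groups.append(current)
                current = []
        else:
            current.append(c)
            if len(current) == 4:                 # full group of four chords
                groups.append(current)
                current = []
    if current:
        groups.append(current)
    return groups

def optimise(chords):
    result = []
    for group in split_groups(chords):
        # Sum of absolute distance differences position by position; min() keeps the
        # earliest common chord group on a tie (smaller serial number first).
        best = min(COMMON_CHORD_GROUPS,
                   key=lambda cg: sum(abs(chord_value(a) - chord_value(b))
                                      for a, b in zip(group, cg)))
        result.append(best[:len(group)])          # keep the group's original length
    return result

# (W, F, G, E, B, W, F, G, C, W) -> [['F', 'G', 'Em', 'Am'], ['F', 'G', 'C']]
print(optimise(['W', 'F', 'G', 'E', 'B', 'W', 'F', 'G', 'C', 'W']))
```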
  • Step S23 Convert the optimized chords into optimized notes according to the pre-obtained correspondence between chords and notes.
  • the optimized chords need to be converted into optimized notes according to the pre-obtained correspondence between the chords and the notes. Specifically, it is necessary to have a pre-acquired correspondence between chords and notes, so that after the optimized chord is obtained, the optimized chord can be converted into an optimized note according to the corresponding relationship between the chord and the note.
  • in this way, the chords can be made more harmonious and the chord dissonance caused by the user singing out of tune when humming is avoided, so that the obtained chord accompaniment sounds more in line with the user's music expectations.
  • in general, one chord corresponds to 4 notes, and one note per beat is common, that is, one chord generally corresponds to 4 beats.
  • for notes played through the guitar, arpeggios need to be added, and an arpeggiated chord generally corresponds to 4 to 6 notes; the specific correspondence of arpeggios converted into piano notes is shown in FIG. 9.
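  • A sketch of this expansion is shown below; the actual chord-to-note and arpeggio tables are given in FIG. 8 and FIG. 9, so the voicings used here are assumptions purely for illustration.

```python
CHORD_TO_NOTES = {           # assumed voicings (MIDI note numbers), one chord -> 4 notes
    'C':  [48, 52, 55, 60],  # C3 E3 G3 C4
    'Am': [45, 48, 52, 57],  # A2 C3 E3 A3
    'F':  [41, 45, 48, 53],  # F2 A2 C3 F3
    'G':  [43, 47, 50, 55],  # G2 B2 D3 G3
}

def chord_sequence_to_beats(chords):
    """Expand optimized chords into one note per beat (empty chord 'W' becomes rests)."""
    beats = []
    for chord in chords:
        beats.extend(CHORD_TO_NOTES.get(chord, [None] * 4))
    return beats

print(chord_sequence_to_beats(['C', 'Am', 'W', 'F', 'G']))
```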
  • Step S24 Determine the audio material information corresponding to each note in the optimized notes according to the instrument type parameter and the instrument pitch parameter in the chord accompaniment parameters, and mix the audio material corresponding to the audio material information according to the preset mixing rules.
  • the audio material information corresponding to each note in the optimized notes may be determined according to the instrument type parameter and the instrument pitch parameter in the chord accompaniment parameters, wherein the audio material information includes a material identifier, a pitch, a starting playback position, and the material duration; the audio material information is put into the preset sounding array according to the preset mixing rules, and for the current beat, the audio material in the preset audio material library pointed to by the audio material information in the preset sounding array is mixed, wherein the beat is determined according to the beats per minute information.
  • the rhythm information of the chord accompaniment audio is obtained through the beats per minute information, which determines how many notes need to be played evenly per minute. Because the optimized notes form a note sequence in which each note is arranged in chronological order, the time corresponding to each optimized note, that is, its position, can be determined. Under a normal tempo (BPM less than or equal to 200), one beat corresponds to one note, so the corresponding audio material information is put into the preset sounding array according to the preset mixing rules, and the audio material in the preset audio material library pointed to by the audio material information of the current beat in the preset sounding array is mixed.
  • when the audio material information in the preset sounding array points to the end of its audio material, it means that this audio material has finished being mixed, and the corresponding audio material information is removed from the preset sounding array. If the optimized note sequence is about to end, it is determined whether the instruments corresponding to the instrument type parameter include a guitar, and if so, a corresponding arpeggio is added.
  • the preset sounding array records the material information that needs to be mixed for the current beat (mainly the material identification—each material content file corresponds to a unique identification, playback start position and material length).
  • a mapping table relates each audio material identifier to its audio material.
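  • A minimal sketch of this per-beat mixing loop built around the sounding array: each entry records which material is playing and how far into it playback has progressed; for every beat the overlapping part of each active material is summed into the output, and finished entries are removed. The material library (a dictionary of numpy arrays) and the per-beat event schedule are assumptions standing in for the preset audio material library and the preset mixing rules.

```python
import numpy as np

SAMPLE_RATE = 44100

def mix_accompaniment(events, material_library, bpm, total_beats):
    """events: dict mapping beat index -> list of material identifiers starting on that beat."""
    samples_per_beat = int(SAMPLE_RATE * 60 / bpm)            # beat length from the BPM
    out = np.zeros(samples_per_beat * total_beats, dtype=np.float32)
    sounding = []                                             # the preset sounding array
    for beat in range(total_beats):
        for material_id in events.get(beat, []):              # rule-selected materials for this beat
            sounding.append({'id': material_id, 'pos': 0})
        beat_start = beat * samples_per_beat
        still_sounding = []
        for entry in sounding:
            material = material_library[entry['id']]
            chunk = material[entry['pos']:entry['pos'] + samples_per_beat]
            out[beat_start:beat_start + len(chunk)] += chunk  # mix this beat's slice
            entry['pos'] += len(chunk)
            if entry['pos'] < len(material):                  # not finished: keep it sounding
                still_sounding.append(entry)
        sounding = still_sounding                             # finished materials are removed
    return out
```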
  • Guitar accompaniment plays are based on chord patterns extracted from the audio.
  • the optimized chord sequence is obtained, and then the optimized chord sequence is converted into the notes of each beat according to the rhythm rules for mixing.
  • when the BPM exceeds 200, the system switches to chorus mode.
  • in chorus mode, the remaining notes of the current chord are all played on beats 2 and 4, while beat 3 clears the current sounding array and adds cut-sound and board material.
  • chorus mode produces a more cheerful feel.
  • among the chord instruments, the guitar is explained as an example.
  • at a normal tempo, one chord corresponds to exactly one measure, and each chord has 4 notes, so exactly one note is played per beat.
  • when the BPM exceeds 200 (i.e., less than 0.3 s per beat, a fast rhythm mode), chorus mode is used: the first note of the chord is played on the first beat, and the 2nd, 3rd, and 4th notes of the chord are played simultaneously on the second beat.
  • the third beat plays the board and cut-sound material and removes all the remaining guitar audio material information from the sounding array.
  • the fourth beat operates in the same way as the second beat to create a cheerful atmosphere.
  • at the end, an arpeggio related to the last non-empty chord is added, consisting of 4 to 6 notes (depending on the chord type, as in the prior art) and lasting one measure. Taking a 4-beat measure and a 6-note arpeggio as an example, the first 5 notes are played within the first two beats, that is, each next note starts 0.4 beats after the previous one, and the last note is played from the beginning of the third beat and held until the end of the measure, for 2 beats.
  • for the kick drum and the cajón, the rhythm is divided into two timbres, Kick and Snare.
  • for the kick drum, the Kick hits harder and the Snare hits more lightly; for the cajón, it is the opposite.
  • Kick timbres are scheduled per measure, appearing on the upbeat of the first beat, the 3/4 position of the second beat, and the backbeat of the third beat; Snare timbres appear every two beats, starting on the upbeat of the second beat.
  • for the cajón, the Snare rule is consistent with that of the kick drum, and the Kick timbre appears on the upbeat of each beat; the hi-hat and bass appear on the backbeat of each beat, and the note played by the bass is mapped from the corresponding guitar note, with a standard note used when there is no mapping.
  • the shaker (sand hammer) is divided into two timbres, hard and soft, each sounding twice per beat: the hard timbre sounds on the upbeat and backbeat, and the soft timbre sounds on the 1/4 and 3/4 positions of each beat.
  • for a measure of 4 beats, its duration can be understood as the interval [0, 4), where 0 is the beginning of the first beat and 4 is the end of the fourth beat.
  • each timbre corresponds to a material.
  • the upbeat represents the first half of a beat; for example, the upbeat start time of the first beat is 0 and the upbeat start time of the second beat is 1. The backbeat represents the second half of a beat, that is, its start time in the first beat is 0.5 and in the second beat is 1.5. Accordingly, 1/4 beat, 3/4 beat, and so on mean that the material insertion time is at 0.25, 0.75 of the beat, and so on.
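  • A small sketch of this scheduling, converting the beat positions described above into sample offsets inside one 4-beat measure; only the kick and snare pattern from the drum description is shown, and the sample rate is an assumption.

```python
def drum_pattern():
    # Positions inside the [0, 4) measure: Kick on the upbeat of beat 1, the 3/4
    # position of beat 2, and the backbeat of beat 3; Snare on the upbeats of
    # beats 2 and 4 (every two beats, starting on the upbeat of the second beat).
    return {
        'kick':  [0.0, 1.75, 2.5],
        'snare': [1.0, 3.0],
    }

def to_sample_offsets(pattern, bpm, sample_rate=44100):
    samples_per_beat = sample_rate * 60 / bpm
    return {name: [int(p * samples_per_beat) for p in positions]
            for name, positions in pattern.items()}

print(to_sample_offsets(drum_pattern(), bpm=120))
```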
  • Step S25: write the mixed audio into a WAV file to obtain the chord accompaniment audio corresponding to the humming to be processed.
  • After the corresponding audio materials have been mixed, the mixed audio can be written into a WAV file to obtain the chord accompaniment audio corresponding to the humming to be processed. Before writing the mixed audio into the WAV file, it can first be passed through a compressor/limiter to prevent pops and noise after mixing (a minimal sketch follows).
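  • The sketch below is an assumed, simplified stand-in for this step: a crude soft limiter followed by writing 16-bit PCM to a WAV file. It is not the original compressor implementation; the function and parameter names are hypothetical.

```python
# Minimal sketch: limit the mixed float signal to avoid clipping/pops, then
# write it out as a 16-bit mono WAV file.
import wave
import numpy as np

def write_accompaniment_wav(mixed, path, sample_rate=44100, threshold=0.9):
    mixed = np.asarray(mixed, dtype=np.float64)
    limited = threshold * np.tanh(mixed / threshold)   # crude soft limiter
    pcm = np.clip(limited, -1.0, 1.0)
    samples = (pcm * 32767.0).astype(np.int16)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)          # mono accompaniment
        wav.setsampwidth(2)          # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(samples.tobytes())

# Example: 1 second of a quiet 440 Hz tone standing in for the mixed accompaniment
t = np.arange(44100) / 44100.0
write_accompaniment_wav(0.3 * np.sin(2 * np.pi * 440 * t), "accompaniment.wav")
```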
  • Referring to FIG. 10, which is the flow chart of chord accompaniment generation.
  • First, the user setting parameters are read, i.e. the chord accompaniment generation parameters are obtained, together with the audio-related information, i.e. the aforementioned beats-per-minute information and the chords. It is then determined whether to apply common chords, i.e. whether the chord parameter among the chord accompaniment parameters indicates common chords; if so, empty chords in the chord sequence are handled and skipped, the other chords are matched against common chords to obtain improved chords, i.e. the optimized chords, and the optimized chords are converted into a per-beat note-duration sequence. It is then judged whether the note of the current beat is empty.
  • If the note is not empty, it is first judged whether the instrument type parameter among the user setting parameters includes parameters corresponding to the guitar or guzheng; if so, the corresponding guitar and guzheng information is first added to the preset sounding array.
  • The corresponding audio material information is then added to the sounding array according to the parameters and rules set by the user; if the note of the current beat is empty, the corresponding audio material information is added to the sounding array directly according to the user-set parameters and rules.
  • The audio sources (audio materials) pointed to by the audio material information of the current beat in the sounding array are mixed and handed to the compressor; after the compressor removes pops and noise the result is written into the WAV file, any audio material information that points to the end of its material is removed from the sounding array, and when the beat sequence ends and the instruments include a guitar, an arpeggio is added before finishing. (A minimal sketch of this per-beat mixing loop follows.)
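  • A minimal sketch of this per-beat sounding-array loop is shown below, assuming simplified data structures: material info is kept as (id, playback position) pairs, separate from the audio data looked up through an id-to-samples mapping table. The names and shapes are assumptions for illustration, not the actual code.

```python
# Hypothetical sketch of the per-beat sounding-array mixing loop.
import numpy as np

def mix_beats(beat_new_materials, material_table, bpm, sample_rate=44100):
    """beat_new_materials: for each beat, the list of material ids that start playing."""
    samples_per_beat = int(round(sample_rate * 60.0 / bpm))
    sounding = []          # entries: [material_id, position_in_samples]
    output = []
    for new_ids in beat_new_materials:
        sounding.extend([mid, 0] for mid in new_ids)   # add this beat's materials
        frame = np.zeros(samples_per_beat)
        for entry in sounding:
            mid, pos = entry
            data = material_table[mid]
            chunk = data[pos:pos + samples_per_beat]
            frame[:len(chunk)] += chunk                # mix into this beat's buffer
            entry[1] += samples_per_beat               # advance the playback pointer
        # drop material info that has reached the end of its material
        sounding = [e for e in sounding if e[1] < len(material_table[e[0]])]
        output.append(frame)                           # would go through the limiter next
    return np.concatenate(output)

# Example mirroring the described flow: BPM 60, materials of 2 s and 3 s,
# material id 1 reused on beats 1 and 3, nothing new added on beat 4
table = {1: np.ones(88200) * 0.1, 2: np.ones(132300) * 0.05}
print(mix_beats([[1], [2], [1], []], table, bpm=60).shape)
```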
  • In an actual implementation, the terminal may first obtain the humming audio to be processed and send it to the corresponding server, and the server performs the subsequent processing to obtain the MIDI file and chord accompaniment audio corresponding to the humming audio to be processed.
  • The generated MIDI file and chord accompaniment audio are then returned to the terminal; using the server for processing in this way can improve the processing speed.
  • Alternatively, each step in the aforementioned audio processing method may also be performed at the terminal.
  • When the entire aforementioned audio processing procedure is performed at the terminal, the service-unavailability problem caused by the terminal being unable to connect to the corresponding server when the network is disconnected can be avoided.
  • When retrieving the music information, the music information can also be identified by deploying a neural network or other techniques on the server device, solving the terminal's extraction problem by means of the network; alternatively, the neural network can be miniaturized and deployed on the terminal device to avoid networking issues.
  • Referring to FIG. 11, a specific implementation of the aforementioned audio processing method is described, taking a trial-version APP (Application, mobile phone software) as an example.
  • After entering through the home page shown in FIG. 11a, the user hums into the microphone, and the terminal device obtains the audio stream of the humming input by sampling.
  • The audio stream is identified and processed.
  • When the humming is finished, the corresponding music information such as the BPM, chords, and note pitches is obtained immediately.
  • As shown in FIG. 11b, the obtained music information is displayed in the form of a musical score.
  • Then, as shown in FIG. 11c, the user can choose among four styles (Chinese traditional, folk, sing-along, and electronic) according to their preferences, or, through custom settings, freely choose the tempo, the chord mode, the instruments used, and their loudness. After acquiring these chord generation parameters, the backend can generate chord accompaniment audio according to them and generate a MIDI file corresponding to the user's humming audio according to the music information.
  • In this way, the parameters selected by the user are combined with the music information obtained by MIR techniques to generate accompaniment audio whose melody, rhythm, and notes match the original humming audio, for the user to listen to.
  • When using the application above, the user can casually hum a few phrases into the microphone, thereby providing the corresponding humming audio to be processed.
  • With a few simple parameter settings, users can then experience the accompaniment effects of various instruments, try the different built-in genres or styles, and freely combine guzheng, guitar, drums, and other instruments to enrich the melody and generate the most suitable accompaniment.
  • After post-processing, the melody generated from the user's humming audio is combined with the synthesized chord accompaniment to form a complete musical work, which can be stored.
  • More usage scenarios can also be developed, such as building user communities where users can upload and exchange their own works, collaborating with professionals, uploading more instrument-style templates, and so on.
  • Moreover, the functions shown in the figures above are simple to operate and can make full use of users' fragmented time; the target users can be the broad group of young people who like music rather than professionals only, giving a wider audience; together with a youthful interface this will attract more of the emerging young user base, and by adapting the track-editing approach of existing professional music software, user interaction is simplified so that mainstream non-professionals can get started faster.
  • Referring to FIG. 12, an embodiment of the present application discloses an audio processing apparatus including the following modules (a structural sketch follows this list):
  • the audio acquisition module 201 is configured to acquire the humming audio to be processed, and obtain music information corresponding to the humming audio to be processed, wherein the music information includes note information and beat information per minute;
  • a chord determination module 202 configured to determine a chord corresponding to the audio to be processed based on the note information and the beat information per minute;
  • MIDI file generation module 203 for generating the MIDI file corresponding to the humming audio to be processed according to the note information and the beat information per minute;
  • the chord accompaniment generation module 204 is used to generate the chord accompaniment audio corresponding to the humming audio to be processed according to the beat information per minute, the chord and the obtained chord accompaniment parameters, wherein the chord accompaniment parameters are chord accompaniment generation parameters set by the user;
  • the output module 205 is configured to output the MIDI file and the chord accompaniment audio.
  • It can be seen that the present application first obtains the humming audio to be processed and obtains the music information corresponding to it, the music information including note information and beat information per minute; then determines, based on the note information and the beat information per minute, the chord corresponding to the audio to be processed; then generates a MIDI file corresponding to the humming audio to be processed according to the note information and the beat information per minute, and generates chord accompaniment audio corresponding to the humming audio to be processed according to the beat information per minute, the chord, and the pre-acquired chord accompaniment parameters; and finally outputs the MIDI file and the chord accompaniment audio.
  • Thus, after obtaining the humming audio to be processed, the present application can obtain the corresponding music information directly.
  • Compared with the prior art described above, there is no need to first convert the humming audio into a MIDI file and then analyze the converted MIDI file, so the error-accumulation problem caused by first converting the audio into a MIDI file is unlikely to occur.
  • Moreover, the present application both generates the MIDI file corresponding to the main melody of the humming audio to be processed and directly generates the chord accompaniment audio corresponding to that humming audio, and the chord accompaniment audio depends relatively little on the performance of the audio equipment.
  • As a result, the experience of different users is consistent and the expected user-experience effect is obtained.
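  • The structural sketch below shows, purely as an assumption-based illustration, how modules 201-205 could be composed into one pipeline; the injected helper callables stand in for steps S11-S15 and are not real library calls or the actual implementation.

```python
# Hypothetical structural sketch of the audio processing apparatus (modules 201-205).
class AudioProcessingApparatus:
    def __init__(self, analyze, detect_chords, make_midi, make_accompaniment, output):
        self.analyze = analyze                         # module 201: note info + BPM
        self.detect_chords = detect_chords             # module 202: chord determination
        self.make_midi = make_midi                     # module 203: MIDI file generation
        self.make_accompaniment = make_accompaniment   # module 204: chord accompaniment
        self.output = output                           # module 205: output MIDI + audio

    def process(self, humming_audio, accompaniment_params):
        notes, bpm = self.analyze(humming_audio)
        chords = self.detect_chords(notes, bpm)
        midi_file = self.make_midi(notes, bpm)
        accompaniment = self.make_accompaniment(bpm, chords, accompaniment_params)
        return self.output(midi_file, accompaniment)
```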
  • FIG. 13 is a schematic structural diagram of an electronic device 30 according to an embodiment of the present application, and the user terminal may specifically include, but is not limited to, a smart phone, a tablet computer, a notebook computer, or a desktop computer.
  • the electronic device 30 in this embodiment includes: a processor 31 and a memory 32 .
  • the processor 31 may include one or more processing cores, such as a quad-core processor, an octa-core processor, and the like.
  • The processor 31 may be implemented by at least one hardware form selected from a DSP (digital signal processor), an FPGA (field-programmable gate array), and a PLA (programmable logic array).
  • The processor 31 may also include a main processor and a co-processor.
  • The main processor is a processor used to process data in the awake state, also called the CPU (central processing unit); the co-processor is a low-power processor used to process data in the standby state.
  • In some embodiments, the processor 31 may be integrated with a GPU (graphics processing unit), which is responsible for rendering and drawing the images to be displayed on the display screen.
  • In some embodiments, the processor 31 may include an AI (artificial intelligence) processor, which is used to process computing operations related to machine learning.
  • Memory 32 may include one or more computer-readable storage media, which may be non-transitory. Memory 32 may also include high-speed random access memory as well as non-volatile memory, such as one or more magnetic disk storage devices or flash storage devices. In this embodiment, the memory 32 is used to store at least a computer program 321 which, after being loaded and executed by the processor 31, implements the steps of the audio processing method disclosed in any of the foregoing embodiments.
  • the electronic device 30 may further include a display screen 33 , an input/output interface 34 , a communication interface 35 , a sensor 36 , a power supply 37 and a communication bus 38 .
  • The structure shown in FIG. 13 does not constitute a limitation on the electronic device 30, which may include more or fewer components than those shown.
  • an embodiment of the present application further discloses a computer-readable storage medium for storing a computer program, wherein the computer program implements the audio processing method disclosed in any of the foregoing embodiments when the computer program is executed by a processor.
  • A software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Auxiliary Devices For Music (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

一种音频处理方法、装置、电子设备(30)、介质,方法包括:获取待处理哼唱音频,得到待处理哼唱音频对应的音乐信息(S11),其中,音乐信息包括音符信息和每分钟节拍信息;基于音符信息、每分钟节拍信息确定待处理音频对应的和弦(S12);根据音符信息和每分钟节拍信息生成待处理哼唱音频对应的MIDI文件(S13);根据每分钟节拍信息、和弦和预先获取到的和弦伴奏参数生成待处理哼唱音频对应的和弦伴奏音频(S14);输出MIDI文件及和弦伴奏音频(S15)。这样能够生成用户哼唱音频对应的旋律节奏以及和弦伴奏音频,且不易产生累计误差,使得不同用户的音乐体验一致。

Description

一种音频处理方法、装置、设备及介质
本申请要求于2020年11月03日提交中国专利局、申请号为202011210970.6、发明名称为“一种音频处理方法、装置、设备及介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及计算机技术领域,特别涉及一种音频处理方法、装置、设备、介质。
背景技术
在原创歌曲的创作中,需要由专业的音乐人对曲谱进行配和弦,并录制由专业的乐器演奏家演奏的主旋律以及和弦伴奏,这样对相关人员的音乐知识要求很高,且整个过程耗时长,成本高。
为解决上述问题,现有技术主要是,先将采集到的用户音频转换为MIDI(Musical Instrument Digital Interface,乐器数字接口)文件,然后对MIDI文件进行分析,以生成和弦伴奏对应的MIDI文件。
发明人发现以上现有技术中至少存在如下问题,上述现有技术依赖MIDI文件作为输入与输出,需要使用其他方法对输入采样处理为MIDI文件。这会因为MIDI文件信息量较少、识别转换不完全准确等原因产生积累误差。同时,最后只生成MIDI文件,而MIDI文件播放依赖音频设备的性能,易产生音频音色失真的问题,从而可能达不到预期的效果,在传播过程中使得用户体验不一致。
发明内容
有鉴于此,本申请的目的在于提供一种音频处理方法、装置、设备、介质,能够生成用户哼唱音频对应的旋律节奏和和弦伴奏音频,且不易产生累计误差,使得不同用户的音乐体验一致。其具体方案如下:
为实现上述目的,第一方面,提供了一种音频处理方法,包括:
获取待处理哼唱音频,得到所述待处理哼唱音频对应的音乐信息,其 中,所述音乐信息包括音符信息和每分钟节拍信息;
基于所述音符信息、所述每分钟节拍信息确定所述待处理音频对应的和弦;
根据所述音符信息和所述每分钟节拍信息生成所述待处理哼唱音频对应的MIDI文件;
根据所述每分钟节拍信息、所述和弦和预先获取到的和弦伴奏参数生成所述待处理哼唱音频对应的和弦伴奏音频,其中,所述和弦伴奏参数为用户设置的和弦伴奏生成参数;
输出所述MIDI文件及所述和弦伴奏音频。
可选地,所述获取待处理哼唱音频,得到所述待处理哼唱音频对应的音乐信息,包括:
获取待处理哼唱音频;
确定所述待处理哼唱音频中各个第一音频帧的目标基音周期,并基于所述目标基音周期确定出各个第一音频帧对应的音符信息,其中,所述第一音频帧为时长等于第一预设时长的音频帧;
确定所述待处理哼唱音频中各个第二音频帧的声能,并基于所述声能确定出所述待处理哼唱音频对应的每分钟节拍信息,其中,所述第二音频帧为包括预设数量个采样点的音频帧。
可选地,所述确定所述待处理哼唱音频中各个第一音频帧的目标基音周期,包括:
利用短时自相关函数和预设清浊音检测方法确定所述待处理哼唱音频中各个第一音频帧的目标基音周期。
可选地,所述利用短时自相关函数和预设清浊音检测方法确定所述待处理哼唱音频中各个第一音频帧的目标基音周期,包括:
利用短时自相关函数确定所述待处理哼唱音频中各个第一音频帧的预选基音周期;
利用预设清浊音检测方法确定各个所述第一音频帧是否为浊音帧;
如果所述第一音频帧为浊音帧,则将所述第一音频帧对应的预选基音周期确定为所述第一音频帧对应的目标基音周期。
可选地,所述基于所述目标基音周期确定出各个第一音频帧对应的音符信息,包括:
分别基于各个所述目标基音周期确定各个所述第一音频帧的音高;
基于各个所述第一音频帧的音高确定各个所述第一音频帧对应的音符;
将各个第一音频帧对应的音符和各个第一音频帧对应的起止时间确定为各个所述第一音频帧对应的音符信息。
可选地,所述确定所述待处理哼唱音频中各个第二音频帧的声能,并基于所述声能确定出所述待处理哼唱音频对应的每分钟节拍信息,包括:
确定所述待处理哼唱音频中当前第二音频帧的声能以及当前第二音频帧对应的平均声能,其中,所述平均声能为当前第二音频帧的终止时刻之前的过去连续第二预设时长之内的各个第二音频帧的声能的平均值;
基于所述平均声能构建目标比较参数;
判断当前第二音频帧的声能是否大于所述目标比较参数;
如果当前第二音频帧的声能大于所述目标比较参数,则判定当前第二音频帧为一个节拍,直到所述待处理哼唱音频中的各个第二音频帧检测完成,得到所述待处理哼唱歌曲中的节拍总数,基于所述节拍总数确定出所述待处理哼唱音频对应的每分钟节拍信息。
可选地,所述基于所述平均声能构建目标比较参数,包括:
确定出当前第二音频帧的终止时刻之前的过去连续第二预设时长之内的各个第二音频帧的声能相对于所述平均声能的偏移量和;
基于所述偏移量和确定所述平均声能的校准因子;
基于所述校准因子对所述平均声能进行校准,得到所述目标比较参数。
可选地,所述基于所述音符信息、所述每分钟节拍信息确定所述待处理音频对应的和弦,包括:
基于所述音符信息确定出所述待处理哼唱音频的调性;
基于所述待处理哼唱音频的调性从预设和弦中确定出预选和弦;
基于所述音符信息和所述每分钟节拍信息从所述预选和弦中确定出待处理音频对应的和弦。
可选地,所述基于所述音符信息确定出所述待处理哼唱音频的调性,包括:
在预设调节参数取不同值时,确定出所述音符信息中的音符序列对应的实时调性特征;
将各个实时调性特征与预设调性特征相匹配,并将匹配度最高的实时调性特征确定为目标实时调性特征;
基于所述目标实时调性特征对应的所述预设调节参数的取值以及与所述目标实时调性特征最匹配的预设调性特征对应的预设调节参数取值与调性对应关系确定出所述待处理哼唱音频的调性。
可选地,所述基于所述音符信息和所述每分钟节拍信息从所述预选和弦中确定出待处理音频对应的和弦,包括:
基于所述每分钟节拍信息将所述音符信息中的音符按照时间序列划分为不同的小节;
将各个小节的音符分别与各个所述预选和弦进行匹配,确定出各个小节对应的和弦,以确定出所述待处理音频对应的和弦。
可选地,所述根据所述每分钟节拍信息、所述和弦和预先获取到的和弦伴奏参数生成所述待处理哼唱音频对应的和弦伴奏音频,包括:
判断所述和弦伴奏参数中的和弦参数是否表示常用和弦;
如果所述和弦伴奏参数中的和弦参数表示常用和弦,则根据预设常用和弦库中的常用和弦组对所述和弦进行优化,得到优化后和弦;
根据预先获取到的和弦和音符对应关系将所述优化后和弦转换为优化后音符;
根据所述和弦伴奏参数中的乐器类型参数和乐器音高参数确定出所述优化后音符中各个音符对应的音频素材信息,并按照预设混音规则对所述音频素材信息对应的音频素材进行混音;
将混合后音频写入WAV文件中,得到待处理哼唱对应的和弦伴奏音频。
可选地,所述根据预设常用和弦库中的常用和弦组对所述和弦进行优化,得到优化后和弦,包括:
基于所述音符信息确定出所述待处理哼唱音频的调性;
对所述和弦进行分组,得到不同的和弦组;
分别将当前和弦组与预设常用和弦库中的与所述调性对应的各个常用和弦组进行匹配,并将匹配度最高的常用和弦组确定为当前和弦组对应的优化后和弦组,直到确定出各个和弦组对应的优化后和弦组,得到优化后和弦。
可选地,所述根据所述和弦伴奏参数中的乐器类型参数和乐器音高参数确定出所述优化后音符中各个音符对应的音频素材信息,并按照预设混音规则对所述音频素材信息对应的音频素材进行混音,包括:
根据所述和弦伴奏参数中的乐器类型参数和乐器音高参数确定出所述优化后音符中各个音符对应的音频素材信息,其中,所述音频素材信息包括素材标识、音高、起始播放位置以及素材时长;
将所述音频素材信息按照预设混音规则放入预设发声数组中,并对当前节拍在所述预设发声数组中的音频素材信息指向的预设音频素材库中的音频素材进行混音,其中,节拍根据所述每分钟节拍信息确定。
第二方面,提供了一种音频处理装置,包括:
音频获取模块,用于获取待处理哼唱音频,得到所述待处理哼唱音频对应的音乐信息,其中,所述音乐信息包括音符信息和每分钟节拍信息;
和弦确定模块,用于基于所述音符信息、所述每分钟节拍信息确定出所述待处理音频对应的和弦;
MIDI文件生成模块,用于根据所述音符信息和所述每分钟节拍信息生成所述待处理哼唱音频对应的MIDI文件;
和弦伴奏生成模块,用于根据所述每分钟节拍信息、所述和弦和获取到的和弦伴奏参数生成所述待处理哼唱音频对应的和弦伴奏音频,其中,所述和弦伴奏参数为用户设置的和弦伴奏生成参数;
输出模块,用于输出所述MIDI文件及所述和弦伴奏音频。
第三方面,提供了一种电子设备,包括:
存储器和处理器;
其中,所述存储器,用于存储计算机程序;
所述处理器,用于执行所述计算机程序,以实现前述公开的音频处理方法。
第四方面,本申请公开了一种计算机可读存储介质,用于保存计算机程序,其中,所述计算机程序被处理器执行时实现前述公开的音频处理方法。
可见,本申请先获取待处理哼唱音频,得到所述待处理哼唱音频对应的音乐信息,其中,所述音乐信息包括音符信息和每分钟节拍信息,然后基于所述音符信息、所述每分钟节拍信息确定所述待处理音频对应的和弦,再根据所述音符信息和所述每分钟节拍信息生成所述待处理哼唱音频对应的MIDI文件,并根据所述每分钟节拍信息、所述和弦和预先获取到的和弦伴奏参数生成所述待处理哼唱音频对应的和弦伴奏音频,然后便可以输出所述MIDI文件及所述和弦伴奏音频。由此可见,本申请在获取到待处理哼唱音频之后,便可以得到对应的音乐信息,相比于前述现有技术,不需要先将待处理哼唱音频转换成MIDI文件,然后再对转化成的MIDI文件进行分析,所以也就不易造成先将音频转换成MIDI文件带来的误差累积问题。此外,不仅需要根据音乐信息生成主旋律音频对应的MIDI文件,还需要根据音乐信息以及和弦生成对应的和弦伴奏音频,相比于前述现有技术中的只是生成和弦伴奏对应的MIDI文件带来的用体验不一致问题,本申请通过既生成待处理哼唱音频主旋律对应的MIDI文件,又直接生成待处理哼唱音频对应的和弦伴奏音频,这样由于和弦伴奏音频对音频设备的性能依赖较低,能够使得不同用户的体验一致,得到预期的用户体验效果。
附图说明
为了更清楚地说明本申请实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据提供的附图获得其他的附图。
图1为本申请提供的音频处理方案所适用的系统框架示意图;
图2为本申请公开的一种音频处理方法流程图;
图3为本申请公开的一种音频处理方法流程图;
图4为本申请公开的一种音符对照图;
图5为本申请公开的一种音符检测结果图;
图6为本申请公开的一种主音表;
图7为本申请公开的一种具体的音频处理方法流程图;
图8为一种和弦和音符对照表;
图9为一种琶音和音符对照表;
图10为本申请公开的一种具体的音频素材混合流程图;
图11a为本申请公开的一种APP应用界面;
图11b为本申请公开的一种APP应用界面;
图11c为本申请公开的一种APP应用界面;
图12为本申请公开的一种音频处理装置结构示意图;
图13为本申请公开的一种电子设备结构示意图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
为了便于理解,先对本申请的音频处理方法所适用的系统框架进行介绍。可以理解的是,本申请实施例中并不对计算机设备的数量进行限定,其可以是多个计算机设备共同协作完成音频处理功能。在一种可能的情况中,请参考图1。由图1可知,该硬件组成框架可以包括:第一计算机设备101、第二计算机设备102。第一计算机设备101与第二计算机设备102之间通过网络103实现通信连接。
在本申请实施例中,在此不具体限定第一计算机设备101与第二计算机设备102的硬件结构,第一计算机设备101与第二计算机设备102两者进行数 据交互,实现音频处理功能。进一步,本申请实施例中并不对网络103的形式进行限定,如,网络103可以是无线网络(如WIFI、蓝牙等),也可以是有线网络。
其中,第一计算机设备101和第二计算机设备102可以是同一种计算机设备,如第一计算机设备101和第二计算机设备102均为服务器;也可以是不同类型的计算机设备,如,第一计算机设备101可以是终端或智能电子设备,第二计算机设备102可以服务器。在又一种可能的情况中,可以利用计算能力强的服务器作为第二计算机设备102来提高数据处理效率及可靠性,进而提高音频处理效率。同时利用成本低、应用范围广的终端或智能电子设备作为第一计算机设备101,用于实现第二计算机设备102与用户之间的交互。
举例说明,请参考图2,终端在获取到待处理哼唱音频后,将待处理哼唱音频发送到所述终端对应的服务器,服务器在接收到所述待处理哼唱音频之后,得到所述待处理哼唱音频对应的音乐信息,其中,所述音乐信息包括音符信息和每分钟节拍信息,然后再基于音符信息、每分钟节拍信息确定所述待处理音频对应的和弦,接着还需要根据所述音符信息和所述每分钟节拍信息生成所述待处理哼唱音频对应的MIDI文件,并根据所述每分钟节拍信息、所述和弦和预先获取到的和弦伴奏参数生成所述待处理哼唱音频对应的和弦伴奏音频。然后便可以将生成的MIDI文件和所述和弦伴奏音频输出至终端,终端在接收到用户触发的第一播放指令时,便可以读取获取到的MIDI文件,播放对应的音频,在接收到用户触发的第二播放指令时,便可以播放获取到的和弦伴奏音频。
当然,在实际应用中,也可以由终端来完成整个前述音频处理过程,也即,通过终端的语音采集模块获取到待处理哼唱音频,得到所述待处理哼唱音频对应的音乐信息,其中,所述音乐信息包括音符信息和每分钟节拍信息,然后再基于音符信息、每分钟节拍信息确定所述待处理音频对应的和弦,接着还需要根据所述音符信息和所述每分钟节拍信息生成所述待处理哼唱音频对应的MIDI文件,并根据所述每分钟节拍信息、所述和弦和预先获取到的和弦伴奏参数生成所述待处理哼唱音频对应的和弦伴奏音 频。然后便可以将生成的MIDI文件和所述和弦伴奏音频输出至相应的路径下进行保存,在接收到用户触发的第一播放指令时,便可以读取获取到的MIDI文件,播放对应的音频,在接收到用户触发的第二播放指令时,便可以播放获取到的和弦伴奏音频。
参见图3所示,本申请实施例公开了一种音频处理方法,该方法包括:
步骤S11:获取待处理哼唱音频,得到所述待处理哼唱音频对应的音乐信息,其中,所述音乐信息包括音符信息和每分钟节拍信息。
在具体的实施过程中,需要先获取待处理哼唱音频,其中,所述待处理哼唱音频可以为通过语音采集设备采集到的用户哼唱的音频,以得到所述待处理哼唱音频对应的音乐信息。具体的,可以先获取待处理哼唱音频,然后对获取到的待处理哼唱音频进行音乐信息检索,得到所述待处理哼唱音频对应的音乐信息,其中,所述音乐信息包括音符信息和每分钟节拍信息。
其中,音乐信息检索(Music Information Retrieval)包含对获取到的音频进行音高/旋律提取、自动记谱、节奏分析、和声分析、歌声信息处理、音乐搜索、音乐结构分析、音乐情感计算、音乐推荐、音乐分类、音乐生成中的自动作曲、歌声合成、数字乐器声合成等。
在实际应用中,当前计算机设备获取所述待处理哼唱音频包括通过自身输入单元获取所述待处理哼唱音频,如当前计算机设备通过语音采集模块采集所述待处理哼唱音频,或者当前计算机设备从清唱音频库中获取所述待处理哼唱音频,其中,所述清唱音频库中可以包括预先获取到的不同的用户清唱音频。当前计算机设备也可以通过网络(可以是有线网络或者是无线网络)获取其他设备发送的待处理哼唱音频,当然,本申请实施例中并不限定其他设备(如其他计算机设备)获取所述待处理哼唱音频的方式。例如,其他设备(如终端)可以接收用户通过语音输入模块输入的待处理哼唱音频。
具体的,获取所述待处理哼唱音频,得到所述待处理哼唱音频对应的音乐信息,包括:获取所述待处理哼唱音频;确定所述待处理哼唱音频中 各个第一音频帧的目标基音周期,并基于所述目标基音周期确定出各个第一音频帧对应的音符信息,其中,所述第一音频帧为时长等于第一预设时长的音频帧;确定所述待处理哼唱音频中各个第二音频帧的声能,并基于所述声能确定出所述待处理哼唱音频对应的每分钟节拍信息,其中,所述第二音频帧为包括预设数量个采样点的音频帧。
也即,可以先确定所述待处理哼唱音频中各个第一音频帧对应的目标基音周期,然后便可以基于所述目标基音周期确定出各个第一音频帧对应的音符信息,在此处的音频分帧方法是将连续第一预设时长的音频分为一个第一音频帧。对于基音检测一般要求一帧至少包含2个以上的周期,通常音高最低50Hz,也即周期最长为20ms,故一个所述第一音频帧的帧长一般要求大于40ms。
其中,确定所述待处理哼唱音频中各个第一音频帧的目标基音周期,包括:利用短时自相关函数和预设清浊音检测方法确定所述待处理哼唱音频中各个第一音频帧的目标基音周期。
人在发音时,根据声带震动可以将语音信号分为清音跟浊音两种,其中,浊音在时域上呈现出明显的周期性。语音信号是非平稳信号,它的特征随时间变化,但在一个很短的时间段内可以认为具有相对稳定的特征即短时平稳性。所以可以利用短时自相关函数和预设清浊音检测方法确定出所述待处理哼唱音频中各个第一音频帧的目标基音周期。
具体的,可以利用短时自相关函数确定所述待处理哼唱音频中各个第一音频帧的预选基音周期;利用预设清浊音检测方法确定各个所述第一音频帧是否为浊音帧;如果所述第一音频帧为浊音帧,则将所述第一音频帧对应的预选基音周期确定为所述第一音频帧对应的目标基音周期。也即,对于当前第一音频帧来说,可以先通过短时自相关函数确定出预选基音周期,然后利用预设清浊音检测方法确定当前第一音频帧是否为浊音帧,如果当前第一音频帧为浊音帧,则将当前第一音频帧的预选基音周期作为当前第一音频帧的目标基音周期,如果当前第一音频帧为清音帧,则将当前第一音频帧的预选基音周期确定为无效基音周期。
其中,所述利用预设清浊音检测方法确定当前第一音频帧是否为浊音 帧,可以通过判断当前第一音频帧上浊音频段上的能量占清浊音频段的能量的比值是否大于或等于预设的能量比阈值来确定出当前第一音频帧是否为浊音帧,所述浊音频段通常为100Hz~4000Hz,清音频段通常为4000Hz~8000Hz,所以所述清浊音频段通常为100Hz~8000Hz。此外,也可以采用其他的清浊音检测方法,在此不做具体的限定。
相应的,在确定出各个所述第一音频帧对应的目标基音周期之后,便可以基于所述目标基音周期确定出各个第一音频帧对应的音符信息。具体的,分别基于各个所述目标基音周期确定各个所述第一音频帧的音高;基于各个所述第一音频帧的音高确定各个所述第一音频帧对应的音符;将各个第一音频帧对应的音符和各个第一音频帧对应的起止时间确定为各个所述第一音频帧对应的音符信息。
将所述基于所述目标基音周期确定出各个第一音频帧对应的音符信息通过第一运算公式表达出来为:
Figure PCTCN2021122559-appb-000001
其中,note表示当前第一音频帧对应的音符,pitch表示当前第一音频帧对应的音高,T当前第一音频帧对应的目标基音周期。
参见图4所示为,音符(note)与钢琴上的音符、频率以及周期的对应关系。通过图4可知,例如,当音高为220Hz时,音符为第57号音符,对应到钢琴音符上为A3音符。
通常计算出来的note为小数,取最接近的整数即可。并同时记录当前音符的起止时间,当未检测到浊音时,则认为是其他干扰或者停顿,并不是有效的哼唱,这样可以得到一串离散分布的音符序列,可以以钢琴卷帘的形式表示如图5所示。
在实际应用中,所述确定所述待处理哼唱音频中各个第二音频帧的声能,并基于所述声能确定出所述待处理哼唱音频对应的每分钟节拍信息,可以具体包括:确定所述待处理哼唱音频中当前第二音频帧的声能以及当前第二音频帧对应的平均声能,其中,所述平均声能为当前第二音频帧的终止时刻之前的过去连续第二预设时长之内的各个第二音频帧的声能的平 均值;基于所述平均声能构建目标比较参数;判断当前第二音频帧的声能是否大于所述目标比较参数;如果当前第二音频帧的声能大于所述目标比较参数,则判定当前第二音频帧为一个节拍,直到所述待处理哼唱音频中的各个第二音频帧检测完成,得到所述待处理哼唱歌曲中的节拍总数,基于所述节拍总数确定出所述待处理哼唱音频对应的每分钟节拍信息。
其中,基于所述平均声能构建目标比较参数,具体又可以包括:确定出当前第二音频帧的终止时刻之前的过去连续第二预设时长之内的各个第二音频帧的声能相对于所述平均声能的偏移量和;基于所述偏移量和确定所述平均声能的校准因子;基于所述校准因子对所述平均声能进行校准,得到所述目标比较参数。将上述过程用第二运算公式可以表示为:
P=C·avg(E)
C=-0.0000015var(E)+1.5142857
Figure PCTCN2021122559-appb-000002
Figure PCTCN2021122559-appb-000003
Figure PCTCN2021122559-appb-000004
其中,P表示当前第二音频帧的目标比较参数,C表示当前第二音频帧的校准因子,E j表示当前第二音频帧的声能,var(E)表示当前第二音频帧的终止时刻之前的过去连续第二预设时长之内的各个第二音频帧的声能相对于所述平均声能的偏移量和,N表示当前第二音频帧对应的结束时间之前的过去连续第二预设时长之内的第二音频帧总数,M表示当前第二音频帧中的采样点总数,input i表示当前第二音频帧中第i个采样点的值。
以每帧1024点为例,先计算当前帧的能量如下:
Figure PCTCN2021122559-appb-000005
然后将该帧的能量存到一个循环buffer中,记录过去1s时长的所有帧能量,以44100Hz采样率为例,则保存43帧的能量,并计算过去1s内平 均能量如下:
Figure PCTCN2021122559-appb-000006
如果当前帧能量E j大于P,则认为检测到了一个节拍(beat),其中P的计算如下:
P=C·avg(E)
C=-0.0000015var(E)+1.5142857
Figure PCTCN2021122559-appb-000007
直到检测完毕,得到所述待处理哼唱音频中包括的节拍总数,将节拍总数除以所述待处理哼唱音频对应的时长,其中,所述时长以分钟为单位,即换算成一分钟的beat数即为每分钟节拍数(beats per minute,BPM)。得到了BPM后,以4/4拍为例,可以计算得到每一小节的时长为4*60/BPM。
在实际应用中,由于前1s的干扰较多,所以通常是从第1s开始的第一个第二音频帧开始检测节拍,也即,从第1s开始,每1024个采样点作为一个第二音频帧,例如,将从第1s开始的连续1024个采样点作为第一个第二音频帧,然后计算这个第二音频帧的声能以及第1s开始的第1024个采样点之前的过去1s之内各个第二音频帧的声能的平均声能,以及进行之后的操作。
步骤S12:基于所述音符信息、所述每分钟节拍信息确定所述待处理音频对应的和弦。
在确定出所述待处理哼唱音频对应的音乐信息之后,便可以基于所述音符信息、所述每分钟节拍信息确定所述待处理音频对应的和弦。
具体的,需要先基于所述音符信息确定出所述待处理哼唱音频的调性,然后基于所述待处理哼唱音频的调性从预设和弦中确定出预选和弦,再基于所述音符信息和所述每分钟节拍信息从所述预选和弦中确定出待处理音频对应的和弦。其中,所述预设和弦为预先设定的和弦,不同调性有对应的预设和弦,所述预设和弦可以支持扩展,也即可以将向所述预设和弦中 增加和弦。
首先,所述基于所述音符信息确定出所述待处理哼唱音频的调性,可以具体包括:在预设调节参数取不同值时,确定出所述音符信息中的音符序列对应的实时调性特征,然后将各个实时调性特征与预设调性特征相匹配,并将匹配度最高的实时调性特征确定为目标实时调性特征,再基于所述目标实时调性特征对应的所述预设调节参数的取值以及与所述目标实时调性特征最匹配的预设调性特征对应的预设调节参数取值与调性对应关系确定出所述待处理哼唱音频的调性。
在配和弦样式前,首先要确定哼唱的调,也即调性,即需要确定哼唱的主音和调式,调式分为大调和小调,而主音有12个,总共有24个调。大调和小调每个音之间的音程关系分别如下:
Figure PCTCN2021122559-appb-000008
也即,大调时,从主音开始两个音之间的音程关系依次为全音、全音、半音、全音、全音、全音、半音,小调时,从主音开始两个音之间的音程关系依次为全音、半音、全音、全音、半音、全音、全音。
参见图6所示为,大调的12个主音和小调的12个主音。图6表示的左列(Major Key)为大调,右列(Minor Key)为小调,其中,表中的“#”表示升一个半音,“b”表示降一个半音。也即,大调一共12个,分别是C大调、C#大调、D大调、D#大调、E大调、F大调、F#大调、G大调、G#大调、A大调、A#大调、B大调。小调一共12个,分别是A小调、A#小调、B小调、C小调、C#小调、D小调、D#小调、E小调、F小调、F#小调、G小调、G#小调。
可以采用shift表示所述预设调节参数,且shift可以取0-11,在所述预设调节参数取不同值时,确定出所述音符信息中的音符序列对应的实时调性特征。也即,在所述预设调节参数取不同值时,通过第三运算公式确定出所述音符信息中的音符序列中各个音符的模值,将所述预设调节参数在 当前取值下,各个音符对应的模值作为音符信息中的音符序列对应的实时调性特征,其中,所述第三运算公式为:
M i=(note_array[i]+shift)%12
其中,M i表示所述音符序列中第i个音符对应的模值,note_array[i]表示所述音符序列中第i个音符的MIDI数值,%表示取模运算,shift表示所述预设调节参数,取0到11。
在所述预设调节参数取不同值时,得到对应的实时调性特征,将各个所述实时调性特征与预设调性特征相匹配,并将匹配度最高的实时调性特征确定为目标实时调性特征。其中,所述预设调性特征为C大调的调性特征(0 2 4 5 7 9 11 12)以及C小调的调性特征(0 2 3 5 7 8 10 12)。具体的,就是将各个实时调性特征分别与上述两个调性特征进行匹配,看哪一个实时调性特征中的模值落入这两个预设调性特征中的个数最多,将其确定为确定出目标实时调性特征。例如,实时调性特征S、H、X中均包括10个模值,然后实时调性特征S落入C大调的调性特征中的模值为10个,落入C小调的调性特征中的模值为5个;实时调性特征H落入C大调的调性特征中的模值为7个,落入C小调的调性特征中的模值为4个;实时调性特征X落入C大调的调性特征中的模值为6个,落入C小调的调性特征中的模值为8个。则实时调性特征S与C大调的调性特征匹配度最高,则将实时调性特征S确定出目标实时调性特征。
C大调对应的预设调节参数取值与调性对应关系为:当shift取0时,对应为C大调;当shift取1时,对应为B大调;当shift取2时,对应为A#大调;当shift取3时,对应为A大调;当shift取4时,对应为G#大调;当shift取5时,对应为G大调;当shift取6时,对应为F#大调;当shift取7时,对应为F大调;当shift取8时,对应为E大调;当shift取9时,对应为D#大调;当shift取10时,对应为D大调;当shift取11时,对应为C#大调。
C小调对应的预设调节参数取值与调性对应关系为:当shift取0时,对应为C小调;当shift取1时,对应为B小调;当shift取2时,对应为A#小调;当shift取3时,对应为A小调;当shift取4时,对应为G#小调;当 shift取5时,对应为G小调;当shift取6时,对应为F#小调;当shift取7时,对应为F小调;当shift取8时,对应为E小调;当shift取9时,对应为D#小调;当shift取10时,对应为D小调;当shift取11时,对应为C#小调。
所以,便可以基于所述目标实时调性特征对应的所述预设调节参数的取值以及与所述目标实时调性特征最匹配的预设调性特征对应的预设调节参数取值与调性对应关系确定出所述待处理哼唱音频的调性。例如,上述将实时调性特征S确定出目标实时调性特征之后,由于与实时调性特征S最匹配的为C大调,所以如果实时调性特征S对应的shift取2时,该待处理哼唱音频对应为A#大调。
确定出所述待处理哼唱音频的调性之后,便可以基于所述待处理哼唱音频的调性从预设和弦中确定出预选和弦,也即,预先设置各个调性对应的预设和弦,不同的调性可以对应不同的预设和弦,然后在确定出所述待处理哼唱音频对应的调性之后,便可以根据所述待处理哼唱频频对应的调性从预设和弦中确定出预选和弦。
C大调是由7个音组成的音阶,所以C调是7个和弦。具体情况如下:
(1)、主音上的是1 3 5大三和弦。
(2)、上主音上的是2 4 6小三和弦。
(3)、中音上的是3 5 7小三和弦。
(4)、下属音上的是4 6 1大三和弦。
(5)、属音上的是5 7 2大三和弦。
(6)、下中音上的是6 1 3小三和弦。
(7)、导音上的是7 2 4减三和弦。
其中,C大调有三个大三和弦,C也即(1)、F也即(4)、G也即(5),三个小三和弦,Dm也即(2)、Em也即(3)、Am也即(6),一个减三和弦,Bdmin也即(7)。其中,m表示小三和弦,dmin表示减少和弦。
上述7个和弦中所述的主音、上主音、中音、下属音、属音、下中音以及导音的具体概念可以参考现有技术,在此不做具体解释。
而C小调和弦包括:Cm(1-b3-5)、Ddim(2-4-b6)、bE(b3-5-7)、Fm (4-b6-1)、G7(5-7-2-4)、bA(b6-1-b3)、bB(b7-b2-4)。
调性为C#小调时,预设和弦可以见下表一所示,此时不考虑减三和弦:
表一
7种和弦 1 2 3 4 5 6 7
小调音程 0 2 3 5 7 8 10
C#小调音程 1 3 4 6 8 9 11
小三和弦 C#m --   F#m G#m    
大三和弦     E     A B
大小七和弦     E7     A7 B7
具体的,就是预设了以C#为根音的小三和弦C#、E、G#,以F#为根音的小三和弦F#、A、C#,以G#为根音的小三和弦G#、B、D#,以及分别以E、A、B为根音的大三和弦,以及分别以E、A、B为根音的大小七和弦。
当所述待处理哼唱音频为C#小调时,便将上表中的9个和弦确定为所述待处理音频哼唱音频对应的预选和弦,然后便可以基于所述音符信息和所述每分钟节拍信息从所述预选和弦中确定出待处理音频对应的和弦,具体的,基于所述每分钟节拍信息将所述音符信息中的音符按照时间序列划分为不同的小节;将各个小节的音符分别与各个所述预选和弦进行匹配,确定出各个小节对应的和弦,以确定出所述待处理音频对应的和弦。
例如,第一小节的音符是E、F、G#、D#,对于大三和弦,音程关系是0、4、7,当所述待处理哼唱音频对应的调性为C#小调时,如果有音符落入E+0,E+4,E+7中,则计数加1,发现E(1)+4=G#,其中E(1)括号中便是当前落入大三和弦E中的音符数,说明当前小节又有一个音符落入大三和弦E,E(2)+7=B,此时可以确定第一小节中有2个音符落入大三和弦E中,统计完第一小节落入所有和弦样式的音符数,找到落入音符数最多的那个和弦样式即为该小节对应的和弦。
直到确定出待处理哼唱音频中各个小节对应的和弦,便得到了所述待 处理哼唱音频对应的和弦。
步骤S13:根据所述音符信息和所述每分钟节拍信息生成所述待处理哼唱音频对应的MIDI文件。
确定出所述待处理哼唱音频对应的和弦之后,便可以根据所述音符信息和所述每分钟节拍信息生成所述待处理哼唱音频对应的MIDI文件。
其中,MIDI(乐器数字接口,Musical Instrument Digital Interface)。大部分可播放音频的数码产品支持播放这类文件。与波形文件不同,MIDI文件不对音频进行抽样,而是将音乐的每个音符记录为一个数字,所以与波形文件相比要小得多。MIDI标准规定了各种音调、乐器的混合及发音,通过输出装置可以将这些数字重新合成为音乐。
结合计算得到了所述待处理哼唱音频对应的BPM,即得到了节奏信息,又获取了音符序列的起止时间,即可按照MIDI的格式编码成MIDI文件。
步骤S14:根据所述每分钟节拍信息、所述和弦和获取到的和弦伴奏参数生成所述待处理哼唱音频对应的和弦伴奏音频。
确定出所述待处理哼唱音频对应的和弦之后,便可以根据所述每分钟节拍信息、所述和弦和预先获取到的和弦伴奏参数生成所述待处理哼唱音频对应的和弦伴奏音频,其中,所述和弦伴奏参数为用户设置的和弦伴奏生成参数。在具体的实施过程中,所述和弦伴奏参数可以为用户选择的默认和弦伴奏生成参数,也可以为用户具体设置的和弦伴奏生成参数。
步骤S15:输出所述MIDI文件及所述和弦伴奏音频。
可以理解的是,在生成所述MIDI文件及所述和弦伴奏音频之后,便可以输出所述MIDI文件及所述和弦伴奏音频。其中,所述输出所述MIDI文件及所述和弦伴奏音频可以为将所述MIDI文件及所述和弦伴奏音频从一个设备传输到另外一个设备,或者是将MIDI文件及所述和弦伴奏音频输出到具体路径下进行存储,以及对外播放所述MIDI文件及所述和弦伴奏音频等,在此不做具体限定,可以根据具体情况确定。
可见,本申请先获取待处理哼唱音频,得到所述待处理哼唱音频对应的音乐信息,其中,所述音乐信息包括音符信息和每分钟节拍信息,然后 基于所述音符信息、所述每分钟节拍信息确定所述待处理音频对应的和弦,再根据所述音符信息和所述每分钟节拍信息生成所述待处理哼唱音频对应的MIDI文件,并根据所述每分钟节拍信息、所述和弦和预先获取到的和弦伴奏参数生成所述待处理哼唱音频对应的和弦伴奏音频,然后便可以输出所述MIDI文件及所述和弦伴奏音频。由此可见,本申请在获取到待处理哼唱音频之后,便可以得到对应的音乐信息,相比于前述现有技术,不需要先将待处理哼唱音频转换成MIDI文件,然后再对转化成的MIDI文件进行分析,所以也就不易造成先将音频转换成MIDI文件带来的误差累积问题。此外,不仅需要根据音乐信息生成主旋律音频对应的MIDI文件,还需要根据音乐信息以及和弦生成对应的和弦伴奏音频,相比于前述现有技术中的只是生成和弦伴奏对应的MIDI文件带来的用体验不一致问题,本申请通过既生成待处理哼唱音频主旋律对应的MIDI文件,又直接生成待处理哼唱音频对应的和弦伴奏音频,这样由于和弦伴奏音频对音频设备的性能依赖较低,能够使得不同用户的体验一致,得到预期的用户体验效果。
参见图7所示,根据所述每分钟节拍信息、所述和弦和预先获取到的和弦伴奏参数生成所述待处理哼唱音频对应的和弦伴奏音频,具体可以包括:
步骤S21:判断所述和弦伴奏参数中的和弦参数是否表示常用和弦。
首先需要判断获取的和弦伴奏生成参数中的和弦参数是否表示常用和弦,如果是,则表示需要对前述确定出的和弦中的和弦进行优化,以便解决和弦中因用户哼唱错误导致的和弦不和谐问题。如果所述和弦参数表示自由和弦,则可以直接将所述和弦作为所述优化后和弦。
步骤S22:如果所述和弦伴奏参数中的和弦参数表示常用和弦,则根据预设常用和弦库中的常用和弦组对所述和弦进行优化,得到优化后和弦。
相应的,当所述和弦参数表示常用和弦时,便需要根据预设常用和弦库中的常用和弦组对所述和弦进行优化,得到优化后和弦。通过预设常用和弦库中的常用和弦组对所述和弦进行优化可以使得得到的优化后和弦中不易出现因为所述待处理哼唱音频中的走音等带来的不和谐和弦,使得最终生成的和弦伴奏音频更符合用户的听觉体验。
具体的,就是对所述和弦进行分组,得到不同的和弦组;分别将当前和弦组与预设常用和弦库中的与所述调性对应的各个常用和弦组进行匹配,并将匹配度最高的常用和弦组确定为当前和弦组对应的优化后和弦组,直到确定出各个和弦组对应的优化后和弦组,得到优化后和弦。
也即,分别将当前和弦组与预设常用和弦库中的与所述调性对应的各个常用和弦组进行匹配,得到当前和弦组与各个常用和弦组的匹配度,并将匹配度最高的常用和弦组确定为当前和弦组对应的优化后和弦组,直到确定出各个和弦组对应的优化后和弦组,得到优化后和弦。
其中,对所述和弦进行分组,得到不同的和弦组,可以具体为将所述和弦中的每四个和弦分为一个和弦组,如果没有到连续的四个和弦就出现了空和弦,则可以直接有多少个连续的和弦就将这几个和弦分为一个和弦组。
例如,和弦为C,E,F,A,C,A,B,W,G,D,C,其中W表示空和弦,则先将C,E,F,A分为一个和弦组,然后再将C,A,B分为一个和弦组,然后再将G,D,C分为一个和弦组。
参见下表二所示,常用和弦库中的常用和弦组包括大调对应的9个和弦组,以及小调对应的3个和弦组,当然,可以是包括更多或更少的常用和弦组,以及其他常用和弦组样式,在此不对具体的常用和弦组做具体限定,可以根据实际情况设置。
表二
Figure PCTCN2021122559-appb-000009
将当前和弦组与预设常用和弦库中的与所述调性对应的各个常用和弦 组进行匹配,得到当前和弦组与各个常用和弦组的匹配度。具体为,将当前和弦组合与第一个常用和弦组中的对应位置的和弦进行匹配,确定出对应的距离差,其中,所述距离差为实际距离差的绝对值,得到当前和弦组与第一个常用和弦组中的各个和弦之间的距离差和,直到将当前和弦组与所述待处理哼唱音频的调性对应的各个常用和弦匹配完毕,将最小距离差和对应的常用和弦组确定为匹配度最高的常用和弦组,也即当前和弦组对应的优化后和弦组。
例如,常用和弦组以4个和弦为一组(即4小节,16拍)。假设原始识别和弦为(W,F,G,E,B,W,F,G,C,W),W为空和弦不发声,C、D、E、F、G、A、B分别对应1、2、3、4、5、6、7,加上m之后与自身对应值相同,例如,C和Cm都是对应1。
对于F,G,E,B,假设前述确定出的调性中的调式为大调,在大调中进行匹配,计算距离差和。第1种和弦(F,G,Em,Am),距离差为(0,0,0,1),因此距离差和为1,第2种和弦(F,G,C,Am),距离差为(0,0,2,1),距离差和为3,经过对比,第1种和弦的距离差和最小,因此和弦序列将变为(W,F,G,Em,Am,W,A,F,C,W)。
跳过空拍,F,G,C与第2种大调和弦(F,G,C,Am)前三个和弦的距离差和为0,最小,则最终得到的结果为(W,F,G,Em,Am,W,F,G,C,W),距离差和同样小的取序列序号靠前的。例如,当和弦组与第2种大调和弦(F,G,C,Am)、第1种和弦(F,G,Em,Am)的距离差和都是2时,将第1种和弦(F,G,Em,Am)作为当前和弦组对应的优化后和弦组。
步骤S23:根据预先获取到的和弦和音符对应关系将所述优化后和弦转换为优化后音符。
在得到所述优化后和弦之后,还需要根据预先获取到的和弦和音符对应关系将所述优化后和弦转换为优化后音符。具体的,就是需要有预先获取到的和弦和音符对应关系,这样在得到所述优化后和弦之后,便可以根据所述和弦和音符对应关系将所述优化后和弦转换为优化后音符。
在得到所述优化后和弦之后,可以使得和弦更和谐,避免因为用户哼 唱时的跑调等原因导致的和弦不和谐,使得得到的和弦伴奏听起来更符合用户的音乐体验。
其中,将普通和弦转换成钢琴音符的对应关系可以见图8所示,一个和弦对应4个音符,普通的一拍一个音符,也即一个和弦一般对应4拍。
对于通过吉他弹奏音符时,需要添加琶音,琶音和弦一般对应4到6个音符。具体的琶音转换成钢琴音符的对应关系可以见图9所示。
步骤S24:根据所述和弦伴奏参数中的乐器类型参数和乐器音高参数确定出所述优化后音符中各个音符对应的音频素材信息,并按照预设混音规则对所述音频素材信息对应的音频素材进行混音。
转换成所述优化后音符之后,还需要根据所述和弦伴奏参数中的乐器类型参数和乐器音高参数确定出所述优化后音符中各个音符对应的音频素材信息,并按照预设混音规则对所述音频素材信息对应的音频素材进行混音。
具体的,就是可以将根据所述和弦伴奏参数中的乐器类型参数和乐器音高参数确定出所述优化后音符中各个音符对应的音频素材信息,其中,所述音频素材信息包括素材标识、音高、起始播放位置以及素材时长,将所述音频素材信息按照预设混音规则放入预设发声数组中,并对当前节拍在所述预设发声数组中的音频素材信息指向的预设音频素材库中的音频素材进行混音,其中,节拍根据所述每分钟节拍信息确定。
得到前述的每分钟节拍信息(也即BPM)之后,便得到和弦伴奏音频的节奏信息,也即通过每分钟节拍信息可以确定出每分钟之内需要均匀演奏多少个音符,由于所述优化后音符为一个音符序列,各个音符是按照在时间先后顺序排列的,便可以确定出每个优化后音符对应的时间,也即可以确定出每个优化后音符的位置,正常节奏下(BPM小于等于200时)一个节拍对应一个音符,所以将对应的音频素材信息按照预设混音规则放入预设发声数组中,并对当前节拍在所述预设发声数组中的音频素材信息指向的预设音频素材库中的音频素材进行混音。
在具体的实施过程中,如果所述预设发声数组中有的音频素材信息指向音频素材的结尾,则表示这段音频素材本次混和完毕,则将对应的音频 素材信息从所述预设发声数组中移除。如果优化后音符序列要结束了,则判断所述乐器类型参数对应的乐器中是否有吉他,如果是,则添加相应的琶音。
通过对预先处理过的各类乐器不同音符演奏的音频进行混音,获得近似实际弹奏的效果。实际弹奏音符不会瞬间消失,因此需要一套当前发声序列机制,通过给还未播放完的音频素材设置播放指针,存入发声数组,将其与新加入的音频素材混音并通过压限器修正后一同写入输出WAV文件,以达到更接近真实演奏的伴奏生成效果。
预设发声数组记录的为当前节拍需要混音的素材信息(主要为素材标识—每个素材内容文件对应唯一的一个标识、播放起始位置与素材长度),混音流程示例:假设用户哼唱的原始音频BPM经过识别为60,即每拍60/60=1s,以开头的4拍为例,若每拍新增一个音频素材,时长分别为2s、3s、2s、2s,设素材id分别为1,2,1,4(即第一拍与第三拍使用同一个素材)。因此第一拍时,发声数组内情况为[(1,0)],(1,0)表示素材id=1,起始位置为0,将素材id=1的素材0-1秒(起始为0,一拍持续1s,因此结束为1)的信息,经过压限器写入输出(以下简称输出);当第二拍开始时,第一个素材还有1s才结束,起始位置变为1,而第二拍的素材一开始了,此时发声数组内情况为[(1,1),(2,0)],混合素材id=1的素材1-2秒的信息和素材id=2的素材0-1秒的内容,输出;当第三拍开始时,第一拍的素材已播完,弹出发声数组,第三拍的素材id=1与第一拍一致,此时发声数组内情况为[(2,1),(1,0)],混合素材id=2的素材1-2秒的信息和素材id=1的素材0-1秒的内容,输出;当第四拍开始时,发声数组内情况为[(2,2),(1,1),(4,0)],将三个素材对应时间的内容输出;当第四拍结束时,发声数组内情况为[(4,1)],交给下一拍,其他素材信息已结束弹出。
这样采用将音频素材与音频素材信息分离的机制,通过音频素材标识与音频素材的映射表对应。此时,当伴奏中反复出现相同乐器的相同音符时仅需加载一次音频素材,避免了重复读写带来的较大读写延迟,以达到省时的目的。
在实际应用中,对不同乐器的音频素材在混音时,需要有一定的规则,也即所述预设混音规则,其中,以下规则中所说的弹奏是指音频素材信息添加到发声数组,规则如下:
吉他:吉他伴奏弹奏的基础是音频中提取的和弦样式。正常速率下,通过选择是否对常用和弦匹配,得到优化后和弦序列,而后将优化后和弦序列以音律规则转化为每个节拍的音符,以便进行混音。当BPM超过200时,将切换为副歌模式,除第1拍正常外,第2拍与第4拍中会弹奏当前和弦包含剩余所有音符,而第3拍将清除当前发声数组,加入切音与打板素材。副歌模式带来了更为欢快的模式。伴奏结尾时,一个以结尾和弦样式为基准,通过琶音转换原则获得的琶音音节序列,将最后音节拉长为时长半小节,而其他音节在前半小节匀速弹奏完毕,以达到结尾琶音的效果。
古筝:弹奏方式与正常速率下的吉他一致,但不添加琶音。
以上为和弦乐器规则,吉他为例进行解释,例如,一个小节4拍时,正常速率下一个和弦正好对应一个小节,每个和弦4个音符,因此每拍正好弹奏一个音符。
当BPM超过200(即每拍<0.3s,快节奏模式)时,设置为副歌模式,第一拍弹奏和弦的第一个音符,在第二拍则同时弹奏和弦的2、3、4个音符。第三拍弹奏打板与切音素材,并将发声数组中剩余的吉他音频素材信息全部移除,第四拍操作与第二拍一致,以此营造一种欢快的气氛。
在除空和弦以外的和弦序列弹奏结束后,将增加与最后一个非空和弦相关的琶音,该琶音为4-6个音符(与和弦类型相关,为现有技术),弹奏一个小节,以4拍的小节,6个音符的琶音为例,将前5个音符在前两拍中弹奏,即每个音符弹奏0.4拍后弹奏下一个音符,然后在第三拍开始时弹奏最后一个音符,直到该小节结束,持续2拍。
大鼓与箱鼓:鼓的节奏分为Kick与Snare两种音色。大鼓的Kick打击力度较重,Snare打击力度较轻;而箱鼓正相反。Kick音色以小节为单位,分别在第一拍的正拍,第二拍的3/4拍以及第三拍的反拍出现;Snare音色两拍一个,在第二拍正拍开始。
电音:以架子鼓中的定音鼓、踩镲与贝斯合并生成的音色。定音鼓也 分为Kick与Snare两种音色。Snare规则与大鼓一致,Kick音色在每拍的正拍出现;踩镲与贝斯在每拍的反拍出现,其中贝斯弹奏的音调为吉他音的对应映射,没有映射时使用标准音。
沙锤:沙锤分为hard和soft两种音色,hard与soft音色均为一拍两个,hard在正拍上和反拍上发声,soft在1/4拍和3/4拍上发声。
以上打击乐器规则,对上述打击乐器的规则进行解释:一个4拍的小节,其延续长度可以理解为[0,4)的区间,0为第一拍开头,4为第四拍结尾。一种音色对应相应的一种素材,正拍代表节拍前半部分时,如第一拍的正拍开始时间为0,第二拍正拍开始时间为1;反拍代表一拍后半部分时,即第一拍反拍开始时间为0.5,第二拍为1.5。因此1/4拍,3/4拍等即代表素材插入时间处于一拍的0.25、0.75处,以此类推。
步骤S25:将混合后音频写入WAV文件中,得到待处理哼唱对应的和弦伴奏音频。
将对应的音频素材进行混音之后,可以将混合后音频写入WAV文件中,得到待处理哼唱对应的和弦伴奏音频。在将混合后音频写入WAV文件中之前,可以先将混合后音频通过压限器,以防止混音后出现爆音、杂音。
参见图10所示,为和弦伴奏生成流程图。首先读取用户设置参数,也即获取所述和弦伴奏生成参数,还需要获取音频相关信息,也即,前述的每分钟节拍信息、所述和弦,然后判断是否套用常用和弦,也即,判断所述和弦伴奏参数中的和弦参数是否表示常用和弦,如果是,则处理和弦序列中的空和弦并跳过,对其他的和弦与常用和弦匹配,获取改进的和弦,也即,优化后和弦,将优化后和弦转化为每拍音符时长序列,判断此拍音符是否为空,如果否,则先判断所述用户设置参数中的乐器类型参数是否包括吉他、古筝对应的参数,如果是,则再预设发声数组中添加相应的吉他、古筝信息,然后依据用户设置参数和规则在发声数据中添加相应的音频素材信息,如果此拍音符为空,则直接依据用户设置参数和规则在发声数据中添加相应的音频素材信息,混合当前节拍在发声数组中的音频素材信息指向的音源(音频素材),供压限器处理,压限器消除爆音杂音后,写入WAV文件中,判断发声数组中是否有音频素材信息指向音频素材结尾, 如果是,则发声数组中去除已结束音频素材信息,如果否,则判断节拍序列是否结束,如果是,则判断对应的乐器是否有吉他,如果有,则添加琶音,然后结束,如果没有,则直接结束。
在实际的实施过程中,前述音频处理方法中可以先由终端获取待处理哼唱音频,将获取到的待处理哼唱音频发送到对应的服务器,由服务器进行后续处理,得到所述待处理哼唱音频对应的MIDI文件和和弦伴奏音频,再将生成的MIDI文件和和弦伴奏音频返回到所述终端,这样利用服务器进行处理,可以提高处理速度。
或者,前述音频处理方法中各个步骤也可以均在终端进行,当前述的整个音频处理过程都在终端进行时,可以避免由于断网时,终端连接不到对应的服务器带来的服务不可用问题。
在对所述待处理哼唱音频进行音乐信息检索时,也可以通过在服务器设备部署神经网络等技术识别音乐信息,借助网络解决终端的提取问题,也可以将神经网络小型化后部署在终端设备部署避免联网问题。
参见图11所示,为前述的音频处理方法的具体实现,以试用版APP(Application,手机软件)为例。首先通过图11a所示的首页进入之后,用户通过麦克风进行哼唱,终端设备即可通过采样得到哼唱输入的音频流。对音频流进行识别、处理,当哼唱完毕后,BPM、和弦、音符音高等相应音乐信息随即被获取,参见图11b所示,以乐谱的形式对获取到的音乐信息进行了展示。随后,参见图11c所示,用户可以根据自己的喜好选择国风、民谣、弹唱、电音四种风格,或通过自定义方式,自由选择节奏快慢、和弦模式,使用的乐器及其所占的响度,后台在获取到这些和弦生成参数之后,便可以根据这些和弦生成参数生成和弦伴奏音频,以及根据音乐信息生成用户哼唱音频对应的MIDI文件。这样将通过用户选择的参数,结合使用MIR技术获取的音乐信息,生成相应的符合原始哼唱音频的旋律节奏与音符的伴奏音频,供用户聆听。
这样用户在使用上图中的应用时,可以对麦克风随意哼唱几句,即获得相应的待处理哼唱音频。再通过简单的参数设置,用户即可体验多种乐 器的伴奏效果。还可以尝试内置的不同流派或风格,也可以任意组合古筝、吉他、鼓等乐器,丰富旋律,生成最适合的伴奏。
经过后期处理,将用户哼唱音频对应生成的旋律与合成的和弦伴奏完美结合,形成优秀的音乐作品并存储,可开发出更多的使用场景,如建设用户社区,使用户可以上传各自作品进行交流;与专业人士合作,上传更多乐器风格模板等。
且上图中实现功能的操作方式简单,能充分利用到用户的碎片化时间;用户可以为喜欢音乐的广大年轻群体而不限于专业人群,受众更广;配合年轻化的界面将能吸引更多的新兴年轻群体,且通过对现有的专业音乐软件的音轨编辑方式进行调整,使得用户的交互简洁化,以达到主流非专业人士能够更快上手的目的。
参见图12所示,本申请实施例公开了一种音频处理装置,包括:
音频获取模块201,用于获取待处理哼唱音频,得到所述待处理哼唱音频对应的音乐信息,其中,所述音乐信息包括音符信息和每分钟节拍信息;
和弦确定模块202,用于基于所述音符信息、所述每分钟节拍信息确定出所述待处理音频对应的和弦;
MIDI文件生成模块203,用于根据所述音符信息和所述每分钟节拍信息生成所述待处理哼唱音频对应的MIDI文件;
和弦伴奏生成模块204,用于根据所述每分钟节拍信息、所述和弦和获取到的和弦伴奏参数生成所述待处理哼唱音频对应的和弦伴奏音频,其中,所述和弦伴奏参数为用户设置的和弦伴奏生成参数;
输出模块205,用于输出所述MIDI文件及所述和弦伴奏音频。
可见,本申请先获取待处理哼唱音频,得到所述待处理哼唱音频对应的音乐信息,其中,所述音乐信息包括音符信息和每分钟节拍信息,然后基于所述音符信息、所述每分钟节拍信息确定所述待处理音频对应的和弦,再根据所述音符信息和所述每分钟节拍信息生成所述待处理哼唱音频对应的MIDI文件,并根据所述每分钟节拍信息、所述和弦和预先获取到的和弦 伴奏参数生成所述待处理哼唱音频对应的和弦伴奏音频,然后便可以输出所述MIDI文件及所述和弦伴奏音频。由此可见,本申请在获取到待处理哼唱音频之后,便可以得到对应的音乐信息,相比于前述现有技术,不需要先将待处理哼唱音频转换成MIDI文件,然后再对转化成的MIDI文件进行分析,所以也就不易造成先将音频转换成MIDI文件带来的误差累积问题。此外,不仅需要根据音乐信息生成主旋律音频对应的MIDI文件,还需要根据音乐信息以及和弦生成对应的和弦伴奏音频,相比于前述现有技术中的只是生成和弦伴奏对应的MIDI文件带来的用体验不一致问题,本申请通过既生成待处理哼唱音频主旋律对应的MIDI文件,又直接生成待处理哼唱音频对应的和弦伴奏音频,这样由于和弦伴奏音频对音频设备的性能依赖较低,能够使得不同用户的体验一致,得到预期的用户体验效果。
图13为本申请实施例提供的一种电子设备30的结构示意图,该用户终端具体可以包括但不限于智能手机、平板电脑、笔记本电脑或台式电脑等。
通常,本实施例中的电子设备30包括:处理器31和存储器32。
其中,处理器31可以包括一个或多个处理核心,比如四核心处理器、八核心处理器等。处理器31可以采用DSP(digital signal processing,数字信号处理)、FPGA(field-programmable gate array,现场可编程们阵列)、PLA(programmable logic array,可编程逻辑阵列)中的至少一种硬件来实现。处理器31也可以包括主处理器和协处理器,主处理器是用于对在唤醒状态下的数据进行处理的处理器,也称CPU(central processing unit,中央处理器);协处理器是用于对在待机状态下的数据进行处理的低功耗处理器。在一些实施例中,处理器31可以集成有GPU(graphics processing unit,图像处理器),GPU用于负责显示屏所需要显示的图像的渲染和绘制。一些实施例中,处理器31可以包括AI(artificial intelligence,人工智能)处理器,该AI处理器用于处理有关机器学习的计算操作。
存储器32可以包括一个或多个计算机可读存储介质,计算机可读存储介质可以是非暂态的。存储器32还可以包括高速随机存取存储器,以及非易失性存储器,比如一个或多个磁盘存储设备、闪存存储设备。本实施例 中,存储器32至少用于存储以下计算机程序321,其中,该计算机程序被处理器31加载并执行之后,能够实现前述任一实施例中公开的音频处理方法步骤。
在一些实施例中,电子设备30还可包括有显示屏33、输入输出接口34、通信接口35、传感器36、电源37以及通信总线38。
本技术领域人员可以理解,图13中示出的结构并不构成对电子设备30的限定,可以包括比图示更多或更少的组件。
进一步的,本申请实施例还公开了一种计算机可读存储介质,用于保存计算机程序,其中,所述计算机程序被处理器执行时实现前述任一实施例中公开的音频处理方法。
其中,关于上述音频处理方法的具体过程可以参考前述实施例中公开的相应内容,在此不再进行赘述。
本说明书中各个实施例采用递进的方式描述,每个实施例重点说明的都是与其它实施例的不同之处,各个实施例之间相同或相似部分互相参见即可。对于实施例公开的装置而言,由于其与实施例公开的方法相对应,所以描述的比较简单,相关之处参见方法部分说明即可。
结合本文中所公开的实施例描述的方法或算法的步骤可以直接用硬件、处理器执行的软件模块,或者二者的结合来实施。软件模块可以置于随机存储器(RAM)、内存、只读存储器(ROM)、电可编程ROM、电可擦除可编程ROM、寄存器、硬盘、可移动磁盘、CD-ROM、或技术领域内所公知的任意其它形式的存储介质中。
最后,还需要说明的是,在本文中,诸如第一和第二之类的关系术语仅仅用来将一个实体或者操作与另一个实体或者操作区分开来,而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得一系列包含其他要素的过程、方法、物品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、 方法、物品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、物品或者设备中还存在另外的相同要素。
以上对本申请所提供的一种音频处理方法、装置、设备、介质进行了详细介绍,本文中应用了具体个例对本申请的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本申请的方法及其核心思想;同时,对于本领域的一般技术人员,依据本申请的思想,在具体实施方式及应用范围上均会有改变之处,综上所述,本说明书内容不应理解为对本申请的限制。

Claims (16)

  1. 一种音频处理方法,其特征在于,包括:
    获取待处理哼唱音频,得到所述待处理哼唱音频对应的音乐信息,其中,所述音乐信息包括音符信息和每分钟节拍信息;
    基于所述音符信息、所述每分钟节拍信息确定所述待处理音频对应的和弦;
    根据所述音符信息和所述每分钟节拍信息生成所述待处理哼唱音频对应的MIDI文件;
    根据所述每分钟节拍信息、所述和弦和预先获取到的和弦伴奏参数生成所述待处理哼唱音频对应的和弦伴奏音频,其中,所述和弦伴奏参数为用户设置的和弦伴奏生成参数;
    输出所述MIDI文件及所述和弦伴奏音频。
  2. 根据权利要求1所述的音频处理方法,其特征在于,所述获取待处理哼唱音频,得到所述待处理哼唱音频对应的音乐信息,包括:
    获取待处理哼唱音频;
    确定所述待处理哼唱音频中各个第一音频帧的目标基音周期,并基于所述目标基音周期确定出各个第一音频帧对应的音符信息,其中,所述第一音频帧为时长等于第一预设时长的音频帧;
    确定所述待处理哼唱音频中各个第二音频帧的声能,并基于所述声能确定出所述待处理哼唱音频对应的每分钟节拍信息,其中,所述第二音频帧为包括预设数量个采样点的音频帧。
  3. 根据权利要求2所述的音频处理方法,其特征在于,所述确定所述待处理哼唱音频中各个第一音频帧的目标基音周期,包括:
    利用短时自相关函数和预设清浊音检测方法确定所述待处理哼唱音频中各个第一音频帧的目标基音周期。
  4. 根据权利要求3所述的音频处理方法,其特征在于,所述利用短时自相关函数和预设清浊音检测方法确定所述待处理哼唱音频中各个第一音频帧的目标基音周期,包括:
    利用短时自相关函数确定所述待处理哼唱音频中各个第一音频帧的预 选基音周期;
    利用预设清浊音检测方法确定各个所述第一音频帧是否为浊音帧;
    如果所述第一音频帧为浊音帧,则将所述第一音频帧对应的预选基音周期确定为所述第一音频帧对应的目标基音周期。
  5. 根据权利要求2所述的音频处理方法,其特征在于,所述基于所述目标基音周期确定出各个第一音频帧对应的音符信息,包括:
    分别基于各个所述目标基音周期确定各个所述第一音频帧的音高;
    基于各个所述第一音频帧的音高确定各个所述第一音频帧对应的音符;
    将各个第一音频帧对应的音符和各个第一音频帧对应的起止时间确定为各个所述第一音频帧对应的音符信息。
  6. 根据权利要求2所述的音频处理方法,其特征在于,所述确定所述待处理哼唱音频中各个第二音频帧的声能,并基于所述声能确定出所述待处理哼唱音频对应的每分钟节拍信息,包括:
    确定所述待处理哼唱音频中当前第二音频帧的声能以及当前第二音频帧对应的平均声能,其中,所述平均声能为当前第二音频帧的终止时刻之前的过去连续第二预设时长之内的各个第二音频帧的声能的平均值;
    基于所述平均声能构建目标比较参数;
    判断当前第二音频帧的声能是否大于所述目标比较参数;
    如果当前第二音频帧的声能大于所述目标比较参数,则判定当前第二音频帧为一个节拍,直到所述待处理哼唱音频中的各个第二音频帧检测完成,得到所述待处理哼唱歌曲中的节拍总数,基于所述节拍总数确定出所述待处理哼唱音频对应的每分钟节拍信息。
  7. 根据权利要求6所述的音频处理方法,其特征在于,所述基于所述平均声能构建目标比较参数,包括:
    确定出当前第二音频帧的终止时刻之前的过去连续第二预设时长之内的各个第二音频帧的声能相对于所述平均声能的偏移量和;
    基于所述偏移量和确定所述平均声能的校准因子;
    基于所述校准因子对所述平均声能进行校准,得到所述目标比较参数。
  8. 根据权利要求1至7任一项所述的音频处理方法,其特征在于,所述基于所述音符信息、所述每分钟节拍信息确定所述待处理音频对应的和弦,包括:
    基于所述音符信息确定出所述待处理哼唱音频的调性;
    基于所述待处理哼唱音频的调性从预设和弦中确定出预选和弦;
    基于所述音符信息和所述每分钟节拍信息从所述预选和弦中确定出待处理音频对应的和弦。
  9. 根据权利要求8所述的音频处理方法,其特征在于,所述基于所述音符信息确定出所述待处理哼唱音频的调性,包括:
    在预设调节参数取不同值时,确定出所述音符信息中的音符序列对应的实时调性特征;
    将各个所述实时调性特征与预设调性特征相匹配,并将匹配度最高的实时调性特征确定为目标实时调性特征;
    基于所述目标实时调性特征对应的所述预设调节参数的取值以及与所述目标实时调性特征最匹配的预设调性特征对应的预设调节参数取值与调性对应关系确定出所述待处理哼唱音频的调性。
  10. 根据权利要求8所述的音频处理方法,其特征在于,所述基于所述音符信息和所述每分钟节拍信息从所述预选和弦中确定出待处理音频对应的和弦,包括:
    基于所述每分钟节拍信息将所述音符信息中的音符按照时间序列划分为不同的小节;
    将各个小节的音符分别与各个所述预选和弦进行匹配,确定出各个小节对应的和弦,以确定出所述待处理音频对应的和弦。
  11. 根据权利要求1所述的音频处理方法,其特征在于,所述根据所述每分钟节拍信息、所述和弦和预先获取到的和弦伴奏参数生成所述待处理哼唱音频对应的和弦伴奏音频,包括:
    判断所述和弦伴奏参数中的和弦参数是否表示常用和弦;
    如果所述和弦伴奏参数中的和弦参数表示常用和弦,则根据预设常用和弦库中的常用和弦组对所述和弦进行优化,得到优化后和弦;
    根据预先获取到的和弦和音符对应关系将所述优化后和弦转换为优化后音符;
    根据所述和弦伴奏参数中的乐器类型参数和乐器音高参数确定出所述优化后音符中各个音符对应的音频素材信息,并按照预设混音规则对所述音频素材信息对应的音频素材进行混音;
    将混合后音频写入WAV文件中,得到待处理哼唱对应的和弦伴奏音频。
  12. 根据权利要求11所述的音频处理方法,其特征在于,所述根据预设常用和弦库中的常用和弦组对所述和弦进行优化,得到优化后和弦,包括:
    基于所述音符信息确定出所述待处理哼唱音频的调性;
    对所述和弦进行分组,得到不同的和弦组;
    分别将当前和弦组与预设常用和弦库中的与所述调性对应的各个常用和弦组进行匹配,并将匹配度最高的常用和弦组确定为当前和弦组对应的优化后和弦组,直到确定出各个和弦组对应的优化后和弦组,得到优化后和弦。
  13. 根据权利要求11所述的音频处理方法,其特征在于,所述根据所述和弦伴奏参数中的乐器类型参数和乐器音高参数确定出所述优化后音符中各个音符对应的音频素材信息,并按照预设混音规则对所述音频素材信息对应的音频素材进行混音,包括:
    根据所述和弦伴奏参数中的乐器类型参数和乐器音高参数确定出所述优化后音符中各个音符对应的音频素材信息,其中,所述音频素材信息包括素材标识、音高、起始播放位置以及素材时长;
    将所述音频素材信息按照预设混音规则放入预设发声数组中,并对当前节拍在所述预设发声数组中的音频素材信息指向的预设音频素材库中的音频素材进行混音,其中,节拍根据所述每分钟节拍信息确定。
  14. 一种音频处理装置,其特征在于,包括:
    音频获取模块,用于获取待处理哼唱音频,得到所述待处理哼唱音频对应的音乐信息,其中,所述音乐信息包括音符信息和每分钟节拍信息;
    和弦确定模块,用于基于所述音符信息、所述每分钟节拍信息确定出所述待处理音频对应的和弦;
    MIDI文件生成模块,用于根据所述音符信息和所述每分钟节拍信息生成所述待处理哼唱音频对应的MIDI文件;
    和弦伴奏生成模块,用于根据所述每分钟节拍信息、所述和弦和获取到的和弦伴奏参数生成所述待处理哼唱音频对应的和弦伴奏音频,其中,所述和弦伴奏参数为用户设置的和弦伴奏生成参数;
    输出模块,用于输出所述MIDI文件及所述和弦伴奏音频。
  15. 一种电子设备,其特征在于,包括:
    存储器和处理器;
    其中,所述存储器,用于存储计算机程序;
    所述处理器,用于执行所述计算机程序,以实现权利要求1至13任一项所述的音频处理方法。
  16. 一种计算机可读存储介质,其特征在于,用于保存计算机程序,其中,所述计算机程序被处理器执行时实现如权利要求1至13任一项所述的音频处理方法。
PCT/CN2021/122559 2020-11-03 2021-10-08 一种音频处理方法、装置、设备及介质 WO2022095656A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/034,032 US20230402026A1 (en) 2020-11-03 2021-10-08 Audio processing method and apparatus, and device and medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011210970.6 2020-11-03
CN202011210970.6A CN112382257B (zh) 2020-11-03 2020-11-03 一种音频处理方法、装置、设备及介质

Publications (1)

Publication Number Publication Date
WO2022095656A1 true WO2022095656A1 (zh) 2022-05-12

Family

ID=74578933

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/122559 WO2022095656A1 (zh) 2020-11-03 2021-10-08 一种音频处理方法、装置、设备及介质

Country Status (3)

Country Link
US (1) US20230402026A1 (zh)
CN (1) CN112382257B (zh)
WO (1) WO2022095656A1 (zh)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112382257B (zh) * 2020-11-03 2023-11-28 腾讯音乐娱乐科技(深圳)有限公司 一种音频处理方法、装置、设备及介质
CN113436641A (zh) * 2021-06-22 2021-09-24 腾讯音乐娱乐科技(深圳)有限公司 一种音乐转场时间点检测方法、设备及介质
CN113763913B (zh) * 2021-09-16 2024-06-18 腾讯音乐娱乐科技(深圳)有限公司 一种曲谱生成方法、电子设备及可读存储介质
CN113838444A (zh) * 2021-10-13 2021-12-24 广州酷狗计算机科技有限公司 生成编曲的方法、装置、设备、介质及计算机程序
CN115132155A (zh) * 2022-05-12 2022-09-30 天津大学 一种基于声调音高空间的预测和弦解释音符的方法
CN117437897A (zh) * 2022-07-12 2024-01-23 北京字跳网络技术有限公司 音频处理方法、装置及电子设备
CN115831080A (zh) * 2022-11-18 2023-03-21 北京字跳网络技术有限公司 确定音频的方法、装置、电子设备及存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103854644A (zh) * 2012-12-05 2014-06-11 中国传媒大学 单声道多音音乐信号的自动转录方法及装置
CN105244021A (zh) * 2015-11-04 2016-01-13 厦门大学 哼唱旋律到midi旋律的转换方法
CN105702249A (zh) * 2016-01-29 2016-06-22 北京精奇互动科技有限公司 自动选择伴奏的方法和装置
CN109166566A (zh) * 2018-08-27 2019-01-08 北京奥曼特奇科技有限公司 一种用于音乐智能伴奏的方法及系统
US20190051275A1 (en) * 2017-08-10 2019-02-14 COOLJAMM Company Method for providing accompaniment based on user humming melody and apparatus for the same
CN112382257A (zh) * 2020-11-03 2021-02-19 腾讯音乐娱乐科技(深圳)有限公司 一种音频处理方法、装置、设备及介质

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103854644A (zh) * 2012-12-05 2014-06-11 中国传媒大学 单声道多音音乐信号的自动转录方法及装置
CN105244021A (zh) * 2015-11-04 2016-01-13 厦门大学 哼唱旋律到midi旋律的转换方法
CN105702249A (zh) * 2016-01-29 2016-06-22 北京精奇互动科技有限公司 自动选择伴奏的方法和装置
US20190051275A1 (en) * 2017-08-10 2019-02-14 COOLJAMM Company Method for providing accompaniment based on user humming melody and apparatus for the same
CN109166566A (zh) * 2018-08-27 2019-01-08 北京奥曼特奇科技有限公司 一种用于音乐智能伴奏的方法及系统
CN112382257A (zh) * 2020-11-03 2021-02-19 腾讯音乐娱乐科技(深圳)有限公司 一种音频处理方法、装置、设备及介质

Also Published As

Publication number Publication date
US20230402026A1 (en) 2023-12-14
CN112382257B (zh) 2023-11-28
CN112382257A (zh) 2021-02-19

Similar Documents

Publication Publication Date Title
WO2022095656A1 (zh) 一种音频处理方法、装置、设备及介质
CN106023969B (zh) 用于将音频效果应用于音乐合辑的一个或多个音轨的方法
US20070289432A1 (en) Creating music via concatenative synthesis
US9852721B2 (en) Musical analysis platform
CN1750116B (zh) 自动表演风格确定设备和方法
WO2009104269A1 (ja) 楽曲判別装置、楽曲判別方法、楽曲判別プログラム及び記録媒体
JP4613923B2 (ja) 楽音処理装置およびプログラム
US20170090860A1 (en) Musical analysis platform
JPH10105169A (ja) ハーモニーデータ生成装置およびカラオケ装置
JP5229998B2 (ja) コード名検出装置及びコード名検出用プログラム
WO2023040332A1 (zh) 一种曲谱生成方法、电子设备及可读存储介质
JP6175812B2 (ja) 楽音情報処理装置及びプログラム
JP6288197B2 (ja) 評価装置及びプログラム
JP6102076B2 (ja) 評価装置
JP5292702B2 (ja) 楽音信号生成装置及びカラオケ装置
WO2019180830A1 (ja) 歌唱評価方法及び装置、プログラム
JP5678935B2 (ja) 楽器演奏評価装置、楽器演奏評価システム
JP5782972B2 (ja) 情報処理システム,プログラム
JP2000293188A (ja) 和音リアルタイム認識方法及び記憶媒体
JP3879524B2 (ja) 波形生成方法、演奏データ処理方法および波形選択装置
CN112992110A (zh) 音频处理方法、装置、计算设备以及介质
JP3777976B2 (ja) 演奏情報解析装置及び記録媒体
JP7107427B2 (ja) 音信号合成方法、生成モデルの訓練方法、音信号合成システムおよびプログラム
WO2020171035A1 (ja) 音信号合成方法、生成モデルの訓練方法、音信号合成システムおよびプログラム
JP4595851B2 (ja) 演奏データ編集装置及びプログラム

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21888353

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 16.08.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21888353

Country of ref document: EP

Kind code of ref document: A1