WO2019237664A1 - Method, apparatus and storage medium for correcting the time delay between accompaniment and dry sound - Google Patents

Method, apparatus and storage medium for correcting the time delay between accompaniment and dry sound

Info

Publication number
WO2019237664A1
WO2019237664A1 (PCT/CN2018/117519)
Authority
WO
WIPO (PCT)
Prior art keywords
audio
dry
accompaniment
sequence
correlation function
Prior art date
Application number
PCT/CN2018/117519
Other languages
English (en)
French (fr)
Inventor
张超钢
Original Assignee
广州酷狗计算机科技有限公司 (Guangzhou Kugou Computer Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州酷狗计算机科技有限公司 (Guangzhou Kugou Computer Technology Co., Ltd.)
Priority to US16/627,954 (US10964301B2)
Priority to EP18922771.3A (EP3633669B1)
Publication of WO2019237664A1


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 - Details of electrophonic musical instruments
    • G10H1/0008 - Associated control or indicating means
    • G10H1/36 - Accompaniment arrangements
    • G10H1/361 - Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
    • G10H1/366 - Recording/reproducing of accompaniment for use with an external source, with means for modifying or correcting the external signal, e.g. pitch correction, reverberation, changing a singer's voice
    • G10H2210/00 - Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/005 - Musical accompaniment, i.e. complete instrumental rhythm synthesis added to a performed melody, e.g. as output by drum machines
    • G10H2210/031 - Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/056 - Musical analysis for extraction or identification of individual instrumental parts, e.g. melody, chords, bass; identification or separation of instrumental parts by their characteristic voices or timbres
    • G10H2210/066 - Musical analysis for pitch analysis as part of wider processing for musical purposes, e.g. transcription, musical performance evaluation; pitch recognition, e.g. in polyphonic sounds; estimation or use of missing fundamental
    • G10H2210/091 - Musical analysis for performance evaluation, i.e. judging, grading or scoring the musical qualities or faithfulness of a performance, e.g. with respect to pitch, tempo or other timings of a reference performance
    • G10H2240/00 - Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/325 - Synchronizing two or more audio tracks or files according to musical features or musical timings
    • G10H2250/00 - Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H2250/311 - Neural networks for electrophonic musical instruments or musical processing, e.g. for musical recognition or control, automatic composition or improvisation


Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Electrophonic Musical Instruments (AREA)
  • Auxiliary Devices For Music (AREA)
  • Reverberation, Karaoke And Other Acoustics (AREA)

Abstract

A method, an apparatus, and a storage medium for correcting the time delay between accompaniment and dry sound, belonging to the field of information processing technology. The method includes: obtaining the accompaniment audio, dry audio, and original song audio of a target song, and extracting the original vocal audio from the original song audio (201); determining a first correlation function curve based on the original vocal audio and the dry audio, and determining a second correlation function curve based on the original song audio and the accompaniment audio (202); and correcting the time delay between the accompaniment audio and the dry audio based on the first correlation function curve and the second correlation function curve (203). Compared with the current method of manual correction by staff, this saves labor and time and improves correction efficiency, while also eliminating correction errors that may be caused by human factors, improving accuracy.

Description

Method, apparatus and storage medium for correcting the time delay between accompaniment and dry sound
This application claims priority to Chinese Patent Application No. 201810594183.2, filed on June 11, 2018 and entitled "Method, apparatus and storage medium for correcting the time delay between accompaniment and dry sound", the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
The present application relates to the field of information processing technology, and in particular to a method, an apparatus, and a storage medium for correcting the time delay between accompaniment and dry sound.
BACKGROUND
At present, to meet the needs of different users, the music library of a music application may store audio of a song in different forms, such as the original song audio, the accompaniment audio, and the dry audio. The original song audio is the original audio that contains both accompaniment and vocals; the accompaniment audio is audio that contains no vocals; and the dry audio is audio that contains no accompaniment, only vocals. Owing to differences between the stored versions of the various audio files, or differences in how those versions are managed, there is often a time delay between the stored accompaniment audio and dry audio of a song. And since there is no correlated time-domain or frequency-domain information before the starting instants of the accompaniment audio and the dry audio, the delay between them is at present mainly checked and corrected manually by staff, which is inefficient and not very accurate.
SUMMARY
Embodiments of the present application provide a method, an apparatus, and a computer-readable storage medium for correcting the time delay between accompaniment and dry sound, which can effectively improve correction efficiency and accuracy. The technical solutions are as follows:
In a first aspect, a method for correcting the time delay between accompaniment and dry sound is provided, the method including:
obtaining the accompaniment audio, dry audio, and original song audio of a target song, and extracting the original vocal audio from the original song audio;
determining a first correlation function curve based on the original vocal audio and the dry audio, and determining a second correlation function curve based on the original song audio and the accompaniment audio;
correcting the time delay between the accompaniment audio and the dry audio based on the first correlation function curve and the second correlation function curve.
Optionally, determining the first correlation function curve based on the original vocal audio and the dry audio includes:
obtaining the pitch value corresponding to each of the multiple audio frames included in the original vocal audio, and sorting the obtained pitch values of the original vocal audio in the order of the multiple audio frames included in the original vocal audio, to obtain a first pitch sequence;
obtaining the pitch value corresponding to each of the multiple audio frames included in the dry audio, and sorting the obtained pitch values of the dry audio in the order of the multiple audio frames included in the dry audio, to obtain a second pitch sequence;
determining the first correlation function curve based on the first pitch sequence and the second pitch sequence.
Optionally, determining the first correlation function curve based on the first pitch sequence and the second pitch sequence includes:
determining a first correlation function model, as shown in the following formula, based on the first pitch sequence and the second pitch sequence:
$R(t)=\sum_{n=1}^{N}x(n)\,y(n-t)$
where N is a preset number of pitch values, N is less than or equal to the number of pitch values included in the first pitch sequence, and N is less than or equal to the number of pitch values included in the second pitch sequence; x(n) denotes the n-th pitch value in the first pitch sequence; y(n-t) denotes the (n-t)-th pitch value in the second pitch sequence; and t is the time offset between the first pitch sequence and the second pitch sequence;
determining the first correlation function curve based on the first correlation function model.
Optionally, determining the second correlation function curve based on the original song audio and the accompaniment audio includes:
obtaining the multiple audio frames included in the original song audio, in the order of the multiple audio frames included in the original song audio, to obtain a first audio sequence;
obtaining the multiple audio frames included in the accompaniment audio, in the order of the multiple audio frames included in the accompaniment audio, to obtain a second audio sequence;
determining the second correlation function curve based on the first audio sequence and the second audio sequence.
Optionally, correcting the time delay between the accompaniment audio and the dry audio based on the first correlation function curve and the second correlation function curve includes:
detecting a first peak on the first correlation function curve, and detecting a second peak on the second correlation function curve;
determining a first time delay between the original vocal audio and the dry audio based on the first peak, and determining a second time delay between the accompaniment audio and the original song audio based on the second peak;
correcting the time delay between the accompaniment audio and the dry audio based on the first time delay and the second time delay.
Optionally, correcting the time delay between the accompaniment audio and the dry audio based on the first time delay and the second time delay includes:
determining the delay difference between the first time delay and the second time delay as the time delay between the accompaniment audio and the dry audio;
if the time delay indicates that the accompaniment audio is later than the dry audio, deleting, from the start playback time of the accompaniment audio, the audio data in the accompaniment audio within a duration equal to the time delay;
if the time delay indicates that the accompaniment audio is earlier than the dry audio, deleting, from the start playback time of the dry audio, the audio data in the dry audio within a duration equal to the time delay.
In a second aspect, an apparatus for correcting the time delay between accompaniment and dry sound is provided, the apparatus including:
an obtaining module, configured to obtain the original song audio corresponding to the accompaniment audio and dry audio to be corrected, and to extract the original vocal audio from the original song audio;
a determining module, configured to determine a first correlation function curve based on the original vocal audio and the dry audio, and to determine a second correlation function curve based on the original song audio and the accompaniment audio;
a correction module, configured to correct the time delay between the accompaniment audio and the dry audio based on the first correlation function curve and the second correlation function curve.
Optionally, the determining module includes:
a first obtaining submodule, configured to obtain the pitch value corresponding to each of the multiple audio frames included in the original vocal audio, and to sort the obtained pitch values of the original vocal audio in the order of the multiple audio frames included in the original vocal audio, to obtain a first pitch sequence;
the first obtaining submodule being further configured to obtain the pitch value corresponding to each of the multiple audio frames included in the dry audio, and to sort the obtained pitch values of the dry audio in the order of the multiple audio frames included in the dry audio, to obtain a second pitch sequence;
a first determining submodule, configured to determine the first correlation function curve based on the first pitch sequence and the second pitch sequence.
Optionally, the first determining submodule is specifically configured to:
determine a first correlation function model, as shown in the following formula, based on the first pitch sequence and the second pitch sequence:
$R(t)=\sum_{n=1}^{N}x(n)\,y(n-t)$
where N is a preset number of pitch values, N is less than or equal to the number of pitch values included in the first pitch sequence, and N is less than or equal to the number of pitch values included in the second pitch sequence; x(n) denotes the n-th pitch value in the first pitch sequence; y(n-t) denotes the (n-t)-th pitch value in the second pitch sequence; and t is the time offset between the first pitch sequence and the second pitch sequence;
determine the first correlation function curve based on the first correlation function model.
Optionally, the determining module includes:
a second obtaining submodule, configured to obtain the multiple audio frames included in the original song audio, in the order of the multiple audio frames included in the original song audio, to obtain a first audio sequence;
the second obtaining submodule being further configured to obtain the multiple audio frames included in the accompaniment audio, in the order of the multiple audio frames included in the accompaniment audio, to obtain a second audio sequence;
a second determining submodule, configured to determine the second correlation function curve based on the first audio sequence and the second audio sequence.
Optionally, the correction module includes:
a detection submodule, configured to detect a first peak on the first correlation function curve and to detect a second peak on the second correlation function curve;
a third determining submodule, configured to determine a first time delay between the original vocal audio and the dry audio based on the first peak, and to determine a second time delay between the accompaniment audio and the original song audio based on the second peak;
a correction submodule, configured to correct the time delay between the accompaniment audio and the dry audio based on the first time delay and the second time delay.
Optionally, the correction submodule is specifically configured to:
determine the delay difference between the first time delay and the second time delay as the time delay between the accompaniment audio and the dry audio;
if the time delay between the accompaniment audio and the dry audio indicates that the accompaniment audio is later than the dry audio, delete, from the start playback time of the accompaniment audio, the audio data in the accompaniment audio within a duration equal to that time delay;
if the time delay between the accompaniment audio and the dry audio indicates that the accompaniment audio is earlier than the dry audio, delete, from the start playback time of the dry audio, the audio data in the dry audio within a duration equal to that time delay.
In a third aspect, an apparatus for correcting the time delay between accompaniment and dry sound is provided, the apparatus including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the steps of any one of the methods of the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, the storage medium storing instructions that, when executed by a processor, implement the steps of any one of the methods of the first aspect.
The technical solutions provided in the embodiments of the present application bring at least the following beneficial effects: the accompaniment audio, dry audio, and original song audio of a target song are obtained, and the original vocal audio is extracted from the original song audio; a first correlation function curve is determined based on the original vocal audio and the dry audio, and a second correlation function curve is determined based on the original song audio and the accompaniment audio; and the time delay between the accompaniment audio and the dry audio is corrected based on the first correlation function curve and the second correlation function curve. It can thus be seen that, in the embodiments of the present application, the accompaniment audio, the dry audio, and the corresponding original song audio can be processed to correct the time delay between the accompaniment audio and the dry audio. Compared with the current method of manual correction by staff, this saves labor and time and improves correction efficiency, while also eliminating correction errors that may be caused by human factors, improving accuracy.
BRIEF DESCRIPTION OF THE DRAWINGS
To describe the technical solutions in the embodiments of the present application more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Apparently, the drawings in the following description show merely some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a system architecture diagram for a method for correcting the time delay between accompaniment and dry sound according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for correcting the time delay between accompaniment and dry sound according to an embodiment of the present application;
FIG. 3 is a flowchart of a method for correcting the time delay between accompaniment and dry sound according to an embodiment of the present application;
FIG. 4 is a block diagram of an apparatus for correcting the time delay between accompaniment and dry sound according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a determining module according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a correction module according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a server for correcting the time delay between accompaniment and dry sound according to an embodiment of the present application.
DETAILED DESCRIPTION
To make the objectives, technical solutions, and advantages of the present application clearer, the implementations of the present application are described in further detail below with reference to the accompanying drawings.
Before the embodiments of the present application are explained in detail, their application scenario is introduced.
At present, to improve the user experience of music applications, service providers may add various extra features to them. Some of these features may require using a song's accompaniment audio and dry audio at the same time and synthesizing the two. However, owing to different audio versions, or different ways of managing those versions, there may be a time delay between the accompaniment audio and the dry audio of the same song; in that case, the accompaniment audio and the dry audio must first be aligned and then synthesized. The method for correcting the time delay between accompaniment audio and dry audio provided in the embodiments of the present application can be used in this scenario to correct that delay and thereby align the accompaniment audio and the dry audio.
Next, the system architecture involved in this method is introduced. As shown in FIG. 1, the system may include a server 101 and a terminal 102, where the server 101 and the terminal 102 can communicate.
It should be noted that the server 101 may store the song identifiers, original song audio, accompaniment audio, and dry audio of multiple songs.
When correcting the time delay between accompaniment and dry sound, the terminal 102 may obtain the accompaniment audio and dry audio to be corrected from the server, along with the original song audio corresponding to them; the terminal 102 may then use the obtained original song audio to correct the time delay between the accompaniment audio and the dry audio, by the method provided in this application. Optionally, in a possible implementation, the system may not include the terminal 102; that is, the server 101 may itself correct the time delay between the accompaniment audio and the dry audio of each stored song according to the method provided in the embodiments of the present application.
As can be seen from the foregoing system architecture, the execution subject of the embodiments of the present application may be either a server or a terminal. In the following embodiments, the method for correcting the time delay between accompaniment and dry sound is explained in detail mainly with a server as the execution subject.
FIG. 2 is a flowchart of a method for correcting the time delay between accompaniment and dry sound according to an embodiment of the present application. The method may be applied to a server. Referring to FIG. 2, the method includes the following steps:
Step 201: Obtain the accompaniment audio, dry audio, and original song audio of a target song, and extract the original vocal audio from the original song audio.
The target song may be any song stored in the server; the accompaniment audio is audio that contains no vocals; the dry audio is pure vocal audio that contains no accompaniment; and the original song audio is the original audio that contains both accompaniment and vocals.
Step 202: Determine a first correlation function curve based on the original vocal audio and the dry audio, and determine a second correlation function curve based on the original song audio and the accompaniment audio.
Step 203: Correct the time delay between the accompaniment audio and the dry audio based on the first correlation function curve and the second correlation function curve.
In this embodiment of the present application, the original song audio corresponding to the accompaniment audio and the dry audio is obtained, and the original vocal audio is extracted from it; a first correlation function curve is determined based on the original vocal audio and the dry audio, and a second correlation function curve is determined based on the original song audio and the accompaniment audio; and the time delay between the accompaniment audio and the dry audio is corrected based on the two curves. It can thus be seen that the accompaniment audio, the dry audio, and the corresponding original song audio can be processed to correct the time delay between the accompaniment audio and the dry audio. Compared with the current method of manual correction by staff, this saves labor and time and improves correction efficiency, while also eliminating correction errors that may be caused by human factors, improving accuracy.
FIG. 3 is a flowchart of a method for correcting the time delay between accompaniment and dry sound according to an embodiment of the present application. The method may be applied to a server. As shown in FIG. 3, the method includes the following steps:
Step 301: Obtain the accompaniment audio, dry audio, and original song audio of the target song, and extract the original vocal audio from the original song audio.
The target song may be any song in the music library, and the accompaniment audio and dry audio are the accompaniment and pure vocal audio of that target song.
In this embodiment, the server may first obtain the accompaniment audio and dry audio to be corrected. The server may store the correspondence between the song identifiers, accompaniment audio identifiers, dry audio identifiers, and original song audio identifiers of multiple songs. Since the accompaniment audio and dry audio to be corrected correspond to the same song, the server may obtain the original song audio identifier corresponding to the accompaniment audio from this correspondence according to the accompaniment audio identifier, and obtain the stored original song audio according to that original song audio identifier. Of course, the server may also obtain the corresponding original song audio identifier from the stored correspondence according to the dry audio identifier of the dry audio, and obtain the stored original song audio accordingly.
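As an illustration of the identifier lookup just described, here is a minimal sketch; the in-memory dict and the field names (accompaniment_id, dry_id, original_id) are assumptions for illustration, not the patent's storage schema.

```python
# Hypothetical correspondence table: song ID -> accompaniment/dry/original IDs.
SONGS = {
    "song-001": {"accompaniment_id": "acc-001", "dry_id": "dry-001", "original_id": "orig-001"},
    "song-002": {"accompaniment_id": "acc-002", "dry_id": "dry-002", "original_id": "orig-002"},
}

def original_id_for(accompaniment_id=None, dry_id=None):
    """Return the original-song audio ID matching either given identifier."""
    for song in SONGS.values():
        if accompaniment_id == song["accompaniment_id"] or (
            dry_id is not None and dry_id == song["dry_id"]
        ):
            return song["original_id"]
    raise KeyError("no stored song matches the given identifier")

print(original_id_for(accompaniment_id="acc-002"))  # -> orig-002
```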
After obtaining the original song audio, the server may extract the original vocal audio from it by a traditional blind separation method. For the traditional blind separation methods, reference may be made to the related art; they are not described further in the embodiments of the present application.
Optionally, in a possible implementation, the server may also use a deep learning method to extract the original vocal audio from the original song audio. Specifically, the server may train a supervised convolutional neural network model on the original song audio, accompaniment audio, and dry audio of multiple songs. The server may then take the original song audio as the input of this supervised convolutional neural network model, and output the original vocal audio of the original song audio through the model.
It should be noted that other types of neural network models may also be used in the embodiments of the present application to extract the original vocal audio from the original song audio, which is not specifically limited herein.
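The embodiments do not fix a network architecture or training objective, so the following is only a hedged sketch of the kind of supervised model the passage describes: a small CNN that predicts a soft vocal mask over the mixture's magnitude spectrogram, trained on pairs of original-song input and dry-vocal target. All shapes and hyperparameters here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class VocalMaskCNN(nn.Module):
    """Predicts a soft vocal mask over a magnitude spectrogram (illustrative)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=3, padding=1), nn.Sigmoid(),
        )

    def forward(self, mix_mag):
        # mix_mag: (batch, 1, freq_bins, frames) -> mask of the same shape
        return self.net(mix_mag)

model = VocalMaskCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.L1Loss()

# Stand-in batch: original-song spectrograms as input, dry-vocal spectrograms as target.
mix = torch.rand(4, 1, 513, 128)
vocal_target = torch.rand(4, 1, 513, 128)

optimizer.zero_grad()
estimated_vocal = model(mix) * mix   # masked mixture approximates the vocal
loss = loss_fn(estimated_vocal, vocal_target)
loss.backward()
optimizer.step()
```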
Step 302: Determine a first correlation function curve based on the original vocal audio and the dry audio.
After extracting the original vocal audio from the original song audio, the server may determine a first correlation function curve between the original vocal audio and the dry audio based on the two. This first correlation function curve can be used to estimate the first time delay between the original vocal audio and the dry audio.
Specifically, the server may obtain the pitch value corresponding to each of the multiple audio frames included in the original vocal audio, and sort the obtained pitch values of the original vocal audio in the order of those frames to obtain a first pitch sequence; obtain the pitch value corresponding to each of the multiple audio frames included in the dry audio, and sort the obtained pitch values of the dry audio in the order of those frames to obtain a second pitch sequence; and determine the first correlation function curve based on the first pitch sequence and the second pitch sequence.
It should be noted that audio is usually composed of multiple audio frames, with the same time interval between every two adjacent frames; that is, each audio frame corresponds to a time point. In this embodiment, the server may obtain the pitch value corresponding to each audio frame in the original vocal audio and sort the pitch values in the order of the time points corresponding to the frames, thereby obtaining the first pitch sequence. The first pitch sequence may also include the time point corresponding to each pitch value. In addition, it should be noted that a pitch value mainly indicates how high a sound is and is an important feature of sound; in this embodiment, it mainly refers to the pitch of the human voice.
After obtaining the first pitch sequence, the server may use the same method to obtain the pitch value corresponding to each of the multiple audio frames included in the dry audio, and sort the pitch values of those frames in the order of the time points corresponding to the frames, thereby obtaining the second pitch sequence.
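One way to build such per-frame pitch sequences is sketched below, assuming librosa's pYIN tracker as the pitch estimator and placeholder file names; the embodiments do not name a specific estimator.

```python
import librosa
import numpy as np

def pitch_sequence(path, sr=22050, hop_length=512):
    """One pitch value per audio frame, in frame (time-point) order."""
    samples, _ = librosa.load(path, sr=sr, mono=True)
    f0, voiced_flag, voiced_prob = librosa.pyin(
        samples,
        fmin=librosa.note_to_hz("C2"),
        fmax=librosa.note_to_hz("C6"),
        sr=sr,
        hop_length=hop_length,
    )
    return np.nan_to_num(f0)   # unvoiced frames become 0 Hz

x = pitch_sequence("original_vocal.wav")   # first pitch sequence
y = pitch_sequence("dry_vocal.wav")        # second pitch sequence
```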
After determining the first pitch sequence and the second pitch sequence, the server may construct a first correlation function model from them.
For example, assuming the first pitch sequence is x(n) and the second pitch sequence is y(n), the first correlation function model constructed from the two sequences may be as shown in the following formula:
$R(t)=\sum_{n=1}^{N}x(n)\,y(n-t)$
where N is a preset number of pitch values, N is less than or equal to the number of pitch values included in the first pitch sequence, and N is less than or equal to the number of pitch values included in the second pitch sequence; x(n) denotes the n-th pitch value in the first pitch sequence; y(n-t) denotes the (n-t)-th pitch value in the second pitch sequence; and t is the time offset between the first pitch sequence and the second pitch sequence.
After determining the correlation function model, the server may determine the first correlation function curve from it.
It should be noted that the larger N is, the greater the computational cost when the server constructs the correlation function model and generates the correlation function curve. Meanwhile, considering characteristics of vocals such as the repetitiveness of their pitch, and to avoid an inaccurate correlation function model, the server may set N so that only the first half of the pitch sequence is used in the calculation.
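Evaluating this model directly over a range of offsets t might look like the sketch below; the absence of a normalization term and the demo's sign convention are assumptions for illustration.

```python
import numpy as np

def correlation_curve(x, y, max_offset):
    """Evaluate R(t) = sum_{n=1}^{N} x(n) * y(n - t) over -max_offset..max_offset."""
    n_values = min(len(x), len(y)) // 2    # N: only the first half, as noted above
    offsets = np.arange(-max_offset, max_offset + 1)
    curve = np.zeros(len(offsets))
    for i, t in enumerate(offsets):
        for n in range(n_values):
            if 0 <= n - t < len(y):
                curve[i] += x[n] * y[n - t]
    return offsets, curve

# Demo: the second sequence lags the first by 50 frames.
rng = np.random.default_rng(0)
x = rng.random(2000)
y = np.concatenate([np.zeros(50), x])[:2000]
offsets, curve = correlation_curve(x, y, max_offset=200)
print(offsets[np.argmax(curve)])   # -50: under this convention, a negative peak
                                   # offset means the second sequence starts later
```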
Step 303: Determine a second correlation function curve based on the original song audio and the accompaniment audio.
Whether a pitch sequence or an audio sequence, both are essentially time series. For the original vocal audio and the dry audio, since this type of audio contains no accompaniment, the server can determine their first correlation function curve by extracting the audio's pitch sequences. For the original song audio and the accompaniment audio, which both contain accompaniment, the server can instead directly take the multiple audio frames included in the original song audio as a first audio sequence and the multiple audio frames included in the accompaniment audio as a second audio sequence, and determine the second correlation function curve based on the first audio sequence and the second audio sequence.
Specifically, the server may construct a second correlation function model from the first audio sequence and the second audio sequence, and generate the second correlation function curve from that model. The second correlation function model may be constructed in the same manner as the aforementioned first correlation function model, which is not described again in the embodiments of the present application.
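For the raw-audio sequences, which are far longer than pitch sequences, the same correlation is usually computed with an FFT-based cross-correlation; a sketch assuming SciPy follows.

```python
import numpy as np
from scipy.signal import correlate, correlation_lags

def peak_lag_seconds(a, b, sr):
    """Offset (in seconds) at the peak of the cross-correlation of a and b."""
    curve = correlate(a, b, mode="full", method="fft")
    lags = correlation_lags(len(a), len(b), mode="full")
    return lags[np.argmax(curve)] / sr   # negative when b lags a (SciPy's convention)
```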
It should be noted that, in the embodiments of the present application, step 302 and step 303 may be performed in either order; that is, the server may perform step 302 first and then step 303, or step 303 first and then step 302. Of course, the server may also perform steps 302 and 303 at the same time.
Step 304: Correct the time delay between the accompaniment audio and the dry audio based on the first correlation function curve and the second correlation function curve.
After determining the first correlation function curve and the second correlation function curve, the server may determine a first time delay between the original vocal audio and the dry audio based on the first correlation function curve, and determine a second time delay between the accompaniment audio and the original song audio based on the second correlation function curve; the server may then correct the time delay between the accompaniment audio and the dry audio based on the first time delay and the second time delay.
Specifically, the server may detect a first peak on the first correlation function curve and determine the first time delay from the t corresponding to that peak, and detect a second peak on the second correlation function curve and determine the second time delay from the t corresponding to that peak.
After the first time delay and the second time delay are determined: since the first time delay is the delay between the original vocal audio and the dry audio, and the original vocal audio is separated from the original song audio, the first time delay is in fact the delay of the dry audio relative to the vocals in the original song audio. The second time delay, in turn, is the delay between the original song audio and the accompaniment audio, i.e., the delay of the accompaniment audio relative to the original song audio. In this case, since both the first time delay and the second time delay are referenced to the original song audio, the delay difference obtained by subtracting the second time delay from the first time delay is in fact the delay between the dry audio and the accompaniment audio. Based on this, the server may calculate the delay difference between the first time delay and the second time delay and determine that difference as the time delay between the dry audio and the accompaniment audio.
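Read as code, each delay is the offset at its curve's peak, and the two are then differenced; the sketch below reuses offsets/curve pairs such as those produced by the correlation_curve sketch above (the names are illustrative).

```python
import numpy as np

def peak_delay(offsets, curve):
    """The t at the curve's peak."""
    return int(offsets[np.argmax(curve)])

def dry_vs_accompaniment_delay(first_offsets, first_curve, second_offsets, second_curve):
    first_delay = peak_delay(first_offsets, first_curve)     # dry audio vs. original vocals
    second_delay = peak_delay(second_offsets, second_curve)  # accompaniment vs. original song
    return first_delay - second_delay                        # both referenced to the original song
```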
After determining the time delay between the dry audio and the accompaniment audio, the server may adjust the accompaniment audio or the dry audio based on that delay, thereby aligning the accompaniment audio and the dry audio.
Specifically, if the time delay between the dry audio and the accompaniment audio is negative, the accompaniment audio is later than the dry audio; in this case, the server may delete, from the start playback time of the accompaniment audio, the audio data in the accompaniment audio within a duration equal to the delay. If the time delay between the dry audio and the accompaniment audio is positive, the accompaniment audio is earlier than the dry audio; in this case, the server may delete, from the start playback time of the dry audio, the audio data in the dry audio within a duration equal to the delay.
For example, if the accompaniment audio is 2 s later than the dry audio, the server may delete the audio data within the first 2 s from the start playback time of the accompaniment audio, thereby aligning it with the dry audio.
Optionally, in a possible implementation, if the accompaniment audio is later than the dry audio, the server may instead add audio data with a duration equal to the delay before the start playback time of the dry audio. For example, if the accompaniment audio is 2 s later than the dry audio, the server may add 2 s of audio data before the start of the dry audio so as to align the two, where the added 2 s of audio data may be data that does not contain any audio information.
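On sample arrays, the two correction strategies reduce to trimming a head of samples or prepending silence; a minimal sketch follows, assuming a positive delay_s means the accompaniment starts later than the dry audio.

```python
import numpy as np

def align(accompaniment, dry, delay_s, sr, pad_instead=False):
    shift = int(round(abs(delay_s) * sr))
    if delay_s > 0:                    # accompaniment late
        if pad_instead:
            # Alternative above: prepend silence (no audio information) to the dry audio.
            dry = np.concatenate([np.zeros(shift, dtype=dry.dtype), dry])
        else:
            accompaniment = accompaniment[shift:]   # cut the accompaniment's head
    elif delay_s < 0:                  # accompaniment early: cut the dry audio's head
        dry = dry[shift:]
    return accompaniment, dry
```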
The foregoing embodiment mainly describes how to determine the first time delay between the original vocal audio and the dry audio, and the second time delay between the original song audio and the accompaniment audio, through a correlation algorithm. Optionally, in the embodiments of the present application, after the first pitch sequence and the second pitch sequence are determined in step 302, the server may also determine the first time delay between the original vocal audio and the dry audio through a Dynamic Time Warping (DTW) algorithm or another delay-estimation algorithm; likewise, in step 303 the server may determine the second time delay between the original song audio and the accompaniment audio through a dynamic time warping algorithm or another delay-estimation algorithm. The server may then determine the delay difference between the first time delay and the second time delay as the time delay between the dry audio and the accompaniment audio, and correct the dry audio and the accompaniment audio according to that delay.
For the specific implementation of estimating the delay between two sequences through the dynamic time warping algorithm, reference may be made to the related art; it is not described again in the embodiments of the present application.
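Since the DTW details are deferred to the related art, the following is only a hedged sketch of one way to reduce a DTW alignment to a single delay estimate, assuming librosa.sequence.dtw and taking the median index offset along the warping path.

```python
import librosa
import numpy as np

def dtw_delay_frames(seq_a, seq_b):
    """Median frame offset along the DTW warping path (positive: seq_b lags seq_a)."""
    # librosa expects feature matrices of shape (d, N); treat each sequence as 1-D features.
    _, wp = librosa.sequence.dtw(X=seq_a[np.newaxis, :], Y=seq_b[np.newaxis, :])
    return float(np.median(wp[:, 1] - wp[:, 0]))
```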
In the embodiments of the present application, the server may obtain the accompaniment audio, dry audio, and original song audio of the target song, and extract the original vocal audio from the original song audio; determine a first correlation function curve based on the original vocal audio and the dry audio, and a second correlation function curve based on the original song audio and the accompaniment audio; and correct the time delay between the accompaniment audio and the dry audio based on the first correlation function curve and the second correlation function curve. It can thus be seen that the accompaniment audio, the dry audio, and the corresponding original song audio can be processed to correct the time delay between the accompaniment audio and the dry audio. Compared with the current method of manual correction by staff, this saves labor and time and improves correction efficiency, while also eliminating correction errors that may be caused by human factors, improving accuracy.
Next, the apparatus for correcting the time delay between accompaniment and dry sound provided in the embodiments of the present application is introduced.
Referring to FIG. 4, an embodiment of the present application provides an apparatus 400 for correcting the time delay between accompaniment audio and dry audio, the apparatus 400 including:
an obtaining module 401, configured to obtain the accompaniment audio, dry audio, and original song audio of a target song, and to extract the original vocal audio from the original song audio;
a determining module 402, configured to determine a first correlation function curve based on the original vocal audio and the dry audio, and to determine a second correlation function curve based on the original song audio and the accompaniment audio;
a correction module 403, configured to correct the time delay between the accompaniment audio and the dry audio based on the first correlation function curve and the second correlation function curve.
Optionally, referring to FIG. 5, the determining module 402 includes:
a first obtaining submodule 4021, configured to obtain the pitch value corresponding to each of the multiple audio frames included in the original vocal audio, and to sort the obtained pitch values of the original vocal audio in the order of those frames, to obtain a first pitch sequence;
the first obtaining submodule 4021 being further configured to obtain the pitch value corresponding to each of the multiple audio frames included in the dry audio, and to sort the obtained pitch values of the dry audio in the order of those frames, to obtain a second pitch sequence;
a first determining submodule 4022, configured to determine the first correlation function curve based on the first pitch sequence and the second pitch sequence.
Optionally, the first determining submodule 4022 is specifically configured to:
determine a first correlation function model, as shown in the following formula, based on the first pitch sequence and the second pitch sequence:
$R(t)=\sum_{n=1}^{N}x(n)\,y(n-t)$
where N is a preset number of pitch values, N is less than or equal to the number of pitch values included in the first pitch sequence, and N is less than or equal to the number of pitch values included in the second pitch sequence; x(n) denotes the n-th pitch value in the first pitch sequence; y(n-t) denotes the (n-t)-th pitch value in the second pitch sequence; and t is the time offset between the first pitch sequence and the second pitch sequence;
determine the first correlation function curve based on the first correlation function model.
Optionally, the determining module 402 includes:
a second obtaining submodule, configured to obtain the multiple audio frames included in the original song audio, in the order of those frames, to obtain a first audio sequence;
the second obtaining submodule being further configured to obtain the multiple audio frames included in the accompaniment audio, in the order of those frames, to obtain a second audio sequence;
a second determining submodule, configured to determine the second correlation function curve based on the first audio sequence and the second audio sequence.
Optionally, referring to FIG. 6, the correction module 403 includes:
a detection submodule 4031, configured to detect a first peak on the first correlation function curve and a second peak on the second correlation function curve;
a third determining submodule 4032, configured to determine a first time delay between the original vocal audio and the dry audio based on the first peak, and to determine a second time delay between the accompaniment audio and the original song audio based on the second peak;
a correction submodule 4033, configured to correct the time delay between the accompaniment audio and the dry audio based on the first time delay and the second time delay.
Optionally, the correction submodule 4033 is specifically configured to:
determine the delay difference between the first time delay and the second time delay as the time delay between the accompaniment audio and the dry audio;
if the time delay indicates that the accompaniment audio is later than the dry audio, delete, from the start playback time of the accompaniment audio, the audio data in the accompaniment audio within a duration equal to the time delay;
if the time delay indicates that the accompaniment audio is earlier than the dry audio, delete, from the start playback time of the dry audio, the audio data in the dry audio within a duration equal to the time delay.
In summary, in the embodiments of the present application, the accompaniment audio, dry audio, and original song audio of the target song are obtained, and the original vocal audio is extracted from the original song audio; a first correlation function curve is determined based on the original vocal audio and the dry audio, and a second correlation function curve is determined based on the original song audio and the accompaniment audio; and the time delay between the accompaniment audio and the dry audio is corrected based on the first correlation function curve and the second correlation function curve. It can thus be seen that the accompaniment audio, the dry audio, and the corresponding original song audio can be processed to correct the time delay between the accompaniment audio and the dry audio. Compared with the current method of manual correction by staff, this saves labor and time and improves correction efficiency, while also eliminating correction errors that may be caused by human factors, improving accuracy.
It should be noted that when the apparatus for correcting the time delay between accompaniment and dry sound provided in the foregoing embodiments corrects that delay, the division into the functional modules described above is used only as an example; in practical applications, the above functions may be assigned to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus embodiments for correcting the time delay between accompaniment and dry sound provided above and the corresponding method embodiments belong to the same concept; for the specific implementation process, see the method embodiments, which are not repeated here.
FIG. 7 is a schematic structural diagram of a server of an apparatus for correcting the time delay between accompaniment and dry sound according to an exemplary embodiment. The functions of the server in the embodiments shown in FIGS. 2-3 may be implemented by the server shown in FIG. 7. The server may be a server in a background server cluster. Specifically:
The server 700 includes a central processing unit (CPU) 701, a system memory 704 including a random access memory (RAM) 702 and a read-only memory (ROM) 703, and a system bus 705 connecting the system memory 704 and the central processing unit 701. The server 700 also includes a basic input/output system (I/O system) 706 that helps transfer information between the components within the computer, and a mass storage device 707 for storing an operating system 713, application programs 714, and other program modules 715.
The basic input/output system 706 includes a display 708 for displaying information and an input device 709, such as a mouse or keyboard, for the user to input information. Both the display 708 and the input device 709 are connected to the central processing unit 701 through an input/output controller 710 connected to the system bus 705. The basic input/output system 706 may also include the input/output controller 710 for receiving and processing input from multiple other devices such as a keyboard, mouse, or electronic stylus. Similarly, the input/output controller 710 also provides output to a display screen, printer, or other type of output device.
The mass storage device 707 is connected to the central processing unit 701 through a mass storage controller (not shown) connected to the system bus 705. The mass storage device 707 and its associated computer-readable medium provide non-volatile storage for the server 700. That is, the mass storage device 707 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM drive.
Without loss of generality, computer-readable media may include computer storage media and communication media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include RAM, ROM, EPROM, EEPROM, flash memory or other solid-state storage technologies; CD-ROM, DVD, or other optical storage; and tape cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices. Of course, those skilled in the art will know that computer storage media are not limited to the above. The system memory 704 and the mass storage device 707 described above may be collectively referred to as memory.
According to various embodiments of the present application, the server 700 may also run by means of a remote computer connected through a network such as the Internet. That is, the server 700 may be connected to the network 712 through a network interface unit 711 connected to the system bus 705; the network interface unit 711 may also be used to connect to other types of networks or remote computer systems (not shown).
The memory also includes one or more programs, which are stored in the memory and configured to be executed by the CPU. The one or more programs contain instructions for performing the method for correcting the time delay between accompaniment and dry sound provided in the embodiments of the present application.
An embodiment of the present application further provides a non-transitory computer-readable storage medium. When the instructions in the storage medium are executed by the processor of a server, the server is enabled to perform the method for correcting the time delay between accompaniment and dry sound provided in the embodiments shown in FIGS. 2-3.
An embodiment of the present application further provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the method for correcting the time delay between accompaniment and dry sound provided in the embodiments shown in FIGS. 2-3.
A person of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are merely optional embodiments of the present application and are not intended to limit the present application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within the protection scope of the present application.

Claims (14)

  1. A method for correcting the time delay between accompaniment and dry sound, the method comprising:
    obtaining the accompaniment audio, dry audio, and original song audio of a target song, and extracting the original vocal audio from the original song audio;
    determining a first correlation function curve based on the original vocal audio and the dry audio, and determining a second correlation function curve based on the original song audio and the accompaniment audio;
    correcting the time delay between the accompaniment audio and the dry audio based on the first correlation function curve and the second correlation function curve.
  2. The method according to claim 1, wherein determining the first correlation function curve based on the original vocal audio and the dry audio comprises:
    obtaining the pitch value corresponding to each of the multiple audio frames included in the original vocal audio, and sorting the obtained pitch values of the original vocal audio in the order of the multiple audio frames included in the original vocal audio, to obtain a first pitch sequence;
    obtaining the pitch value corresponding to each of the multiple audio frames included in the dry audio, and sorting the obtained pitch values of the dry audio in the order of the multiple audio frames included in the dry audio, to obtain a second pitch sequence;
    determining the first correlation function curve based on the first pitch sequence and the second pitch sequence.
  3. The method according to claim 2, wherein determining the first correlation function curve based on the first pitch sequence and the second pitch sequence comprises:
    determining a first correlation function model, as shown in the following formula, based on the first pitch sequence and the second pitch sequence:
    $R(t)=\sum_{n=1}^{N}x(n)\,y(n-t)$
    where N is a preset number of pitch values, N is less than or equal to the number of pitch values included in the first pitch sequence, and N is less than or equal to the number of pitch values included in the second pitch sequence; x(n) denotes the n-th pitch value in the first pitch sequence; y(n-t) denotes the (n-t)-th pitch value in the second pitch sequence; and t is the time offset between the first pitch sequence and the second pitch sequence;
    determining the first correlation function curve based on the first correlation function model.
  4. The method according to claim 1, wherein determining the second correlation function curve based on the original song audio and the accompaniment audio comprises:
    obtaining the multiple audio frames included in the original song audio, in the order of the multiple audio frames included in the original song audio, to obtain a first audio sequence;
    obtaining the multiple audio frames included in the accompaniment audio, in the order of the multiple audio frames included in the accompaniment audio, to obtain a second audio sequence;
    determining the second correlation function curve based on the first audio sequence and the second audio sequence.
  5. The method according to any one of claims 1-4, wherein correcting the time delay between the accompaniment audio and the dry audio based on the first correlation function curve and the second correlation function curve comprises:
    detecting a first peak on the first correlation function curve, and detecting a second peak on the second correlation function curve;
    determining a first time delay between the original vocal audio and the dry audio based on the first peak, and determining a second time delay between the accompaniment audio and the original song audio based on the second peak;
    correcting the time delay between the accompaniment audio and the dry audio based on the first time delay and the second time delay.
  6. The method according to claim 5, wherein correcting the time delay between the accompaniment audio and the dry audio based on the first time delay and the second time delay comprises:
    determining the delay difference between the first time delay and the second time delay as the time delay between the accompaniment audio and the dry audio;
    if the time delay between the accompaniment audio and the dry audio indicates that the accompaniment audio is later than the dry audio, deleting, from the start playback time of the accompaniment audio, the audio data in the accompaniment audio within a duration equal to that time delay;
    if the time delay between the accompaniment audio and the dry audio indicates that the accompaniment audio is earlier than the dry audio, deleting, from the start playback time of the dry audio, the audio data in the dry audio within a duration equal to that time delay.
  7. An apparatus for correcting the time delay between accompaniment and dry sound, the apparatus comprising:
    an obtaining module, configured to obtain the accompaniment audio, dry audio, and original song audio of a target song, and to extract the original vocal audio from the original song audio;
    a determining module, configured to determine a first correlation function curve based on the original vocal audio and the dry audio, and to determine a second correlation function curve based on the original song audio and the accompaniment audio;
    a correction module, configured to correct the time delay between the accompaniment audio and the dry audio based on the first correlation function curve and the second correlation function curve.
  8. The apparatus according to claim 7, wherein the determination module comprises:
    a first acquisition submodule, configured to acquire a pitch value corresponding to each of a plurality of audio frames included in the original vocal audio, and to sort the acquired pitch values of the original vocal audio in the sequential order of the plurality of audio frames included in the original vocal audio, to obtain a first pitch sequence;
    the first acquisition submodule being further configured to acquire a pitch value corresponding to each of a plurality of audio frames included in the dry audio, and to sort the acquired pitch values of the dry audio in the sequential order of the plurality of audio frames included in the dry audio, to obtain a second pitch sequence; and
    a first determination submodule, configured to determine the first correlation function curve based on the first pitch sequence and the second pitch sequence.
  9. The apparatus according to claim 8, wherein the first determination submodule is specifically configured to:
    determine, based on the first pitch sequence and the second pitch sequence, a first correlation function model as shown in the following formula:
    r(t) = Σ_{n=1}^{N} x(n) · y(n - t)
    wherein N is a preset number of pitch values, N is less than or equal to the number of pitch values included in the first pitch sequence, and N is less than or equal to the number of pitch values included in the second pitch sequence; x(n) denotes the n-th pitch value in the first pitch sequence; y(n-t) denotes the (n-t)-th pitch value in the second pitch sequence; and t is the time offset between the first pitch sequence and the second pitch sequence; and
    determine the first correlation function curve based on the first correlation function model.
  10. The apparatus according to claim 7, wherein the determination module comprises:
    a second acquisition submodule, configured to acquire a plurality of audio frames included in the original song audio in the sequential order of the plurality of audio frames included in the original song audio, to obtain a first audio sequence;
    the second acquisition submodule being further configured to acquire a plurality of audio frames included in the accompaniment audio in the sequential order of the plurality of audio frames included in the accompaniment audio, to obtain a second audio sequence; and
    a second determination submodule, configured to determine the second correlation function curve based on the first audio sequence and the second audio sequence.
  11. The apparatus according to any one of claims 7-10, wherein the correction module comprises:
    a detection submodule, configured to detect a first peak on the first correlation function curve and to detect a second peak on the second correlation function curve;
    a third determination submodule, configured to determine a first delay between the original vocal audio and the dry audio based on the first peak, and to determine a second delay between the accompaniment audio and the original song audio based on the second peak; and
    a correction submodule, configured to correct the time delay between the accompaniment audio and the dry audio based on the first delay and the second delay.
  12. The apparatus according to claim 11, wherein the correction submodule is specifically configured to:
    determine the delay difference between the first delay and the second delay as the time delay between the accompaniment audio and the dry audio;
    if the time delay between the accompaniment audio and the dry audio indicates that the accompaniment audio is later than the dry audio, delete, from the start playback moment of the accompaniment audio, the audio data in the accompaniment audio within a duration equal to the time delay between the accompaniment audio and the dry audio; and
    if the time delay between the accompaniment audio and the dry audio indicates that the accompaniment audio is earlier than the dry audio, delete, from the start playback moment of the dry audio, the audio data in the dry audio within a duration equal to the time delay between the accompaniment audio and the dry audio.
  13. An apparatus for correcting a time delay between accompaniment and dry sound, the apparatus comprising:
    a processor; and
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to perform the steps of the method according to any one of claims 1-6.
  14. A computer-readable storage medium, having instructions stored thereon, wherein the instructions, when executed by a processor, implement the steps of the method according to any one of claims 1-6.
PCT/CN2018/117519 2018-06-11 2018-11-26 Method and apparatus for correcting time delay between accompaniment and dry sound, and storage medium WO2019237664A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US16/627,954 US10964301B2 (en) 2018-06-11 2018-11-26 Method and apparatus for correcting delay between accompaniment audio and unaccompanied audio, and storage medium
EP18922771.3A EP3633669B1 (en) 2018-06-11 2018-11-26 Method and apparatus for correcting time delay between accompaniment and dry sound, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810594183.2A CN108711415B (zh) 2018-06-11 2018-06-11 Method and apparatus for correcting time delay between accompaniment and dry sound, and storage medium
CN201810594183.2 2018-06-11

Publications (1)

Publication Number Publication Date
WO2019237664A1 true WO2019237664A1 (zh) 2019-12-19

Family

ID=63871572

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/117519 WO2019237664A1 (zh) 2018-06-11 2018-11-26 Method and apparatus for correcting time delay between accompaniment and dry sound, and storage medium

Country Status (4)

Country Link
US (1) US10964301B2 (zh)
EP (1) EP3633669B1 (zh)
CN (1) CN108711415B (zh)
WO (1) WO2019237664A1 (zh)

Also Published As

Publication number Publication date
EP3633669A4 (en) 2020-08-12
EP3633669B1 (en) 2024-04-17
US10964301B2 (en) 2021-03-30
CN108711415B (zh) 2021-10-08
EP3633669A1 (en) 2020-04-08
US20200135156A1 (en) 2020-04-30
CN108711415A (zh) 2018-10-26

Legal Events

ENP: Entry into the national phase (Ref document number: 2018922771; Country of ref document: EP; Effective date: 20191230)
121: Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18922771; Country of ref document: EP; Kind code of ref document: A1)
NENP: Non-entry into the national phase (Ref country code: DE)