WO2018059342A1 - Method and device for processing dual audio source audio data - Google Patents
Method and device for processing dual audio source audio data
- Publication number
- WO2018059342A1 (PCT/CN2017/103106)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- accompaniment
- song
- pair
- songs
- same
- Prior art date
Classifications
- G10L19/0018—Speech coding using phonetic or linguistical decoding of the source; Reconstruction using text-to-speech synthesis
- G06F16/685—Retrieval characterised by using metadata automatically derived from the content, using automatically derived transcript of audio data, e.g. lyrics
- G06F40/205—Parsing (handling natural language data)
- G10H1/366—Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems, with means for modifying or correcting the external signal, e.g. pitch correction, reverberation, changing a singer's voice
- G10H1/46—Volume control
- G10H2210/031—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
- G10H2210/051—Musical analysis for extraction or detection of onsets of musical sounds or notes, i.e. note attack timings
- G10H2210/056—Musical analysis for extraction or identification of individual instrumental parts, e.g. melody, chords, bass
- G10H2210/061—Musical analysis for extraction of musical phrases or temporal structure analysis of a musical piece
- G10H2240/036—File multilingual, e.g. multilingual lyrics for karaoke
- G10H2240/141—Library retrieval matching, e.g. query by humming, singing or playing
- G10H2240/325—Synchronizing two or more audio tracks or files according to musical features or musical timings
- G10H2250/135—Autocorrelation (mathematical functions for musical analysis)
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
Definitions
- the invention relates to the technical field of computer information processing, in particular to a method and a device for processing dual audio source audio data.
- the embodiment of the invention provides a dual-source audio data processing method and device, and a non-volatile computer readable storage medium, which can automatically synthesize two songs with the same accompaniment but different singing.
- An embodiment of the present invention provides a method for processing audio data of dual tone sources, where the method includes:
- the playback time corresponding to the two-channel audio is divided into a plurality of playback periods, and the left channel or the right channel of the two-channel audio is energy-suppressed in different playback periods.
- Embodiments of the present invention provide a dual tone audio data processing apparatus, the apparatus including a processor and a nonvolatile storage medium, the nonvolatile storage medium storing one or more computer readable instructions; the processor executes the computer readable instructions to implement the following steps:
- the playback time corresponding to the two-channel audio is divided into a plurality of playback periods, and the left channel or the right channel of the two-channel audio is energy-suppressed in different playback periods.
- two songs with the same accompaniment but different singing are used as the homologous song pair; two mono audio data are obtained by decoding the audio data of the homologous song pair; the two mono audio data are combined into one two-channel audio data; the playback time corresponding to the two-channel audio is divided into multiple playback periods; and energy suppression is performed on the left or right channel of the two-channel audio in different playback periods, achieving the effect of the two songs with the same accompaniment but different singing being sung alternately.
- the embodiment of the invention provides a novel automatic music synthesis scheme, which provides a new function for the computer device. Since the lyric information and the accompaniment information of the two songs used for the synthesis processing are the same, the processed audio transitions smoothly, does not sound abrupt, and gives a comfortable listening effect.
- compared with arranging a singer to record such a musical piece live, the embodiment of the present invention has the advantages of low input cost and high synthesis efficiency.
- FIG. 1 is a schematic diagram of a scenario of a method for processing dual-tone audio data according to an embodiment of the present invention
- FIG. 2 is a schematic flowchart of a method for processing audio data of dual tone sources according to an embodiment of the present invention
- FIG. 3 is a schematic flowchart of a method for acquiring a pair of homologous songs according to an embodiment of the present invention
- FIG. 4 is a schematic flow chart of a method for screening an accompaniment of a primary selected song pair according to an embodiment of the present invention
- FIG. 5 is a schematic diagram of a lyric file of a homologous song pair according to an embodiment of the present invention
- FIG. 6 is a schematic diagram of an accompaniment file of a homologous song pair according to an embodiment of the present invention.
- FIG. 7 is a schematic diagram of a device for processing dual audio source audio data according to an embodiment of the present invention.
- FIG. 8 is a schematic structural diagram of a dual-tone audio data processing apparatus according to an embodiment of the present invention.
- FIG. 9 is a schematic structural diagram of an acquiring module of an audio data processing apparatus according to an embodiment of the present invention.
- FIG. 10 is a schematic diagram showing the hardware structure of a dual tone audio data processing apparatus according to an embodiment of the present invention.
- the term "module" as used herein may be taken to mean a software object that is executed on the computing system.
- the different components, modules, engines, and services described herein can be viewed as objects implemented on the computing system.
- the apparatus and method described herein are preferably implemented in software, but may of course be implemented in hardware, all of which are within the scope of the present invention.
- Embodiments of the present invention provide a dual tone source audio data processing method and apparatus.
- FIG. 1 is a schematic diagram of a scenario of a dual tone audio data processing method according to an embodiment of the present invention.
- the scene may include an audio data processing device, which runs in the server 200 and is referred to as the audio processing device 300. It is mainly used for acquiring audio data of a homologous song pair, the homologous song pair being two songs with the same accompaniment but different singing; then respectively decoding the audio data of the homologous song pair to obtain two mono audio data; subsequently combining the two mono audio data into one two-channel audio data; and finally dividing the playing time corresponding to the two-channel audio into a plurality of playing periods and performing energy suppression on the left channel or the right channel of the two-channel audio in different playing periods to obtain the processed audio data.
- the scene may further include a song database 100 in which a large amount of song information is stored, including accompaniment files, lyric files, and audio data corresponding to the songs.
- the audio processing device 300 selects two songs with the same accompaniment but different singing to form a homologous song pair according to the lyric file and the accompaniment file of the song in the song database 100.
- the user terminal 400 may also be included in the scenario, such as a mobile phone or a tablet computer; the user terminal includes an input device (such as a keyboard or a mouse) and an output device (such as a screen or a power amplifier). Through the input device, the user selects audio data processed by the audio processing device 300, and the processed audio data is played through the output device.
- an audio processing device which may be integrated into a network device such as a server or a gateway to implement a dual audio source audio data processing method.
- the network device such as the server or the gateway may be a computer device.
- An embodiment of the present invention provides a dual-source audio data processing method, including: acquiring audio data of a homologous song pair, the homologous song pair being two songs with the same accompaniment but different singing; respectively decoding the audio data of the homologous song pair to obtain two mono audio data; combining the two mono audio data into one two-channel audio data; dividing the playing time corresponding to the two-channel audio into a plurality of playing periods; and performing energy suppression on the left channel or the right channel of the two-channel audio in different playing periods.
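The method steps above can be illustrated with a minimal, purely hypothetical sketch (toy sample values and helper names invented for illustration, not the patent's actual implementation): two decoded mono tracks are merged into left/right channels, and alternate channels are suppressed period by period so the two vocals take turns.

```python
def merge_to_stereo(mono_a, mono_b):
    """Combine two equal-length mono sample lists into (left, right) frames."""
    assert len(mono_a) == len(mono_b)
    return [(a, b) for a, b in zip(mono_a, mono_b)]

def alternate_suppress(stereo, period_len):
    """Zero the left channel in even playback periods, the right in odd ones."""
    out = []
    for i, (left, right) in enumerate(stereo):
        period = i // period_len
        if period % 2 == 0:
            out.append((0, right))   # left suppressed: only vocal B is heard
        else:
            out.append((left, 0))    # right suppressed: only vocal A is heard
    return out

song_x = [10, 10, 10, 10]   # toy samples standing in for vocal A
song_y = [20, 20, 20, 20]   # toy samples standing in for vocal B
stereo = merge_to_stereo(song_x, song_y)
processed = alternate_suppress(stereo, period_len=2)
print(processed)  # [(0, 20), (0, 20), (10, 0), (10, 0)]
```

In a real implementation the period boundaries would come from lyric timing rather than a fixed sample count, as described below in the embodiment.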
- FIG. 2 is a schematic flowchart diagram of a method for processing dual-tone audio data according to an embodiment of the present invention. The method includes:
- the pair of homologous songs are two songs whose accompaniment is the same but whose singing differs; the difference in singing refers to different singers or different singing languages.
- the pair of homologous songs may be two songs obtained by the same singer singing the same song in two different languages, such as Eason Chan’s “Red Rose” sung in Mandarin and “White Rose” sung in Cantonese; “Red Rose” and “White Rose” have different singing languages but the same accompaniment, and can be used as a homologous song pair.
- the homologous song pair may also be two songs obtained by different singers singing the same song; for example, if Megan Nicole and Alex both sing the song “Maps”, then “Maps” sung by Megan Nicole and “Maps” sung by Alex can be used as a homologous song pair.
- S202 Perform decoding processing on the audio data of the homologous song pair to obtain two mono audio data.
- the playing time corresponding to the two-channel audio includes time information of each lyric.
- the time information may be a start time and an end time of each lyrics.
- the time information may also be a start time and a duration of each lyric.
- the playing time may be divided into a plurality of playing periods according to the time information of each lyric line, the number of playing periods matching the number of lines in the lyrics, so that the two voices sing alternate lines;
- alternatively, the lyrics can be divided into multiple paragraphs and the playing periods divided according to the paragraphs: one or more lyric lines form a paragraph, the start time of the first line in the paragraph is taken as the start time of the playing period, and
- the end time corresponding to the last line of the paragraph is taken as the end time of the playing period, so that the two voices sing alternate paragraphs.
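The two ways of dividing the playing time just described (one period per lyric line, or one per paragraph of lines) can be sketched as follows; the timestamps are invented and the helper names are hypothetical:

```python
def lyric_periods(lyrics):
    """lyrics: list of (start_seconds, end_seconds, text) tuples.
    Returns one (start, end) playback period per lyric line."""
    return [(start, end) for start, end, _ in lyrics]

def group_periods(lyrics, lines_per_paragraph):
    """Paragraph-based split: each period spans from the start of the first
    line in the paragraph to the end of the last line in the paragraph."""
    periods = []
    for i in range(0, len(lyrics), lines_per_paragraph):
        group = lyrics[i:i + lines_per_paragraph]
        periods.append((group[0][0], group[-1][1]))
    return periods

lyrics = [(16.28, 18.65, "line 1"), (19.00, 21.40, "line 2"),
          (22.00, 24.10, "line 3"), (24.50, 27.00, "line 4")]
print(lyric_periods(lyrics))     # four periods, one per lyric line
print(group_periods(lyrics, 2))  # two periods, one per two-line paragraph
```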
- the left channel and the right channel may be suppressed alternately in different playing periods; or the left channel or the right channel may be suppressed according to a preset rule, for example suppressing the same channel for several consecutive playing periods, leaving both channels unsuppressed in some playing periods, or performing energy suppression on the left or right channel only during part of a playing period.
- when energy suppression of the left or right channel is performed only during part of a playing period, the portion of the song corresponding to that playing period can be completed alternately by the two voices; for example, voice A sings the first half of a line and voice B sings the second half.
- before a playing period, a fade-out effect is applied to the channel that needs to be energy-suppressed; during the playing period, the audio sampling points of the channel to be energy-suppressed are all set to 0; after exiting the playing period, a fade-in effect is applied to the channel within a preset time.
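A plausible rendering of this suppress-with-fades step is sketched below. The patent text does not fix the exact fade curve, so a simple linear ramp is assumed here purely for illustration, and the function name and parameters are invented:

```python
def suppress_with_fades(samples, start, end, fade_len):
    """Apply a linear fade-out over the fade_len samples before `start`,
    set the samples in [start, end) to 0, and fade back in linearly over
    the fade_len samples after `end`. Works on a copy of one channel."""
    out = list(samples)
    for i in range(max(0, start - fade_len), start):
        out[i] = out[i] * (start - i) / fade_len   # ramp down toward silence
    for i in range(start, min(end, len(out))):
        out[i] = 0                                 # fully suppressed period
    for i in range(end, min(end + fade_len, len(out))):
        out[i] = out[i] * (i - end + 1) / fade_len # ramp back up to full level
    return out

channel = [100.0] * 10
print(suppress_with_fades(channel, start=4, end=6, fade_len=2))
# [100.0, 100.0, 100.0, 50.0, 0, 0, 50.0, 100.0, 100.0, 100.0]
```

At a 44100 Hz sampling rate, a 1-second fade as in the later example would correspond to fade_len = 44100.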
- the processed audio data may also be presented to the user for the user to listen to.
- FIG. 3 is a schematic flowchart of a method for acquiring a pair of homologous songs according to an embodiment of the present invention. The method includes:
- the song information of candidate song pairs may be obtained and a song pair list created. Specifically, the song information of candidate song pairs can be obtained by the methods described in the following embodiments.
- all songs with the same song name but different singer names are searched in the song database; all songs obtained by the search are combined in pairs to form candidate song pairs, and the song information of the candidate song pairs is then extracted from the song database.
- for example, when the song name “Love” is searched in the song database, the search results include “Love” by the Little Tigers, “Love” by Karen Mok, and “Love” by TFBOYS.
- “Love” by the Little Tigers and “Love” by Karen Mok form a candidate song pair;
- “Love” by the Little Tigers and “Love” by TFBOYS form a candidate song pair;
- “Love” by Karen Mok and “Love” by TFBOYS form a candidate song pair.
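The pairwise combination step can be sketched as follows; the song titles come from the example above, while the tuple representation is invented for illustration:

```python
from itertools import combinations

# Toy search results: songs sharing a title but sung by different singers.
songs = [("Love", "Little Tigers"), ("Love", "Karen Mok"), ("Love", "TFBOYS")]

# Combine the search results pairwise to form candidate song pairs,
# keeping only pairs with the same title and different singers.
candidate_pairs = [
    (a, b) for a, b in combinations(songs, 2)
    if a[0] == b[0] and a[1] != b[1]
]
for pair in candidate_pairs:
    print(pair)
print(len(candidate_pairs))  # 3 candidate pairs, matching the example
```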
- alternatively, songs carrying a language label are searched among all songs of the same singer; the language label identifies another version of the song and generally includes a song name and a language version. A song carrying a language label and the song whose name appears in that label serve as a candidate song pair, and the song information of the candidate song pair is acquired.
- for example, if the song “White Rose” carries the language label “Red Rose, Cantonese version”, then “White Rose” and the “Red Rose” referenced by the label can be used as a candidate song pair.
- the lyrics of the candidate song pair can be filtered by the following method.
- the lyric files of the two songs in a candidate song pair are parsed separately to obtain the number of lyric lines of each song and the time information corresponding to each line; the time information may include the start time and end time of each line,
- or the start time and singing duration of each line. It is then determined whether the numbers of lyric lines of the two songs are the same: if not, the candidate song pair is discarded; if so, the lyric lines of the two songs are matched one-to-one and it is determined whether the time information of corresponding lines is the same. If the time information differs, the candidate song pair is discarded; if it is the same, the candidate song pair is taken as a primary selected song pair.
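A hedged sketch of the lyric screening just described, assuming a simple LRC-style "[mm:ss.xx] text" lyric format consistent with Figure 5; the parser and screening function are hypothetical names, and only start times are compared in this toy version:

```python
import re

def parse_lrc(text):
    """Parse '[mm:ss.xx] lyric' lines into (start_seconds, lyric) pairs."""
    entries = []
    for line in text.strip().splitlines():
        m = re.match(r"\[(\d+):(\d+\.\d+)\](.*)", line)
        if m:
            start = int(m.group(1)) * 60 + float(m.group(2))
            entries.append((start, m.group(3).strip()))
    return entries

def passes_lyric_screen(lrc_x, lrc_y):
    """Same number of lyric lines, and identical start times line by line."""
    x, y = parse_lrc(lrc_x), parse_lrc(lrc_y)
    if len(x) != len(y):
        return False
    return all(abs(a[0] - b[0]) < 1e-6 for a, b in zip(x, y))

red_rose = "[00:16.28] aaaaaaaa\n[00:19.65] aaaa"
white_rose = "[00:16.28] bbbbbbb\n[00:19.65] bbbb"
print(passes_lyric_screen(red_rose, white_rose))  # True
```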
- S303: Perform accompaniment screening on the primary selected song pair according to the accompaniment files of its two songs. If the accompaniment files of the two songs in the primary selected song pair are the same, determine that the primary selected song pair is a homologous song pair.
- the accompaniment of the primary song pair can be filtered by the following method.
- S402. Combine the accompaniment audios with the same accompaniment time in the two songs one by one to form at least one accompaniment pair.
- the specific method for judging whether the two accompaniment audios of an accompaniment pair are the same includes: performing a Fourier transform separately on the two accompaniment audio data to generate two spectra; dividing the two spectra into the same number of frequency bands and calculating the average energy value of each frequency band of each spectrum; comparing each frequency band with the previous frequency band and expressing the comparison result in binary to obtain a binary sequence corresponding to each spectrum; calculating the matching probability of the two binary sequences as m/n, where n is the total number of digits in a binary sequence and m is the number of digit positions on which the two binary sequences agree; and determining whether the matching probability is greater than a preset value, and if so, determining that the two accompaniment audios of the accompaniment pair are the same.
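The accompaniment comparison can be sketched as follows. The 32-band split, band-to-band comparison, and m/n matching probability follow the description above; the toy "spectrum", function names, and tie-breaking of equal bands are illustrative assumptions, not the patent's code:

```python
def band_fingerprint(spectrum, n_bands=32):
    """Split a magnitude spectrum into n_bands equal bands, average each
    band, and emit 1 when a band's average energy exceeds the previous
    band's, else 0. The fingerprint therefore has n_bands - 1 bits."""
    size = len(spectrum) // n_bands
    means = [sum(spectrum[i * size:(i + 1) * size]) / size
             for i in range(n_bands)]
    return [1 if means[i] > means[i - 1] else 0 for i in range(1, n_bands)]

def matching_probability(bits_a, bits_b):
    """m/n: the fraction of digit positions on which the sequences agree."""
    n = len(bits_a)
    m = sum(1 for a, b in zip(bits_a, bits_b) if a == b)
    return m / n

spec = [float(i % 7) for i in range(320)]  # toy stand-in for a spectrum
fp_x = band_fingerprint(spec)
fp_y = band_fingerprint(spec)
prob = matching_probability(fp_x, fp_y)
print(prob)  # 1.0: identical accompaniments match with probability 1
```

With a preset value such as k = 0.9 (see below), the pair would be accepted whenever prob >= k.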
- the audio data of the two songs constituting the homologous song pair is obtained from the song database.
- FIG. 3 is only one of many methods for acquiring a pair of homologous songs, and should not be construed as limiting the present invention.
- the dual-source audio data processing method is divided into two main aspects, one is to filter the homologous song pair, and the other is to synthesize the audio data of the homologous song pair.
- the following two aspects will be described separately with reference to the examples.
- the homologous songs are composed of two songs with the same accompaniment but different singing. To screen the homologous song pairs, it is necessary to find two songs that use the same accompaniment but sing differently.
- the screening of the homologous song pair may first find the candidate song pair from the song database, and establish a song pair list. Among them, searching for candidate songs from the song database can be performed as follows.
- the first method: in the song list of a single singer, check whether any song carries a language label; for example, the song “White Rose” carries the language label “Red Rose, Cantonese version”, so “White Rose” and “Red Rose” can be used as a candidate song pair, recorded as song x and song y respectively;
- the second method: search the song database for all songs with the same song name but different singer names, and combine all the songs obtained by the search pairwise; for example, “Love” by the Little Tigers, “Love” by Karen Mok, and “Love” by TFBOYS combine in pairs to give three candidate song pairs.
- the two songs of each candidate song pair are recorded as song x and song y respectively.
- the corresponding lyric files x_l and y_l are found in the song database; the lyric file format may include the lyrics corresponding to the song and the time information corresponding to each lyric line, where the time information may be the start time and end time of each line, or the start time and duration of each line.
- Figure 5 (a), (b) shows the lyrics file of "Red Rose” and "White Rose” respectively.
- the lyric file format is: the start time of each line followed by the corresponding lyrics, so the time information corresponding to each lyric line can be obtained by parsing the lyric file, as shown in Figure 5.
- the start time of the first lyric line of “Red Rose” and the first lyric line of “White Rose” is [00:16.28] in both cases, and the end time is [00:18.65]; the first line of “Red Rose” is represented by “aaaaaaaaa”, the first line of “White Rose” by “bbbbbbb”, and the other lyric lines by “xxx”.
- if the number of lyric lines of the two songs is the same and the start time and end time of each line are the same, songs x and y are judged to meet the lyric screening condition; the candidate song pair composed of songs x and y can then be selected as a primary selected song pair and enter the next round of accompaniment screening; otherwise the candidate song pair is removed from the song pair list.
- the audio x, y can be compared.
- the specific steps of comparing the audios of the songs x and y are as follows:
- the 4k spectrum is equally divided into 32 frequency bands, the average is calculated for each frequency band, and the calculated average value is taken as the average energy of the frequency band;
- the matching probability can be expressed as m/n, where n is the total number of digits in a binary sequence and m is the number of digit positions on which the two binary sequences have the same value (the digits on the corresponding positions of the two sequences are both 0 or both 1 at the same time). The matching probability is compared with the preset value k: if the matching probability is not less than the preset value k, the matching is considered successful and the accompaniments x_1 and y_1 tend to be the same. In theory, if the accompaniment x_1 is identical to y_1, the matching probability should be 1.
- how similar the accompaniments x_1 and y_1 must be is determined by setting the preset value k.
- the closer the matching probability is to 1, the more likely the accompaniment x_1 is the same as y_1, so the preset value k should approach 1, for example k = 0.9.
- the remaining pairs of songs in the song pair list are all homologous song pairs.
- the audio data of the homologous song pair needs to be synthesized.
- the following describes the synthesis processing method of the homologous song pair composed of the songs x and y.
- the steps of synthesizing the songs x and y are as follows:
- the audio data of song x and song y is decoded into 44.1 kHz, 16-bit mono audio data.
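For illustration, interleaving two decoded 16-bit mono streams into two-channel PCM frames might look like the following generic sketch (toy sample values; this is standard interleaved PCM layout, not code from the patent):

```python
import struct

def interleave_16bit(mono_left, mono_right):
    """Pack two 16-bit mono sample sequences into interleaved stereo PCM
    bytes (left, right, left, right, ...), little-endian, as a player or
    encoder would consume for two-channel 44.1 kHz audio."""
    assert len(mono_left) == len(mono_right)
    frames = b""
    for left, right in zip(mono_left, mono_right):
        frames += struct.pack("<hh", left, right)
    return frames

left = [0, 1000, -1000]   # toy samples from song x
right = [0, -1000, 1000]  # toy samples from song y
pcm = interleave_16bit(left, right)
print(len(pcm))  # 3 frames x 2 channels x 2 bytes = 12
```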
- the music synthesized only by the above processing may sound messy.
- all the lyric lines can be cut into n time segments t_i, i ∈ (1, n), according to the time stamps of the lyric information, where n is the number of lyric lines (counting only the lines that are sung, not annotation lines such as lyricist and composer credits).
- fade out: a fade-out effect is produced in the 1 second before t1, specifically ([00:15.28] to [00:16.28]). Since the sampling rate is 44100, the left-channel audio in this second has 44100 sample points with energy values ki, i ∈ (1, 44100); the new energy value after fading out is obtained by ramping ki down to 0 over the course of the second.
- fade in: a fade-in effect is produced in the 1 second after t1, specifically ([00:18.65] to [00:19.65]); the left-channel audio in this second has 44100 sample points with energy values ki, i ∈ (1, 44100), and the new energy value after fading in is obtained by ramping back up from 0 to ki over the course of the second.
- in the embodiments of the present invention, two songs with the same accompaniment but different singing are used as a homologous song pair; decoding the pair's audio data yields two mono audio streams, which are merged into one two-channel audio stream. The playback time of the two-channel audio is divided into multiple playback periods, and in different periods the left or right channel is energy-suppressed, achieving the effect of the two renditions, with the same accompaniment but different singing, being performed in turn.
- the embodiment of the invention provides a novel automatic music synthesis scheme that gives a computer device a new function, with the advantages of low input cost and high synthesis efficiency.
- an embodiment of the present invention further provides a dual sound source audio data processing apparatus. The terms used below have the same meanings as in the audio data processing method described above.
- FIG. 7 is a schematic structural diagram of a dual sound source audio data processing apparatus according to an embodiment of the present invention; the apparatus includes an obtaining module 500 and a processing module 600.
- the obtaining module 500 is configured to acquire audio data of a homologous song pair.
- the processing module 600 includes a decoding sub-module 610, a merging sub-module 620, and a processing sub-module 630.
- the decoding sub-module 610 is configured to decode the audio data of the homologous song pair separately to obtain two mono audio data; the merging sub-module 620 is configured to merge the two mono audio data into one two-channel audio data; and the processing sub-module 630 is configured to divide the playback time corresponding to the two-channel audio into multiple playback periods and to energy-suppress the left or right channel of the two-channel audio in different playback periods.
- the homologous song pair is two songs with the same accompaniment but different singing.
- in an embodiment of the present invention, the homologous song pair may be two recordings obtained when the same singer sings the same song in two different languages, or recordings of the same song sung by two different singers.
- the acquiring module 500 for acquiring a homologous song pair specifically includes an obtaining sub-module 510, a lyric screening sub-module 520, and an accompaniment screening sub-module 530.
- the obtaining sub-module 510 is configured to acquire song information of a candidate song pair, the song information including the lyric files and accompaniment files of the two songs. The lyric screening sub-module 520 is configured to perform lyric screening on the candidate song pair according to the lyric files of its two songs: if the lyric files of the two songs are the same, the candidate pair is determined to be a primary-selection song pair. The accompaniment screening sub-module 530 is configured to perform accompaniment screening on the primary-selection pair according to the accompaniment files of its two songs: if the accompaniment files of the two songs are the same, the primary-selection pair is determined to be the homologous song pair.
- the lyric file refers to the lyric lines of a song together with the time information of each line; the accompaniment file refers to the parts of a song that are played without any sung lyrics.
- FIG. 9 is a schematic structural diagram of an obtaining module 500 of an audio data processing apparatus according to an embodiment of the present invention.
- the obtaining sub-module 510 may include a first obtaining unit 511 and a second obtaining unit 512. The first obtaining unit 511 is configured to search the song database for all songs with the same song name but different artist names and to combine the retrieved songs in pairs to obtain candidate song pairs. The second obtaining unit 512 is configured to search all songs of the same singer for songs marked with a language tag, the language tag including a song name and a language version, and to take the song marked with the language tag together with the other song corresponding to the song name in the tag as one candidate song pair.
- the lyric screening sub-module 520 may include a parsing unit 521 and a lyric screening unit 522. The parsing unit 521 is configured to parse the lyric files of the two songs of the candidate pair separately to obtain each song's number of lyric lines and the time information of each line, the time information including each line's start time and end time. The lyric screening unit 522 is configured to judge whether the two songs of the candidate pair have the same number of lyric lines; if so, to put the lines of the two songs in one-to-one correspondence and judge whether the time information of corresponding lines is the same; and if so, to take the candidate pair as the primary-selection song pair.
- the accompaniment screening sub-module 530 may include an extracting unit 531, a mapping unit 532, an accompaniment screening unit 533, and a determining unit 534. The extracting unit 531 is configured to extract the accompaniment files of the two songs of the primary-selection pair, each accompaniment file including at least one piece of accompaniment audio and its corresponding accompaniment time. The mapping unit 532 is configured to match the accompaniment audio with the same accompaniment time in the two songs one-to-one to form at least one accompaniment pair. The accompaniment screening unit 533 is configured to process the two pieces of accompaniment audio of each accompaniment pair to obtain two equal-length binary sequences corresponding to the pair, calculate the matching probability of the two sequences, and judge whether the matching probability is greater than a preset value; if so, the two pieces of accompaniment audio of that pair are determined to be the same. The determining unit 534 is configured to judge whether the two pieces of accompaniment audio of every accompaniment pair are the same; if so, the primary-selection pair is determined to be the homologous song pair.
- the accompaniment screening unit 533 includes a decoding subunit 5331, a spectrum generation subunit 5332, a conversion subunit 5333, and a calculation subunit 5334;
- the decoding sub-unit 5331 is configured to decode the two pieces of accompaniment audio of each accompaniment pair separately to obtain two accompaniment audio data. The spectrum generation sub-unit 5332 is configured to perform a Fourier transform on each of the two accompaniment audio data to generate two spectra. The conversion sub-unit 5333 is configured to divide the two spectra equally into the same number of frequency bands, calculate the average energy value of each band in each spectrum, compare each band with the preceding band, express the comparison results in binary, and obtain the binary sequence corresponding to each spectrum;
- the calculating sub-unit 5334 is configured to calculate the matching probability p = m/n of the two binary sequences, where n represents the total number of bits in each sequence and m represents the number of bit positions at which the two sequences agree, and to judge whether the matching probability is greater than a preset value; if so, the two pieces of accompaniment audio of the pair are determined to be the same.
- the processing sub-module 630 includes a segmentation unit 631 and a processing unit 632. The segmentation unit 631 is configured to divide the playback time of the two-channel audio into multiple playback periods according to the time information of each lyric line, the number of periods being equal to the number of lyric lines. The processing unit 632 is configured to alternately energy-suppress the left and right channels of the two-channel audio in different playback periods.
- the processed audio data can also be transferred to the user terminal for presentation to the user.
- the dual sound source audio data processing apparatus may be integrated in a network device such as a server or a gateway.
- in specific implementation, the foregoing units may be implemented as separate entities, or combined arbitrarily and implemented as one or several entities; for their specific implementation, refer to the foregoing method embodiments, which are not repeated here.
- the dual sound source audio data processing apparatus provided in this embodiment first selects from the song database two songs with the same accompaniment but different singing as a homologous song pair, decodes and merges the pair's audio data to obtain two-channel audio data, divides the playback time of the two-channel audio into multiple playback periods, and energy-suppresses the left or right channel in different periods, thereby producing a dual-voice round-singing effect.
- the embodiment of the invention provides a novel music synthesis apparatus that gives a computer device a new function, with the advantages of high synthesis efficiency and low cost.
- the audio data processing apparatus provided by the embodiment of the present invention may be, for example, a computer, a tablet computer, or a mobile phone with a touch function. The apparatus and the audio data processing method of the above embodiments belong to the same concept: any method provided in the method embodiments may run on the apparatus, and its specific implementation is described in detail in the method embodiments, so it is not repeated here.
- FIG. 10 is a schematic diagram of the hardware structure of a dual sound source audio data processing apparatus according to an embodiment of the present invention. As shown in FIG. 10, the apparatus may include one or more processors 1001 (only one is shown in the figure), a storage medium 1002, a bus 1003, and an interface 1004.
- the storage medium 1002 can be used to store one or more computer-readable instructions, such as software programs and modules, for example the program instructions/modules corresponding to the dual sound source audio data processing method in the embodiments of the present invention.
- the processor 1001 is connected to the storage medium 1002 via the bus 1003, and runs software programs and modules stored in the storage medium 1002, thereby executing various functional applications and data processing, that is, implementing the above-described processing method of the audio data.
- Storage medium 1002 may include high speed random access memory, and may also include non-volatile memory such as one or more magnetic storage devices, flash memory, or other non-volatile solid state memory.
- the storage medium 1002 may further include memory located remotely relative to the processor 1001, which may be connected to the terminal over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
- the interface 1004 is connected to the processor 1001 via the bus, receives instructions from the processor 1001, and, according to the instructions, receives data from external devices or sends data to external devices.
- all or part of the flow of the audio data processing method may be implemented by a computer program controlling the associated hardware; the program may be stored in a computer-readable storage medium, for example in the memory of the server shown in FIG. 1, and executed by at least one processor within the server. Its execution may include the flow of the embodiments of the audio data processing method. The server may be a computer device.
- the storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
- an embodiment of the present invention provides a non-transitory computer readable storage medium storing a computer program capable of causing a computer to execute the steps of the dual-source audio data processing method described in the above embodiments.
- each functional module may be integrated into one processing chip, each module may exist alone physically, or two or more modules may be integrated into one module.
- the above integrated modules can be implemented in the form of hardware or in the form of software functional modules.
- if the integrated module is implemented in the form of a software functional module and sold or used as a standalone product, it may also be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc.
Abstract
A dual sound source audio data processing method and apparatus. The method includes: acquiring audio data of a homologous song pair, the homologous song pair being two songs with the same accompaniment but different singing (S201); decoding the audio data of the homologous song pair to obtain two mono audio data (S202); merging the two mono audio data into one two-channel audio data (S203); and dividing the playback time corresponding to the two-channel audio into multiple playback periods, and energy-suppressing the left or right channel of the two-channel audio in different playback periods (S204).
Description
This application claims priority to Chinese Patent Application No. 201610852918.8, entitled "Dual Sound Source Audio Data Processing Method and Apparatus", filed with the Chinese Patent Office on September 27, 2016, the entire contents of which are incorporated herein by reference.
The present invention relates to the technical field of computer information processing, and in particular to a dual sound source audio data processing method and apparatus.
Background of the Invention
With the popularity of multimedia devices, people hope to get more enjoyment from music: besides listening to single tracks, remixed music and song medleys have also become popular.
Summary of the Invention
Embodiments of the present invention provide a dual sound source audio data processing method and apparatus and a non-volatile computer-readable storage medium, capable of automatically synthesizing two songs with the same accompaniment but different singing.
Embodiments of the present invention adopt the following technical solutions.
An embodiment of the present invention provides a dual sound source audio data processing method, the method including:
acquiring audio data of a homologous song pair, the homologous song pair being two songs with the same accompaniment but different singing;
decoding the audio data of the homologous song pair separately to obtain two mono audio data;
merging the two mono audio data into one two-channel audio data; and
dividing the playback time corresponding to the two-channel audio into multiple playback periods, and energy-suppressing the left or right channel of the two-channel audio in different playback periods.
An embodiment of the present invention provides a dual sound source audio data processing apparatus, the apparatus including a processor and a non-volatile storage medium storing one or more computer-readable instructions, the processor executing the computer-readable instructions to implement the steps of:
acquiring audio data of a homologous song pair, the homologous song pair being two songs with the same accompaniment but different singing;
decoding the audio data of the homologous song pair separately to obtain two mono audio data;
merging the two mono audio data into one two-channel audio data; and
dividing the playback time corresponding to the two-channel audio into multiple playback periods, and energy-suppressing the left or right channel of the two-channel audio in different playback periods.
An embodiment of the present invention provides a non-volatile computer-readable storage medium storing a computer program capable of causing a computer to execute the steps of:
acquiring audio data of a homologous song pair, the homologous song pair being two songs with the same accompaniment but different singing;
decoding the audio data of the homologous song pair to obtain two mono audio data;
merging the two mono audio data into one two-channel audio data; and
dividing the playback time corresponding to the two-channel audio into multiple playback periods, and energy-suppressing the left or right channel of the two-channel audio in different playback periods.
In the embodiments of the present invention, two songs with the same accompaniment but different singing are used as a homologous song pair; decoding the pair's audio data yields two mono audio data, which are merged into two-channel audio data; the playback time corresponding to the two-channel audio is divided into multiple playback periods, and in different periods the left or right channel of the two-channel audio is energy-suppressed, achieving the effect of the two songs, with the same accompaniment but different singing, being sung in turn. The embodiments of the present invention provide a novel automatic music synthesis scheme that gives a computer device a new function. Because the two songs used for synthesis share the same lyric information and accompaniment information, the resulting audio is very smooth, produces no sense of abruptness, and is comfortable to listen to. Moreover, both the acquisition of homologous song pairs and the processing of audio data are performed in the background; on the user terminal side, the user only needs to select the synthesized audio for playback, without manually selecting homologous song pairs or processing audio data, which simplifies operation on the user terminal and saves its processing resources. In addition, compared with recording musical works with live singers, the embodiments of the present invention have the advantages of low input cost and high synthesis efficiency.
Brief Description of the Drawings
To explain the technical solutions of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a scenario of the dual sound source audio data processing method according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a dual sound source audio data processing method according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of a method for acquiring a homologous song pair according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart of a method for screening the accompaniment of a primary-selection song pair according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the lyric files of a homologous song pair according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the accompaniment files of a homologous song pair according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a dual sound source audio data processing apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a dual sound source audio data processing apparatus according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of an obtaining module of an audio data processing apparatus according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of the hardware structure of a dual sound source audio data processing apparatus according to an embodiment of the present invention.
Mode for Carrying Out the Invention
The present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the related invention, not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention.
In the following description, unless otherwise stated, specific embodiments of the present invention are described with reference to steps and symbols executed by one or more computers. These steps and operations are therefore referred to several times as being computer-executed; computer execution as used herein includes operations by a computer processing unit on electronic signals representing data in a structured form. The operations transform the data or maintain it at locations in the computer's memory system, which may reconfigure or otherwise alter the computer's operation in a manner well known to those skilled in the art. The data structures maintained by the data are physical locations of the memory having particular properties defined by the data format. However, although the principles of the invention are described in the above terms, this is not meant as a limitation; those skilled in the art will appreciate that the various steps and operations described below may also be implemented in hardware.
The term "module" as used herein may be regarded as a software object executed on the computing system. The different components, modules, engines, and services described herein may be regarded as objects implemented on the computing system. The apparatus and method described herein are preferably implemented in software, but may of course also be implemented in hardware; both fall within the protection scope of the present invention.
Embodiments of the present invention provide a dual sound source audio data processing method and apparatus.
Referring to FIG. 1, a schematic diagram of a scenario of the dual sound source audio data processing method provided by an embodiment of the present invention. The scenario may include an audio data processing apparatus running in a server 200, referred to as the audio processing apparatus 300 for short. It is mainly used to acquire the audio data of a homologous song pair, the pair being two songs with the same accompaniment but different singing; then decode the pair's audio data separately to obtain two mono audio data; then merge the two mono audio data into one two-channel audio data; and finally divide the playback time corresponding to the two-channel audio into multiple playback periods and energy-suppress the left or right channel of the two-channel audio in different periods to obtain the processed audio data.
In addition, the scenario may include a song database 100 storing a large amount of song information, including each song's accompaniment file, lyric file, and audio data. Based on the lyric files and accompaniment files of the songs in the song database 100, the audio processing apparatus 300 screens out two songs with the same accompaniment but different singing to form a homologous song pair. The scenario may further include a user terminal 400, such as a mobile phone or a tablet computer, which includes input devices (such as a keyboard and a mouse) and output devices (such as a screen and an amplifier). Through the input device the user selects the audio data processed by the audio processing apparatus 300, and through the output device the processed audio data is played.
Detailed descriptions are given below.
In this embodiment of the present invention, the description is given from the perspective of the audio processing apparatus, which may be integrated in a network device such as a server or a gateway to implement the dual sound source audio data processing method. The network device, such as the server or gateway, may be a computer device.
An embodiment of the present invention provides a dual sound source audio data processing method, including: acquiring audio data of a homologous song pair, the homologous song pair being two songs with the same accompaniment but different singing; decoding the audio data of the homologous song pair separately to obtain two mono audio data; merging the two mono audio data into one two-channel audio data; and dividing the playback time corresponding to the two-channel audio into multiple playback periods, and energy-suppressing the left or right channel of the two-channel audio in different playback periods.
Referring to FIG. 2, a schematic flowchart of a dual sound source audio data processing method according to an embodiment of the present invention, the method includes:
S201: acquiring audio data of a homologous song pair.
The homologous song pair is two songs with the same accompaniment but different singing, where "different singing" means different singers or different singing languages. In an embodiment of the present invention, the pair may be two recordings of the same song sung by the same singer in two different languages: for example, Eason Chan's 《红玫瑰》 (Red Rose) sung in Mandarin and 《白玫瑰》 (White Rose) sung in Cantonese are sung in different languages but share the same accompaniment, so they can serve as a homologous song pair. In another embodiment, the pair may be two recordings of the same song by different singers: for example, Megan Nicole and Alex have both performed the song "Maps", and Megan Nicole's "Maps" and Alex's "Maps" can serve as a homologous song pair. In short, two songs that use the same accompaniment but differ in singing must be found in the song database to form a song pair.
S202: decoding the audio data of the homologous song pair separately to obtain two mono audio data.
S203: merging the two mono audio data into one two-channel audio data.
S204: dividing the playback time corresponding to the two-channel audio into multiple playback periods, and energy-suppressing the left or right channel of the two-channel audio in different playback periods.
Specifically, the playback time corresponding to the two-channel audio includes the time information of each lyric line. In an embodiment of the present invention, this time information may be each line's start time and end time; alternatively, it may be each line's start time and duration.
Specifically, the playback time may be divided into multiple playback periods according to the time information of each lyric line, the number of periods being equal to the number of lines in the lyrics, so that the two voices sing one line each in turn. Alternatively, the lyrics may be divided into multiple passages and the playback periods defined by passage: one or more lines form a passage, the start time of the passage's first line serves as the period's start time, and the end time of the passage's last line serves as the period's end time, achieving the effect of the two voices singing one passage each in turn.
The left or right channel of the two-channel audio is energy-suppressed in different playback periods. In an embodiment of the present invention, the left and right channels may be suppressed alternately in different periods; the left or right channel may also be suppressed according to preset rules, for example suppressing the same channel over several consecutive periods, suppressing neither channel in some periods, or suppressing the left or right channel only during part of a period. Suppressing a channel only during part of a period allows the song portion in one period to be completed by the two voices in turn, for example voice A singing the first half of a lyric line and voice B singing the second half.
The specific method for energy-suppressing the audio in any one playback period is: within a preset time before the period begins, apply a fade-out to the channel to be suppressed; within the period, set all audio sample points of that channel to 0; within a preset time after the period ends, apply a fade-in to that channel.
Further, after step S204 is performed, the processed audio data may also be presented to the user for listening.
Referring to FIG. 3, a schematic flowchart of a method for acquiring a homologous song pair according to an embodiment of the present invention, the method includes:
S301: acquiring song information of a candidate song pair, the song information including the lyric files and accompaniment files of the two songs.
After the song information of candidate song pairs is acquired, a song-pair list may be built from it. Specifically, the song information of candidate pairs may be acquired by the methods described in the following embodiments.
In one embodiment, the song database is searched for all songs with the same song name but different artist names; the retrieved songs are combined in pairs to obtain candidate song pairs, and the song information of each candidate pair is then extracted from the database.
For example, searching the QQ Music library for all songs named 《爱》 (Love) returns the versions sung by the Little Tigers, by Karen Mok, and by TFBOYS. Combining them pairwise, the Little Tigers' version and Karen Mok's version form one candidate pair, the Little Tigers' version and TFBOYS' version form another, and Karen Mok's version and TFBOYS' version form a third.
In another embodiment, all songs of one singer are searched for songs marked with a language tag. A language tag identifies a song and generally includes a song name and a language version. A song marked with a language tag and the other song corresponding to the song name in the tag are taken as one candidate pair, and the pair's song information is acquired.
For example, in the singer Eason Chan's song list, the song 《白玫瑰》 (White Rose) is found to carry the language tag "Red Rose, Cantonese version", so 《白玫瑰》 and the 《红玫瑰》 (Red Rose) corresponding to the tag can be taken as a candidate pair.
S302: performing lyric screening on the candidate song pair according to the lyric files of its two songs; if the lyric files of the two songs are the same, determining the candidate pair to be a primary-selection song pair.
Specifically, the lyrics of candidate pairs can be screened as follows.
First, the lyric files of the two songs of the candidate pair are parsed separately to obtain each song's number of lyric lines and the time information of each line; the time information may include each line's start time and end time, or alternatively each line's start time and singing duration. Then it is judged whether the two songs have the same number of lyric lines; if not, the candidate pair is discarded; if so, the lines of the two songs are put in one-to-one correspondence and it is judged whether the time information of corresponding lines is the same; if not, the candidate pair is discarded; if so, the candidate pair is taken as a primary-selection song pair.
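The lyric screening above can be sketched as follows. This is an illustrative sketch, not the patented implementation: the LRC-style `[mm:ss.xx]` tag format is assumed from FIG. 5, the function names are invented, and only start times stand in for the full per-line time information.

```python
import re

TIME_TAG = re.compile(r"\[(\d{2}):(\d{2})\.(\d{2})\]")

def parse_lrc(text: str):
    """Return [(start_seconds, lyric), ...] for the sung lines of a lyric
    file, skipping lines that carry a time tag but no lyric text
    (accompaniment-only passages, per the description of FIG. 6)."""
    lines = []
    for raw in text.splitlines():
        m = TIME_TAG.match(raw)
        if not m:
            continue  # e.g. title / songwriter credit lines
        lyric = raw[m.end():].strip()
        if not lyric:
            continue  # time tag without words: accompaniment only
        mins, secs, cents = map(int, m.groups())
        lines.append((mins * 60 + secs + cents / 100, lyric))
    return lines

def lyric_screen(lrc_x: str, lrc_y: str) -> bool:
    """Primary screening: same number of sung lines, same start time per line."""
    x, y = parse_lrc(lrc_x), parse_lrc(lrc_y)
    return len(x) == len(y) and all(abs(a[0] - b[0]) < 1e-6 for a, b in zip(x, y))

# The placeholder lyrics from FIG. 5: same timestamps, different words.
x = "[00:16.28]aaaaaaaaa\n[00:18.65]xxxxx\n"
y = "[00:16.28]bbbbbbbbb\n[00:18.65]yyyyy\n"
assert lyric_screen(x, y)
```

A pair passing this check becomes a primary-selection pair and proceeds to accompaniment screening.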
S303: performing accompaniment screening on the primary-selection pair according to the accompaniment files of its two songs; if the accompaniment files of the two songs are the same, determining the primary-selection pair to be the homologous song pair.
Referring to FIG. 4, the accompaniment of a primary-selection pair can be screened as follows.
S401: extracting the accompaniment files of the two songs of the primary-selection pair separately, each accompaniment file including at least one piece of accompaniment audio and the accompaniment time corresponding to that audio.
S402: matching the accompaniment audio with the same accompaniment time in the two songs one-to-one to form at least one accompaniment pair.
S403: processing the two pieces of accompaniment audio of each accompaniment pair separately to obtain two equal-length binary sequences corresponding to the pair, calculating the matching probability of the two sequences, and judging whether the matching probability is greater than a preset value; if so, determining that the two pieces of accompaniment audio of the pair are the same.
The specific method for judging whether the two pieces of accompaniment audio of a pair are the same includes: performing a Fourier transform on each of the two accompaniment audio data to generate two spectra; dividing the two spectra equally into the same number of frequency bands, calculating the average energy value of each band in each spectrum, comparing each band with the preceding band, expressing the comparison results in binary, and obtaining the binary sequence corresponding to each spectrum; calculating the matching probability p = m/n of the two binary sequences, where n represents the total number of bits in each sequence and m represents the number of bit positions at which the two sequences agree; and judging whether the matching probability is greater than a preset value; if so, determining that the two pieces of accompaniment audio of the pair are the same.
S404: judging whether the two pieces of accompaniment audio of every accompaniment pair are the same; if so, determining that the primary-selection pair is a homologous song pair.
S304: acquiring the audio data of the two songs of the homologous song pair.
The audio data of the two songs forming the homologous pair is acquired from the song database.
The approach shown in FIG. 3 above is only one of many methods for acquiring a homologous song pair and should not be understood as limiting the present invention.
Based on the method described in the first embodiment, further details are given below by example.
This embodiment divides the dual sound source audio data processing method into two main aspects: screening homologous song pairs, and synthesizing the audio data of a homologous song pair. The two aspects are described below with examples.
A homologous song pair consists of two songs with the same accompaniment but different singing; screening homologous song pairs therefore means finding two songs that use the same accompaniment but differ in singing and combining them.
In an embodiment of the present invention, screening may begin by finding candidate song pairs in the song database and building a song-pair list. Candidate songs can be found in the database as follows.
In the first method, the song list of one singer is checked for songs carrying a language tag. For example, the song 《白玫瑰》 (White Rose) carries the language tag "Red Rose, Cantonese version", so 《白玫瑰》 and 《红玫瑰》 (Red Rose) can be taken as a candidate pair, denoted song x and song y.
In the second method, the song database is searched for all songs with the same song name but different artist names, and the retrieved songs are combined pairwise. For example, combining the versions of 《爱》 (Love) sung by the Little Tigers, Karen Mok, and TFBOYS pairwise yields three candidate pairs, the two songs of each pair denoted song x and song y.
The candidate pairs found in the above embodiments do not necessarily use the same accompaniment: for example, the Little Tigers' 《爱》 and Karen Mok's 《爱》 share a name but are not the same song, so the condition of identical accompaniment is not necessarily met, and the accompaniment of candidate pairs must be screened. The specific accompaniment-screening steps are explained below, taking Eason Chan's 《白玫瑰》 (White Rose) and 《红玫瑰》 (Red Rose) as an example.
For songs x and y, the corresponding lyric files xl and yl are found in the song database. The lyric file may specifically contain the song's lyrics and the time information of each line; the time information may be each line's start time and end time, or each line's start time and duration. FIGS. 5(a) and 5(b) show the lyric files of 《红玫瑰》 and 《白玫瑰》 respectively: in the file format, each line begins with a start time followed by the corresponding lyric, so the time information of each line can be obtained by parsing the file. As shown in FIG. 5, apart from the song title and songwriter credits, the first lyric line of 《红玫瑰》 and the first lyric line of 《白玫瑰》 both start at [00:16.28] and both end at [00:18.65]; the first line of 《红玫瑰》 is represented by "aaaaaaaaa", the first line of 《白玫瑰》 by "bbbbbbbbb", and the other lines by "xx...". By such comparison, if the two songs have the same number of lyric lines and every line's start and end times match, songs x and y satisfy the lyric screening condition; the candidate pair formed by x and y is taken as a primary-selection pair and enters the next round, accompaniment screening; otherwise the candidate pair is removed from the song-pair list.
The accompaniment of the primary-selection pair is then screened. A typical song has several long passages with no lyrics where only the accompaniment plays, such as the intro, the break between verses, and the ending; in the lyric file these passages appear as time information with no lyric, as in the boxed content in FIG. 6. 《红玫瑰》 has four such passages showing only a time and no lyric; from this feature it can be directly deduced that the portion of the song from [00:08.61] to [00:16.28] contains only accompaniment. Assuming songs x and y use the same accompaniment, the two songs should be nearly identical in their lyric-free portions (since energy level and encoding/decoding both affect the accompaniment, being exactly identical is very unlikely). On this basis the audio of songs x and y can be compared. In an embodiment of the present invention, the specific steps for comparing the audio of songs x and y are as follows:
The portions of audio whose lyric-file lines carry only time information and no lyric are cut out of songs x and y; for 《红玫瑰》 this yields four accompaniment portions, denoted xi, i ∈ (1, 4), and yi, i ∈ (1, 4). Because the lyric screening has already been passed, it is certain that for a given i, xi and yi occupy the same time. Since every pair xi, yi is processed in the same way, only x1 and y1 are used as the example below.
x1 and y1 are each decoded into 8 kHz, 16-bit audio;
a Fourier transform is performed with a frame length of 1024 sample points and a frame shift of 32 sample points to obtain the spectrum;
the 4 kHz spectrum is divided equally into 32 frequency bands; for each band the mean is computed and taken as that band's average energy;
each band is compared with the corresponding band of the previous frame: 1 if it is larger than in the previous frame, 0 if smaller, giving 32 bits representing each frame. The above operations are performed on x1 and y1 separately, yielding two equal-length binary sequences corresponding to x1 and y1;
the binary sequences of x1 and y1 are compared bit by bit and their matching probability is calculated; the matching probability can be expressed as p = m/n, where n represents the total number of bits in each binary sequence and m represents the number of bit positions at which the two sequences carry the same value (a position counts as the same when the bits of both sequences at that position are both 0 or both 1). The matching probability is compared with a preset value k; if it is not less than k, the match is considered successful and accompaniments x1 and y1 are taken to be the same. In theory, if accompaniment x1 is identical to y1, the matching probability should be 1; since energy level and encoding/decoding both affect the accompaniment, the similarity of x1 and y1 is judged via the preset value k. The closer the matching probability is to 1, the more likely x1 and y1 are the same, so the preset value k should approach 1, for example k = 0.9.
If all xi and yi of the two songs x and y of a primary-selection pair match successfully, the accompaniments of the pair are considered the same and the pair is taken as a homologous song pair; otherwise it is removed from the song-pair list.
After the above lyric screening and accompaniment screening, the song pairs remaining in the song-pair list are all homologous song pairs. To achieve the dual-voice round-singing effect, the audio data of a homologous pair must be synthesized; for ease of explanation, the synthesis of the homologous pair formed by songs x and y is described below.
In an embodiment of the present invention, the steps for synthesizing songs x and y are as follows.
The audio data of song x and song y is decoded into 44.1 kHz, 16-bit mono audio data.
The two mono audio data x and y are merged into one two-channel audio data; the left and right channels can be assigned randomly, so that a user listening through headphones or speakers hears the same accompaniment in both ears but different singing.
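The mono-to-stereo merge above can be sketched as follows. This is an illustrative sketch: the function name and the trimming-to-the-shorter-stream behavior are assumptions, not stated in the source.

```python
import numpy as np

def merge_to_stereo(mono_x: np.ndarray, mono_y: np.ndarray) -> np.ndarray:
    """Merge two mono streams into one stereo stream: song x on the left
    channel, song y on the right. Trims to the shorter length so the two
    homologous recordings stay sample-aligned."""
    n = min(len(mono_x), len(mono_y))
    return np.column_stack((mono_x[:n], mono_y[:n]))  # shape (n, 2)

x = np.array([0.1, 0.2, 0.3], dtype=np.float32)
y = np.array([-0.1, -0.2, -0.3, -0.4], dtype=np.float32)
stereo = merge_to_stereo(x, y)
assert stereo.shape == (3, 2)
```

The channel assignment could equally be randomized, as the text notes; swapping the two arguments swaps the ears.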
Music synthesized by the above processing alone can sound messy. To achieve a round-singing effect, all the lyric lines can be cut into n time segments ti, i ∈ (1, n), according to the timestamps in the lyric information, where n is the number of sung lines (only the sung lines, excluding credit lines and accompaniment-only lines). For the 《红玫瑰》 shown in FIG. 5, t1 is ([00:16.28] to [00:18.65]), t2 is ([00:18.65] to [00:23.53]), and so on. The left channel is energy-suppressed during segments ti with odd i and the right channel during segments ti with even i, so that on playback the left and right channels produce the effect of different voices singing in turn. In an embodiment of the present invention, the suppression of one segment ti is divided into three steps, as follows (taking t1 as an example):
In the first step, fade out: a fade-out effect is produced in the 1 second before t1, specifically ([00:15.28] to [00:16.28]). Since the sampling rate is 44100, the left-channel audio in this second has 44100 sample points with energy values ki, i ∈ (1, 44100); the new energy value after fading out is obtained by ramping ki down to 0 over the course of the second.
In the second step, all sample points within the t1 segment [00:16.28] to [00:18.65] are set to 0.
In the third step, fade in: a fade-in effect is produced in the 1 second after t1, specifically ([00:18.65] to [00:19.65]); the left-channel audio in this second has 44100 sample points with energy values ki, i ∈ (1, 44100), and the new energy value after fading in is obtained by ramping back up from 0 to ki over the course of the second.
In the embodiments of the present invention, two songs with the same accompaniment but different singing are used as a homologous song pair; decoding the pair's audio data yields two mono audio data, which are merged into two-channel audio data; the playback time corresponding to the two-channel audio is divided into multiple playback periods, and in different periods the left or right channel of the two-channel audio is energy-suppressed, achieving the effect of the two songs, with the same accompaniment but different singing, being sung in turn. The embodiments of the present invention provide a novel automatic music synthesis scheme that gives a computer device a new function, with the advantages of low input cost and high synthesis efficiency.
To better implement the dual sound source audio data processing method provided by the embodiments of the present invention, an embodiment of the present invention further provides a dual sound source audio data processing apparatus. The terms used have the same meanings as in the audio data processing method above; for specific implementation details, refer to the description in the method embodiments.
Referring to FIG. 7, a schematic structural diagram of a dual sound source audio data processing apparatus according to an embodiment of the present invention, the apparatus includes an obtaining module 500 and a processing module 600.
The obtaining module 500 is configured to acquire audio data of a homologous song pair.
The processing module 600 includes a decoding sub-module 610, a merging sub-module 620, and a processing sub-module 630: the decoding sub-module 610 is configured to decode the audio data of the homologous song pair separately to obtain two mono audio data; the merging sub-module 620 is configured to merge the two mono audio data into one two-channel audio data; and the processing sub-module 630 is configured to divide the playback time corresponding to the two-channel audio into multiple playback periods and to energy-suppress the left or right channel of the two-channel audio in different playback periods.
The homologous song pair is two songs with the same accompaniment but different singing. In an embodiment of the present invention, the pair may be two recordings obtained when the same singer sings the same song in two different languages, or recordings of the same song sung by different singers. Referring to FIG. 8, the obtaining module 500 for acquiring a homologous song pair specifically includes an obtaining sub-module 510, a lyric screening sub-module 520, and an accompaniment screening sub-module 530. The obtaining sub-module 510 is configured to acquire song information of a candidate song pair, the song information including the lyric files and accompaniment files of the two songs; the lyric screening sub-module 520 is configured to perform lyric screening on the candidate pair according to the lyric files of its two songs and, if the two lyric files are the same, to determine the candidate pair to be a primary-selection song pair; the accompaniment screening sub-module 530 is configured to perform accompaniment screening on the primary-selection pair according to the accompaniment files of its two songs and, if the two accompaniment files are the same, to determine the primary-selection pair to be the homologous song pair.
The lyric file refers to the lyric lines of a song and the time information of each line; the accompaniment file refers to the parts of a song that are played without any sung lyrics.
Referring to FIG. 9, a schematic structural diagram of the obtaining module 500 of an audio data processing apparatus according to an embodiment of the present invention.
As a possible implementation, the obtaining sub-module 510 may include a first obtaining unit 511 and a second obtaining unit 512. The first obtaining unit 511 is configured to search the song database for all songs with the same song name but different artist names and to combine the retrieved songs pairwise into candidate song pairs; the second obtaining unit 512 is configured to search all songs of the same singer for songs marked with a language tag, the language tag including a song name and a language version, and to take the tagged song together with the other song corresponding to the song name in the tag as one candidate song pair. The lyric screening sub-module 520 may include a parsing unit 521 and a lyric screening unit 522: the parsing unit is configured to parse the lyric files of the two songs of the candidate pair separately to obtain each song's number of lyric lines and the time information of each line, the time information including each line's start time and end time; the lyric screening unit is configured to judge whether the two songs of the candidate pair have the same number of lyric lines, if so to put the lines of the two songs in one-to-one correspondence and judge whether the time information of corresponding lines is the same, and if so to take the candidate pair as the primary-selection song pair. The accompaniment screening sub-module 530 may include an extracting unit 531, a mapping unit 532, an accompaniment screening unit 533, and a determining unit 534: the extracting unit 531 is configured to extract the accompaniment files of the two songs of the primary-selection pair, each accompaniment file including at least one piece of accompaniment audio and its corresponding accompaniment time; the mapping unit 532 is configured to match the accompaniment audio with the same accompaniment time in the two songs one-to-one into at least one accompaniment pair; the accompaniment screening unit 533 is configured to process the two pieces of accompaniment audio of each accompaniment pair to obtain two equal-length binary sequences corresponding to the pair, calculate the matching probability of the two sequences, and judge whether the matching probability is greater than a preset value, in which case the two pieces of accompaniment audio of the pair are determined to be the same; and the determining unit 534 is configured to judge whether the two pieces of accompaniment audio of every accompaniment pair are the same, in which case the primary-selection pair is determined to be the homologous song pair.
As a possible implementation, the accompaniment screening unit 533 includes a decoding sub-unit 5331, a spectrum generation sub-unit 5332, a conversion sub-unit 5333, and a calculation sub-unit 5334.
The decoding sub-unit 5331 is configured to decode the two pieces of accompaniment audio of each accompaniment pair separately to obtain two accompaniment audio data; the spectrum generation sub-unit 5332 is configured to perform a Fourier transform on each of the two accompaniment audio data to generate two spectra; the conversion sub-unit 5333 is configured to divide the two spectra equally into the same number of frequency bands, calculate the average energy value of each band in each spectrum, compare each band with the preceding band, express the comparison results in binary, and obtain the binary sequence corresponding to each spectrum;
the calculation sub-unit 5334 is configured to calculate the matching probability p = m/n of the two binary sequences, where n represents the total number of bits in each sequence and m represents the number of bit positions at which the bits of the two sequences are the same, and to judge whether the matching probability is greater than a preset value, in which case the two pieces of accompaniment audio of the pair are determined to be the same.
Further, as a possible implementation, the processing sub-module 630 includes a segmentation unit 631 and a processing unit 632: the segmentation unit 631 is configured to divide the playback time of the two-channel audio into multiple playback periods according to the time information of each lyric line in the two-channel audio, the number of periods being equal to the number of lyric lines; the processing unit 632 is configured to alternately energy-suppress the left and right channels of the two-channel audio in different playback periods.
After the dual sound source audio data is processed, the processed audio data may also be transmitted to the user terminal for presentation to the user.
The dual sound source audio data processing apparatus may specifically be integrated in a network device such as a server or a gateway.
In specific implementation, the foregoing units may be implemented as separate entities, or combined arbitrarily and implemented as one or several entities; for their specific implementation, refer to the foregoing method embodiments, which are not repeated here.
As described above, the dual sound source audio data processing apparatus provided in this embodiment first screens out from the song database two songs with the same accompaniment but different singing as a homologous song pair, decodes and merges the pair's audio data to obtain two-channel audio data, divides the playback time corresponding to the two-channel audio into multiple playback periods, and energy-suppresses the left or right channel of the two-channel audio in different periods, thereby producing a dual-voice round-singing effect. The embodiment of the present invention provides a novel music synthesis apparatus that gives a computer device a new function, with the advantages of high synthesis efficiency and low cost.
In the above embodiments, each embodiment is described with its own emphasis; for parts not detailed in one embodiment, refer to the detailed description of the dual sound source audio data processing method above, which is not repeated here.
The audio data processing apparatus provided by the embodiments of the present invention may be, for example, a computer, a tablet computer, or a mobile phone with a touch function. The apparatus and the audio data processing method of the above embodiments belong to the same concept: any method provided in the method embodiments may run on the apparatus, with its specific implementation detailed in the method embodiments and not repeated here.
FIG. 10 is a schematic diagram of the hardware structure of a dual sound source audio data processing apparatus according to an embodiment of the present invention. As shown in FIG. 10, the apparatus may include one or more processors 1001 (only one is shown in the figure), a storage medium 1002, a bus 1003, and an interface 1004.
The storage medium 1002 may be used to store one or more computer-readable instructions, such as software programs and modules, for example the program instructions/modules corresponding to the dual sound source audio data processing method and apparatus in the embodiments of the present invention. The processor 1001 is connected to the storage medium 1002 via the bus 1003 and runs the software programs and modules stored in the storage medium 1002, thereby executing various functional applications and data processing, that is, implementing the audio data processing method described above. The storage medium 1002 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the storage medium 1002 may further include memory located remotely relative to the processor 1001, which may be connected to the terminal over a network; examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof. The interface 1004 is connected to the processor 1001 via the bus, receives instructions from the processor 1001, and, according to the instructions, receives data from external devices or sends data to external devices.
It should be noted that, for the dual sound source audio data processing method of the embodiments of the present invention, those of ordinary skill in the art will understand that all or part of the flow of the method may be implemented by a computer program controlling the associated hardware. The computer program may be stored in a computer-readable storage medium, for example in the memory of the server shown in FIG. 1, and executed by at least one processor within the server; its execution may include the flow of the embodiments of the audio data processing method. The server may be a computer device. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
Accordingly, an embodiment of the present invention provides a non-volatile computer-readable storage medium storing a computer program capable of causing a computer to execute the steps of the dual sound source audio data processing method described in the above embodiments.
For the audio data processing apparatus of the embodiments of the present invention, each functional module may be integrated into one processing chip, each module may exist alone physically, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and sold or used as a standalone product, it may also be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc.
The dual sound source audio data processing method and apparatus provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention; the description of the above embodiments is only intended to help understand the method of the present invention and its core idea. Meanwhile, those skilled in the art may make changes to the specific implementations and application scope in accordance with the idea of the present invention. In summary, the contents of this specification should not be understood as limiting the present invention.
Claims (21)
- A dual sound source audio data processing method, comprising: acquiring audio data of a homologous song pair, the homologous song pair being two songs with the same accompaniment but different singing; decoding the audio data of the homologous song pair to obtain two mono audio data; merging the two mono audio data into one two-channel audio data; and dividing the playback time corresponding to the two-channel audio into multiple playback periods, and energy-suppressing the left or right channel of the two-channel audio in different playback periods.
- The method according to claim 1, wherein acquiring audio data of a homologous song pair, the homologous song pair being two songs with the same accompaniment but different singing, comprises: acquiring song information of a candidate song pair, the song information including the lyric files and accompaniment files of the two songs; performing lyric screening on the candidate song pair according to the lyric files of its two songs and, if the lyric files of the two songs are the same, determining the candidate pair to be a primary-selection song pair; performing accompaniment screening on the primary-selection pair according to the accompaniment files of its two songs and, if the accompaniment files of the two songs are the same, determining the primary-selection pair to be the homologous song pair; and acquiring the audio data of the two songs of the homologous song pair.
- The method according to claim 2, wherein acquiring song information of a candidate song pair comprises one of the following steps: searching the song database for all songs with the same song name but different artist names, combining the retrieved songs pairwise into candidate song pairs, and acquiring the song information of the candidate pairs; or searching all songs of the same singer for songs marked with a language tag, the language tag including a song name and a language version, taking the song marked with the language tag and the other song corresponding to the song name in the tag as one candidate song pair, and acquiring the song information of the candidate pair.
- The method according to claim 2, wherein performing lyric screening on the candidate song pair according to the lyric files of its two songs and, if the lyric files of the two songs are the same, determining the candidate pair to be a primary-selection song pair, comprises: parsing the lyric files of the two songs of the candidate pair separately to obtain each song's number of lyric lines and the time information of each line; and judging whether the two songs have the same number of lyric lines, if so putting the lines of the two songs in one-to-one correspondence and judging whether the time information of corresponding lines is the same, and if so taking the candidate pair as the primary-selection song pair.
- The method according to claim 2, wherein performing accompaniment screening on the primary-selection pair according to the accompaniment files of its two songs and, if the accompaniment files of the two songs are the same, determining the primary-selection pair to be the homologous song pair, comprises: extracting the accompaniment files of the two songs of the primary-selection pair separately, each accompaniment file including at least one piece of accompaniment audio and the accompaniment time corresponding to that audio; matching the accompaniment audio with the same accompaniment time in the two songs one-to-one into at least one accompaniment pair, processing the two pieces of accompaniment audio of each accompaniment pair separately to obtain two equal-length binary sequences corresponding to the pair, calculating the matching probability of the two binary sequences, and judging whether the matching probability is greater than a preset value, and if so, determining that the two pieces of accompaniment audio of the pair are the same; and judging whether the two pieces of accompaniment audio of every accompaniment pair are the same, and if so, determining the primary-selection pair to be the homologous song pair.
- The method according to claim 5, wherein processing the two pieces of accompaniment audio of each accompaniment pair separately to obtain two equal-length binary sequences corresponding to the pair, calculating the matching probability of the two binary sequences, and judging whether the matching probability is greater than a preset value, and if so, determining that the two pieces of accompaniment audio of the pair are the same, comprises: decoding the two pieces of accompaniment audio of each accompaniment pair separately to obtain two accompaniment audio data; performing a Fourier transform on each of the two accompaniment audio data to generate two spectra; and dividing the two spectra equally into the same number of frequency bands, calculating the average energy value of each band in each spectrum, comparing each band with the preceding band, expressing the comparison results in binary, and obtaining the binary sequence corresponding to each spectrum;
- The method according to claim 5, wherein dividing the playback time corresponding to the two-channel audio into multiple playback periods and energy-suppressing the left or right channel of the two-channel audio in different playback periods comprises: the playback time corresponding to the two-channel audio including the time information of each lyric line, dividing the playback time into multiple playback periods according to the time information of each line, the number of playback periods being equal to the number of lyric lines; and alternately energy-suppressing the left and right channels of the two-channel audio in different playback periods.
- A dual sound source audio data processing apparatus, comprising a processor and a non-volatile storage medium storing one or more computer-readable instructions, the processor executing the computer-readable instructions to implement the steps of: acquiring audio data of a homologous song pair, the homologous song pair being two songs with the same accompaniment but different singing; decoding the audio data of the homologous song pair to obtain two mono audio data; merging the two mono audio data into one two-channel audio data; and dividing the playback time corresponding to the two-channel audio into multiple playback periods, and energy-suppressing the left or right channel of the two-channel audio in different playback periods.
- The apparatus according to claim 8, wherein the processor executes the computer-readable instructions to implement the steps of: acquiring song information of a candidate song pair, the song information including the lyric files and accompaniment files of the two songs; performing lyric screening on the candidate song pair according to the lyric files of its two songs and, if the lyric files of the two songs are the same, determining the candidate pair to be a primary-selection song pair; and performing accompaniment screening on the primary-selection pair according to the accompaniment files of its two songs and, if the accompaniment files of the two songs are the same, determining the primary-selection pair to be the homologous song pair.
- The apparatus according to claim 9, wherein the processor executes the computer-readable instructions to implement the steps of: searching the song database for all songs with the same song name but different artist names, and combining the retrieved songs pairwise into candidate song pairs; or searching all songs of the same singer for songs marked with a language tag, the language tag including a song name and a language version, and taking the song marked with the language tag and the other song corresponding to the song name in the tag as one candidate song pair.
- The apparatus according to claim 9, wherein the processor executes the computer-readable instructions to implement the steps of: parsing the lyric files of the two songs of the candidate pair separately to obtain each song's number of lyric lines and the time information of each line; and judging whether the two songs of the candidate pair have the same number of lyric lines, if so putting the lines of the two songs in one-to-one correspondence and judging whether the time information of corresponding lines is the same, and if so taking the candidate pair as the primary-selection song pair.
- The apparatus according to claim 9, wherein the processor executes the computer-readable instructions to implement the steps of: extracting the accompaniment files of the two songs of the primary-selection pair separately, each accompaniment file including at least one piece of accompaniment audio and the accompaniment time corresponding to that audio; matching the accompaniment audio with the same accompaniment time in the two songs one-to-one into at least one accompaniment pair, processing the two pieces of accompaniment audio of each accompaniment pair separately to obtain two equal-length binary sequences corresponding to the pair, calculating the matching probability of the two binary sequences, and judging whether the matching probability is greater than a preset value, and if so, determining that the two pieces of accompaniment audio of the pair are the same; and judging whether the two pieces of accompaniment audio of every accompaniment pair are the same, and if so, determining the primary-selection pair to be the homologous song pair.
- The apparatus according to claim 8, wherein the processor executes the computer-readable instructions to implement the steps of: dividing the playback time of the two-channel audio into multiple playback periods according to the time information of each lyric line in the two-channel audio, the number of playback periods being equal to the number of lyric lines; and alternately energy-suppressing the left and right channels of the two-channel audio in different playback periods.
- A non-volatile computer-readable storage medium storing a computer program capable of causing a computer to execute the steps of: acquiring audio data of a homologous song pair, the homologous song pair being two songs with the same accompaniment but different singing; decoding the audio data of the homologous song pair to obtain two mono audio data; merging the two mono audio data into one two-channel audio data; and dividing the playback time corresponding to the two-channel audio into multiple playback periods, and energy-suppressing the left or right channel of the two-channel audio in different playback periods.
- The non-volatile computer-readable storage medium according to claim 15, wherein, when causing the computer to execute the step of acquiring audio data of a homologous song pair, the homologous song pair being two songs with the same accompaniment but different singing, the computer program causes the computer to execute the steps of: acquiring song information of a candidate song pair, the song information including the lyric files and accompaniment files of the two songs; performing lyric screening on the candidate song pair according to the lyric files of its two songs and, if the lyric files of the two songs are the same, determining the candidate pair to be a primary-selection song pair; performing accompaniment screening on the primary-selection pair according to the accompaniment files of its two songs and, if the accompaniment files of the two songs are the same, determining the primary-selection pair to be the homologous song pair; and acquiring the audio data of the two songs of the homologous song pair.
- The non-volatile computer-readable storage medium according to claim 16, wherein, when causing the computer to execute the step of acquiring song information of a candidate song pair, the computer program causes the computer to execute one of the following steps: searching the song database for all songs with the same song name but different artist names, combining the retrieved songs pairwise into candidate song pairs, and acquiring the song information of the candidate pairs; or searching all songs of the same singer for songs marked with a language tag, the language tag including a song name and a language version, taking the song marked with the language tag and the other song corresponding to the song name in the tag as one candidate song pair, and acquiring the song information of the candidate pair.
- The non-volatile computer-readable storage medium according to claim 16, wherein, when causing the computer to execute the step of performing lyric screening on the candidate song pair according to the lyric files of its two songs and, if the lyric files of the two songs are the same, determining the candidate pair to be a primary-selection song pair, the computer program causes the computer to execute the steps of: parsing the lyric files of the two songs of the candidate pair separately to obtain each song's number of lyric lines and the time information of each line; and judging whether the two songs have the same number of lyric lines, if so putting the lines of the two songs in one-to-one correspondence and judging whether the time information of corresponding lines is the same, and if so taking the candidate pair as the primary-selection song pair.
- The non-volatile computer-readable storage medium according to claim 16, wherein, when causing the computer to execute the step of performing accompaniment screening on the primary-selection pair according to the accompaniment files of its two songs and, if the accompaniment files of the two songs are the same, determining the primary-selection pair to be the homologous song pair, the computer program causes the computer to execute the steps of: extracting the accompaniment files of the two songs of the primary-selection pair separately, each accompaniment file including at least one piece of accompaniment audio and the accompaniment time corresponding to that audio; matching the accompaniment audio with the same accompaniment time in the two songs one-to-one into at least one accompaniment pair, processing the two pieces of accompaniment audio of each accompaniment pair separately to obtain two equal-length binary sequences corresponding to the pair, calculating the matching probability of the two binary sequences, and judging whether the matching probability is greater than a preset value, and if so, determining that the two pieces of accompaniment audio of the pair are the same; and judging whether the two pieces of accompaniment audio of every accompaniment pair are the same, and if so, determining the primary-selection pair to be the homologous song pair.
- The non-volatile computer-readable storage medium according to claim 19, wherein, when causing the computer to execute the step of processing the two pieces of accompaniment audio of each accompaniment pair separately to obtain two equal-length binary sequences corresponding to the pair, calculating the matching probability of the two binary sequences, and judging whether the matching probability is greater than a preset value, and if so, determining that the two pieces of accompaniment audio of the pair are the same, the computer program causes the computer to execute the steps of: decoding the two pieces of accompaniment audio of each accompaniment pair separately to obtain two accompaniment audio data; performing a Fourier transform on each of the two accompaniment audio data to generate two spectra; and dividing the two spectra equally into the same number of frequency bands, calculating the average energy value of each band in each spectrum, comparing each band with the preceding band, expressing the comparison results in binary, and obtaining the binary sequence corresponding to each spectrum;
- The non-volatile computer-readable storage medium according to claim 19, wherein, when causing the computer to execute the step of dividing the playback time corresponding to the two-channel audio into multiple playback periods and energy-suppressing the left or right channel of the two-channel audio in different playback periods, the computer program causes the computer to execute the steps of: the playback time corresponding to the two-channel audio including the time information of each lyric line, dividing the playback time into multiple playback periods according to the time information of each line, the number of playback periods being equal to the number of lyric lines; and alternately energy-suppressing the left and right channels of the two-channel audio in different playback periods.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP17854792.3A EP3522151B1 (en) | 2016-09-27 | 2017-09-25 | Method and device for processing dual-source audio data |
US16/100,698 US10776422B2 (en) | 2016-09-27 | 2018-08-10 | Dual sound source audio data processing method and apparatus |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610852918.8A CN106486128B (zh) | 2016-09-27 | 2016-09-27 | Dual sound source audio data processing method and apparatus |
CN201610852918.8 | 2016-09-27 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/100,698 Continuation US10776422B2 (en) | 2016-09-27 | 2018-08-10 | Dual sound source audio data processing method and apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018059342A1 true WO2018059342A1 (zh) | 2018-04-05 |
Family
ID=58267665
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2017/103106 WO2018059342A1 (zh) | 2016-09-27 | 2017-09-25 | Dual sound source audio data processing method and apparatus |
Country Status (4)
Country | Link |
---|---|
US (1) | US10776422B2 (zh) |
EP (1) | EP3522151B1 (zh) |
CN (1) | CN106486128B (zh) |
WO (1) | WO2018059342A1 (zh) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111986696A (zh) * | 2020-08-27 | 2020-11-24 | Hunan Rongshi Culture Creative Co., Ltd. | Efficient method for song volume equalization |
CN111986715A (zh) * | 2020-08-19 | 2020-11-24 | iFLYTEK Co., Ltd. | Recording system and recording method |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106024005B (zh) * | 2016-07-01 | 2018-09-25 | Tencent Technology (Shenzhen) Co., Ltd. | Audio data processing method and apparatus |
CN106486128B (zh) * | 2016-09-27 | 2021-10-22 | Tencent Technology (Shenzhen) Co., Ltd. | Dual sound source audio data processing method and apparatus |
TWI745338B (zh) * | 2017-01-19 | 2021-11-11 | Alibaba Group Services Limited | Method and apparatus for providing accompaniment music |
CN108694203B (zh) * | 2017-04-11 | 2021-08-13 | Beijing Leishi Tiandi Electronic Technology Co., Ltd. | Method and apparatus for continuously playing part of a song |
CN107506409B (zh) * | 2017-08-09 | 2021-01-08 | Inspur Financial Information Technology Co., Ltd. | Multi-audio data processing method |
CN107665240A (zh) * | 2017-09-01 | 2018-02-06 | Beijing Leishi Tiandi Electronic Technology Co., Ltd. | Audio file clustering method and apparatus |
US11487815B2 (en) * | 2019-06-06 | 2022-11-01 | Sony Corporation | Audio track determination based on identification of performer-of-interest at live event |
CN110472094B (zh) * | 2019-08-06 | 2023-03-31 | Shenyang University | Traditional music entry method |
US11030914B2 (en) * | 2019-08-23 | 2021-06-08 | Hossein Zamanian | Learning device and method |
CN110599989B (zh) * | 2019-09-30 | 2022-11-29 | Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. | Audio processing method, apparatus and storage medium |
CN110910917B (zh) * | 2019-11-07 | 2021-08-31 | Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. | Method and apparatus for splicing audio segments |
CN110992970B (zh) * | 2019-12-13 | 2022-05-31 | Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. | Audio synthesis method and related apparatus |
CN111599328B (zh) * | 2020-05-22 | 2024-04-09 | Guangzhou Kugou Computer Technology Co., Ltd. | Song synthesis method, apparatus, device and storage medium |
CN112765396A (zh) | 2021-01-28 | 2021-05-07 | Beijing ByteDance Network Technology Co., Ltd. | Song recommendation method and apparatus, electronic device and storage medium |
CN113157968B (zh) * | 2021-04-07 | 2024-10-11 | Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. | Method, terminal and storage medium for obtaining a same-melody audio group |
CN113936629B (zh) * | 2021-10-12 | 2024-10-01 | Guangzhou Aimei Network Technology Co., Ltd. | Music file processing method and apparatus, and music singing device |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1173689A (zh) * | 1996-08-09 | 1998-02-18 | Yamaha Corporation | Karaoke apparatus selectively providing harmony to duet singing voices |
CN101174409A (zh) * | 2006-10-24 | 2008-05-07 | Nokia Corporation | System, method and device for providing a multi-lyric karaoke system |
CN101211643A (zh) * | 2006-12-28 | 2008-07-02 | Sony Corporation | Music editing apparatus, method and program |
CN101630507A (zh) * | 2009-08-18 | 2010-01-20 | Shenzhen Huawei Communication Technologies Co., Ltd. | Method, apparatus and system for implementing remote karaoke |
CN103297805A (zh) * | 2011-12-26 | 2013-09-11 | Sony Corporation | Information processing apparatus, method, program, recording medium and information processing system |
CN104053120A (zh) * | 2014-06-13 | 2014-09-17 | Fujian Star-net eVideo Information System Co., Ltd. | Stereo audio processing method and apparatus |
CN106486128A (zh) * | 2016-09-27 | 2017-03-08 | Tencent Technology (Shenzhen) Co., Ltd. | Dual sound source audio data processing method and apparatus |
Family Cites Families (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7191023B2 (en) * | 2001-01-08 | 2007-03-13 | Cybermusicmix.Com, Inc. | Method and apparatus for sound and music mixing on a network |
US8487176B1 (en) * | 2001-11-06 | 2013-07-16 | James W. Wieder | Music and sound that varies from one playback to another playback |
JP2006330533A (ja) * | 2005-05-30 | 2006-12-07 | Roland Corp | Electronic musical instrument |
JP4364838B2 (ja) * | 2005-06-06 | 2009-11-18 | KDDI Corporation | Music playback device capable of remixing musical pieces, and musical piece remixing method and program |
US20090070420A1 (en) * | 2006-05-01 | 2009-03-12 | Schuyler Quackenbush | System and method for processing data signals |
US8138409B2 (en) * | 2007-08-10 | 2012-03-20 | Sonicjam, Inc. | Interactive music training and entertainment system |
US7985915B2 (en) * | 2007-08-13 | 2011-07-26 | Sanyo Electric Co., Ltd. | Musical piece matching judging device, musical piece recording device, musical piece matching judging method, musical piece recording method, musical piece matching judging program, and musical piece recording program |
WO2009138425A1 (en) * | 2008-05-16 | 2009-11-19 | Tonium Ab | Audio mix instruction file with timing information referring to unique patterns within audio tracks |
US20110126103A1 (en) * | 2009-11-24 | 2011-05-26 | Tunewiki Ltd. | Method and system for a "karaoke collage" |
US9058797B2 (en) * | 2009-12-15 | 2015-06-16 | Smule, Inc. | Continuous pitch-corrected vocal capture device cooperative with content server for backing track mix |
US9601127B2 (en) * | 2010-04-12 | 2017-03-21 | Smule, Inc. | Social music system and method with continuous, real-time pitch correction of vocal performance and dry vocal capture for subsequent re-rendering based on selectively applicable vocal effect(s) schedule(s) |
EP2485213A1 (en) * | 2011-02-03 | 2012-08-08 | Fraunhofer-Gesellschaft zur Förderung der Angewandten Forschung e.V. | Semantic audio track mixer |
US8912419B2 (en) * | 2012-05-21 | 2014-12-16 | Peter Sui Lun Fong | Synchronized multiple device audio playback and interaction |
JP6203003B2 (ja) * | 2012-12-20 | 2017-09-27 | Toshiba Corporation | Signal processing device, signal processing method and program |
US9595932B2 (en) * | 2013-03-05 | 2017-03-14 | Nike, Inc. | Adaptive music playback system |
CN103295568B (zh) * | 2013-05-30 | 2015-10-14 | Xiaomi Inc. | Asynchronous chorus method and apparatus |
CN104143325B (zh) * | 2014-07-18 | 2016-04-13 | Tencent Technology (Shenzhen) Co., Ltd. | Accompaniment/original vocal audio data switching method and system |
US20160078853A1 (en) * | 2014-09-12 | 2016-03-17 | Creighton Strategies Ltd. | Facilitating Online Access To and Participation In Televised Events |
CN104978973B (zh) * | 2014-10-22 | 2019-08-13 | Guangzhou Kugou Computer Technology Co., Ltd. | Audio processing method and apparatus |
CN104269174B (zh) * | 2014-10-24 | 2018-02-09 | Beijing Yinzhibang Culture Technology Co., Ltd. | Audio signal processing method and apparatus |
CN104966527B (zh) * | 2015-05-27 | 2017-04-19 | Guangzhou Kugou Computer Technology Co., Ltd. | Karaoke processing method, apparatus and karaoke processing system |
GB2554322B (en) * | 2015-06-03 | 2021-07-14 | Smule Inc | Automated generation of coordinated audiovisual work based on content captured from geographically distributed performers |
GB2581032B (en) * | 2015-06-22 | 2020-11-04 | Time Machine Capital Ltd | System and method for onset detection in a digital signal |
CN105788589B (zh) * | 2016-05-04 | 2021-07-06 | Tencent Technology (Shenzhen) Co., Ltd. | Audio data processing method and apparatus |
2016
- 2016-09-27 CN CN201610852918.8A patent/CN106486128B/zh active Active
2017
- 2017-09-25 EP EP17854792.3A patent/EP3522151B1/en active Active
- 2017-09-25 WO PCT/CN2017/103106 patent/WO2018059342A1/zh unknown
2018
- 2018-08-10 US US16/100,698 patent/US10776422B2/en active Active
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111986715A (zh) * | 2020-08-19 | 2020-11-24 | iFLYTEK Co., Ltd. | Recording system and recording method |
CN111986715B (zh) * | 2020-08-19 | 2024-05-31 | iFLYTEK Co., Ltd. | Recording system and recording method |
CN111986696A (zh) * | 2020-08-27 | 2020-11-24 | Hunan Rongshi Culture Creative Co., Ltd. | Efficient method for song volume equalization |
CN111986696B (zh) * | 2020-08-27 | 2023-07-07 | Hunan Rongshi Culture Creative Co., Ltd. | Efficient method for song volume equalization |
Also Published As
Publication number | Publication date |
---|---|
EP3522151B1 (en) | 2020-11-11 |
CN106486128A (zh) | 2017-03-08 |
US20180349493A1 (en) | 2018-12-06 |
EP3522151A4 (en) | 2019-10-16 |
CN106486128B (zh) | 2021-10-22 |
EP3522151A1 (en) | 2019-08-07 |
US10776422B2 (en) | 2020-09-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2018059342A1 (zh) | Dual sound source audio data processing method and apparatus | |
US11456017B2 (en) | Looping audio-visual file generation based on audio and video analysis | |
US10229669B2 (en) | Apparatus, process, and program for combining speech and audio data | |
WO2021083071A1 (zh) | Speech conversion, file generation, broadcasting, and speech processing methods, devices and media | |
ES2561534T3 (es) | Semantic audio track mixer | |
CN106652997B (zh) | Audio synthesis method and terminal | |
US8396714B2 (en) | Systems and methods for concatenation of words in text to speech synthesis | |
US8352272B2 (en) | Systems and methods for text to speech synthesis | |
US8583418B2 (en) | Systems and methods of detecting language and natural language strings for text to speech synthesis | |
WO2020113733A1 (zh) | Animation generation method and apparatus, electronic device, and computer-readable storage medium | |
US20100082349A1 (en) | Systems and methods for selective text to speech synthesis | |
US20200302112A1 (en) | Speech to text enhanced media editing | |
JP2002014691A (ja) | Method for identifying novel points in a source audio signal | |
WO2018121368A1 (zh) | Method and related apparatus for generating music for lyrics | |
CN110675886A (zh) | Audio signal processing method and apparatus, electronic device and storage medium | |
KR20200045852A (ko) | Advertising service platform within multimedia content through speech synthesis or video editing, and method of providing speech synthesis and video editing services | |
WO2018094952A1 (zh) | Content recommendation method and apparatus | |
CN105280206A (zh) | Audio playback method and apparatus | |
Bayle et al. | Kara1k: A karaoke dataset for cover song identification and singing voice analysis | |
JP4697432B2 (ja) | Music playback device, music playback method, and music playback program | |
JP2008242376A (ja) | Musical piece introduction text generation device, narration addition device, and program | |
CN103680561A (zh) | System and method for synchronizing a vocal signal with its textual description data | |
JP7335316B2 (ja) | Program and information processing device | |
US20230410848A1 (en) | Method and apparatus of generating audio and video materials | |
TWI276961B (en) | System, method and machine-readable storage medium for synchronization of still image and audio |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 17854792 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2017854792 Country of ref document: EP Effective date: 20190429 |