US10410615B2 - Audio information processing method and apparatus - Google Patents
- Publication number
- US10410615B2 (application US15/762,841; application publication US201715762841A)
- Authority
- US
- United States
- Prior art keywords
- audio
- sound channel
- energy value
- attribute
- subfile
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G10L19/008 — Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
- G10L15/08 — Speech classification or search
- G10L15/16 — Speech classification or search using artificial neural networks
- G10L19/02 — Speech or audio signal analysis-synthesis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/087 — Determination or coding of the excitation function using mixed excitation models, e.g. MELP, MBE, split band LPC or HVXC
- G10L25/12 — Speech or voice analysis techniques in which the extracted parameters are prediction coefficients
- G10L25/18 — Speech or voice analysis techniques in which the extracted parameters are spectral information of each sub-band
- G10L25/21 — Speech or voice analysis techniques in which the extracted parameters are power information
- G10L25/30 — Speech or voice analysis techniques using neural networks
- G10H1/36 — Accompaniment arrangements
- G10H1/361 — Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
- G10H1/125 — Circuits for changing the tone colour by filtering complex waveforms using a digital filter
- G10H2210/005 — Musical accompaniment, i.e. complete instrumental rhythm synthesis added to a performed melody, e.g. as output by drum machines
- G10H2210/041 — Musical analysis based on MFCC (mel-frequency spectral coefficients)
- G10H2210/056 — Extraction or identification of individual instrumental parts, e.g. melody, chords, bass; identification or separation of instrumental parts by their characteristic voices or timbres
- G10H2230/025 — Computing or signal processing architecture features
- G10H2250/071 — All pole filter, i.e. autoregressive (AR) filter
- G10H2250/275 — Gaussian window
- G10H2250/311 — Neural networks for electrophonic musical instruments or musical processing, e.g. for musical recognition or control, automatic composition or improvisation
Definitions
- the present application relates to information processing technology, and in particular to an audio information processing method and apparatus.
- Audio files with an accompaniment function generally have two sound channels: an original sound channel (containing both accompaniment and human voice) and an accompanying sound channel, between which a user switches while singing karaoke. Because there is no fixed standard, audio files acquired from different sources come in different versions: in some files the first sound channel carries the accompaniment, while in others the second sound channel does. As a result, it is impossible to tell which sound channel is the accompanying sound channel once these audio files are acquired. Generally, the audio files may be put into use only after being adjusted to a uniform format, either through manual recognition or through automatic resolution by equipment.
- a method comprising decoding a first audio file to acquire a first audio subfile corresponding to a first sound channel and a second audio subfile corresponding to a second sound channel; extracting first audio data from the first audio subfile; extracting second audio data from the second audio subfile; acquiring a first audio energy value of the first audio data; acquiring a second audio energy value of the second audio data; and determining an attribute of at least one of the first sound channel and the second sound channel based on the first audio energy value and the second audio energy value.
- an apparatus comprising at least one memory configured to store computer program code; and at least one processor configured to access the at least one memory and operate according to the computer program code, said computer program code including decoding code configured to cause at least one of the at least one processor to decode an audio file to acquire a first audio subfile corresponding to a first sound channel and a second audio subfile corresponding to a second sound channel; extracting code configured to cause at least one of the at least one processor to extract first audio data from the first audio subfile and second audio data from the second audio subfile; acquisition code configured to cause at least one of the at least one processor to acquire a first audio energy value of the first audio data and a second audio energy value of the second audio data; and processing code configured to cause at least one of the at least one processor to determine an attribute of at least one of the first sound channel and the second sound channel based on the first audio energy value and the second audio energy value.
- a non-transitory computer-readable storage medium that stores computer program code that, when executed by a processor of a calculating apparatus, causes the calculating apparatus to execute a method comprising decoding an audio file to acquire a first audio subfile outputted corresponding to a first sound channel and a second audio subfile outputted corresponding to a second sound channel; extracting first audio data from the first audio subfile; extracting second audio data from the second audio subfile; acquiring a first audio energy value of the first audio data; acquiring a second audio energy value of the second audio data; and determining the attribute of at least one of the first sound channel and the second sound channel based on the first audio energy value and the second audio energy value.
- FIG. 1 is a schematic diagram of dual-channel music to be distinguished;
- FIG. 2 is a flow diagram of an audio information processing method according to an exemplary embodiment;
- FIG. 3 is a flow diagram of a method to obtain a Deep Neural Networks (DNN) model through training according to an exemplary embodiment;
- FIG. 4 is a schematic diagram of the DNN model according to an exemplary embodiment;
- FIG. 5 is a flow diagram of an audio information processing method according to an exemplary embodiment;
- FIG. 6 is a flow diagram of Perceptual Linear Predictive (PLP) parameter extraction according to an exemplary embodiment;
- FIG. 7 is a flow diagram of an audio information processing method according to an exemplary embodiment;
- FIG. 8 is a schematic diagram of an a cappella data extraction process according to an exemplary embodiment;
- FIG. 9 is a flow diagram of an audio information processing method according to an exemplary embodiment;
- FIG. 10 is a structural diagram of an audio information processing apparatus according to an exemplary embodiment; and
- FIG. 11 is a structural diagram of a hardware composition of an audio information processing apparatus according to an exemplary embodiment.
- Exemplary embodiments acquire the corresponding first audio subfile and second audio subfile by dual-channel decoding of the audio file, extract the first audio data and the second audio data (which may share the same attribute), and determine an attribute of at least one of the first sound channel and the second sound channel based on the first audio energy value and the second audio energy value, so as to identify a sound channel that meets particular attribute requirements.
- In this way, the accompanying sound channel and the original sound channel of the audio file may be distinguished efficiently and accurately, solving the problems of the high labor cost and low efficiency of manual resolution and the low accuracy of automatic resolution by equipment.
- An audio information processing method may be achieved through software, hardware, firmware or a combination thereof.
- the software may be, for example, WeSing software; that is, the audio information processing method provided by the present application may be used in the WeSing software.
- Exemplary embodiments may be applied to distinguish the corresponding accompanying sound channel of the audio file automatically, quickly and accurately based on machine learning.
- Exemplary embodiments decode an audio file to acquire a first audio subfile outputted corresponding to the first sound channel and a second audio subfile outputted corresponding to a second sound channel; extract first audio data from the first audio subfile and second audio data from the second audio subfile; acquire a first audio energy value of the first audio data and a second audio energy value of the second audio data; and determine an attribute of at least one of the first sound channel and the second sound channel based on the first audio energy value and the second audio energy value so as to determine a sound channel that meets particular attribute requirements.
- FIG. 2 is a flow diagram of the audio information processing method according to an exemplary embodiment. As shown in FIG. 2, the audio information processing method according to an exemplary embodiment may include the following steps:
- Step S201: Decode the audio file to acquire the first audio subfile outputted corresponding to the first sound channel and the second audio subfile outputted corresponding to the second sound channel.
- the audio file herein may be any music file whose accompanying/original sound channels are to be distinguished.
- the first sound channel and the second sound channel may be the left channel and the right channel respectively, and correspondingly, the first audio subfile and the second audio subfile may be the accompanying file and the original file corresponding to the first audio file respectively.
- a song is decoded to acquire the accompanying file or original file representing the left channel output and the original file or accompanying file representing the right channel output.
- Step S202: Extract the first audio data from the first audio subfile and the second audio data from the second audio subfile.
- the first audio data and the second audio data may have the same attribute, or represent the same attribute. For example, if both are human-voice audio, the human-voice audio is extracted from the first audio subfile and the second audio subfile.
- the specific human-voice extraction method may be any method that may be used to extract human-voice audios from the audio files.
- a Deep Neural Networks (DNN) model may be trained to extract human-voice audio from the audio files. For example, when the first audio file is a song, if the first audio subfile is an accompanying audio file and the second audio subfile is an original audio file, then the DNN model is used to extract the human-voice accompanying data from the accompanying audio file and the a cappella data from the original audio file.
- Step S203: Acquire the first audio energy value of the first audio data and the second audio energy value of the second audio data.
- the first audio energy value may be calculated from the first audio data and the second audio energy value may be calculated from the second audio data.
- the first audio energy value may be the average audio energy value of the first audio data
- the second audio energy value may be the average audio energy value of the second audio data.
- different methods may be used to acquire the average audio energy value corresponding to the audio data.
- the audio data may be composed of multiple sampling points, each generally corresponding to a value between 0 and 32767, and the average of all sampling point values may be taken as the average audio energy value of the audio data. In this way, the average value of all sampling points of the first audio data may be taken as the first audio energy value, and the average value of all sampling points of the second audio data may be taken as the second audio energy value.
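- As an illustration of this averaging step, the following is a minimal sketch (not the patent's own code) that computes the average audio energy of one channel, assuming the extracted audio data is available as a NumPy array of 16-bit PCM sampling points:

```python
import numpy as np

def average_audio_energy(samples: np.ndarray) -> float:
    """Mean of the absolute sampling-point values (0 to 32767 for 16-bit PCM),
    taken as the average audio energy value described above."""
    return float(np.mean(np.abs(samples.astype(np.int64))))

# Hypothetical usage with the two extracted human-voice tracks:
# first_energy = average_audio_energy(first_audio_data)
# second_energy = average_audio_energy(second_audio_data)
```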
- Step S204: Determine the attribute of at least one of the first sound channel and the second sound channel based on the first audio energy value and the second audio energy value.
- the sound channel that meets the particular attribute requirements may be whichever of the first sound channel and the second sound channel outputs the accompanying audio of the first audio file.
- when the first audio file is a song, the sound channel that meets the particular attribute requirements may be whichever of the left and right channels outputs the accompaniment corresponding to the song.
- the difference value between the first audio energy value and the second audio energy value may be determined. If the difference value is greater than the threshold and the first audio energy value is less than the second audio energy value, the attribute of the first sound channel is determined as the first attribute and the attribute of the second sound channel as the second attribute; that is, the first sound channel is determined as the sound channel outputting accompanying audio and the second sound channel as the sound channel outputting original audio.
- if the difference value between the first audio energy value and the second audio energy value is greater than the threshold and the second audio energy value is less than the first audio energy value, the attribute of the second sound channel is determined as the first attribute and the attribute of the first sound channel as the second attribute; that is, the second sound channel is determined as the sound channel outputting accompanying audio and the first sound channel as the sound channel outputting original audio.
- in other words, whichever of the first audio subfile and the second audio subfile has the smaller audio energy value may be determined as the audio file that meets the particular attribute requirements (i.e. the accompanying file), and the sound channel corresponding to that audio subfile as the sound channel that meets the particular requirements (i.e. the sound channel that outputs the accompaniment).
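- A minimal sketch of this decision rule follows; the function name and return convention are illustrative, and the threshold is the audio energy difference threshold discussed below:

```python
def classify_channels(first_energy: float, second_energy: float, threshold: float):
    """Decide channel attributes from the two average audio energy values.

    Returns which channel outputs the accompaniment, or None when the energy
    difference is too small to decide and a frequency-spectrum classifier
    (e.g. the GMM described later) must take over."""
    if abs(first_energy - second_energy) > threshold:
        return "first" if first_energy < second_energy else "second"
    return None  # fall back to frequency-spectrum classification
```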
- if the difference value between the first audio energy value and the second audio energy value is not greater than the audio energy difference threshold, the accompanying audio file may in practice contain substantial human-voice accompaniment.
- even so, the frequency spectrum characteristics of accompanying audio and a cappella audio still differ, so human-voice accompanying data may be distinguished from a cappella data according to their frequency spectrum characteristics.
- the accompanying data may then be determined finally based on the principle that the average audio energy of accompanying data is less than that of a cappella data, yielding the result that the sound channel corresponding to the accompanying data is the sound channel that meets the particular attribute requirements.
- FIG. 3 is a flow diagram of the method to obtain the DNN model through training according to an exemplary embodiment. As shown in FIG. 3, the method to obtain the DNN model through training according to an exemplary embodiment may include the following steps:
- Step S301: Decode the audio in the multiple predetermined audio files respectively to acquire the corresponding multiple Pulse Code Modulation (PCM) audio files.
- the multiple predetermined audio files may be N original songs and the corresponding N a cappella songs selected from a song library of WeSing.
- N may be a positive integer and may be greater than 2,000 for the follow-up training.
- there have been tens of thousands of songs with both original and high-quality a cappella data (the a cappella data is mainly selected by a free scoring system, i.e. a cappella data with higher scores is selected), so all such songs may be collected, from which 10,000 songs may be randomly selected for the follow-up operations (the complexity and accuracy of the follow-up training are the main considerations for this selection).
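- The patent does not name a particular decoder; as one hedged illustration, the decoding step could be carried out with a standard tool such as ffmpeg (the sample rate here is an assumed default, not taken from the source):

```python
import subprocess

def decode_to_pcm(src_path: str, dst_path: str, sample_rate: int = 44100) -> None:
    """Decode a compressed song to raw 16-bit little-endian stereo PCM."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", src_path,
         "-f", "s16le", "-acodec", "pcm_s16le",
         "-ar", str(sample_rate), "-ac", "2", dst_path],
        check=True,
    )
```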
- Step S302: Extract the frequency spectrum features from the obtained multiple PCM audio files.
- Step S303: Train the extracted frequency spectrum features by using the error Back Propagation (BP) algorithm to obtain the DNN model.
- four matrices are obtained: a 2827×2048 dimensional matrix, two 2048×2048 dimensional matrices and a 2048×257 dimensional matrix.
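- The four matrices imply a 2827→2048→2048→2048→257 forward pass. The sketch below shows only these shapes; the hidden-layer activation is an assumption, since the text lists the matrices but not the nonlinearity:

```python
import numpy as np

def dnn_forward(x: np.ndarray, weights: list) -> np.ndarray:
    """x: (2827,) context feature; weights: the four trained matrices with
    shapes 2827x2048, 2048x2048, 2048x2048 and 2048x257."""
    h = x
    for i, w in enumerate(weights):
        h = h @ w
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)  # assumed ReLU on the hidden layers
    return h  # (257,) spectral output for one frame
```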
- FIG. 5 is a flow diagram of the audio information processing method according to an exemplary embodiment. As shown in FIG. 5, the audio information processing method according to an exemplary embodiment may include the following steps:
- Step S501: Decode the audio file to acquire the first audio subfile outputted corresponding to the first sound channel and the second audio subfile outputted corresponding to the second sound channel.
- the audio file herein may be any music file whose accompanying/original sound channels are to be distinguished. If the audio file is a song whose accompanying/original sound channels are to be distinguished, then the first sound channel and the second sound channel may be the left channel and the right channel respectively, and correspondingly, the first audio subfile and the second audio subfile may be the accompanying file and the original file corresponding to the first audio file, respectively.
- for example, when the first audio file is a song, in Step S501 the song is decoded to acquire the accompanying file or original file of the song outputted by the left channel and the original file or accompanying file of the song outputted by the right channel.
- Step S502: Extract the first audio data from the first audio subfile and the second audio data from the second audio subfile respectively by using the predetermined DNN model.
- the predetermined DNN model may be the DNN model obtained through training in advance by using the BP algorithm in exemplary embodiment 2 described above, or a DNN model obtained through other methods;
- the first audio data and the second audio data may have the same attribute, or represent the same attribute. For example, if both are human-voice audio, the human-voice audio is extracted from the first audio subfile and the second audio subfile by using the DNN model obtained through training in advance. For example, when the first audio file is a song, if the first audio subfile is an accompanying audio file and the second audio subfile is an original audio file, then the DNN model is used to extract the human-voice accompanying data from the accompanying audio file and the a cappella data from the original audio file.
- the process of extracting the a cappella data by using the DNN model obtained through training may include the following steps:
- use the method provided in step S302 of exemplary embodiment 2 to extract the frequency spectrum features;
- each frame feature is extended by 5 frames forward and 5 frames backward to obtain an 11×257 dimensional feature (this operation is not performed for the first 5 frames and the last 5 frames of the audio file); the input feature is multiplied by the matrix in each layer of the DNN model obtained through training in embodiment 2 to finally obtain a 257 dimensional output feature per frame, i.e. an (m−10)-frame output feature;
- the first frame is then extended 5 frames forward and the last frame 5 frames backward to obtain an m-frame output result;
- a 512 dimensional frequency spectrum feature is then obtained, where i denotes the 512 dimensions and j denotes the frequency band corresponding to i, of which there are 257; each j may correspond to one or two values of i, and variables z and t correspond to the z_i and t_i obtained in step 2), respectively;
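- A sketch of the context extension that builds the 11×257 (= 2827) dimensional DNN input follows; it skips the first and last 5 frames exactly as described above (function name illustrative):

```python
import numpy as np

def context_expand(frames: np.ndarray, left: int = 5, right: int = 5) -> np.ndarray:
    """frames: (m, 257) spectrum features. Returns (m - left - right, 2827)
    inputs, one row per frame that has 5 neighbours on each side."""
    m, d = frames.shape
    out = np.empty((m - left - right, (left + 1 + right) * d))
    for i in range(left, m - right):
        out[i - left] = frames[i - left : i + right + 1].reshape(-1)
    return out
```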
- Step S503: Acquire the first audio energy value of the first audio data and the second audio energy value of the second audio data.
- the first audio energy value may be calculated from the first audio data
- the second audio energy value may be calculated from the second audio data.
- the first audio energy value may be the average audio energy value of the first audio data
- the second audio energy value may be the average audio energy value of the second audio data.
- different methods may be used to acquire the average audio energy value corresponding to the audio data.
- the audio data is composed of multiple sampling points, each generally corresponding to a value between 0 and 32767, and the average of all sampling point values is taken as the average audio energy value of the audio data.
- the average value of all sampling points of the first audio data may be taken as the first audio energy value
- the average value of all sampling points of the second audio data may be taken as the second audio energy value.
- Step S504: Determine whether the difference value between the first audio energy value and the second audio energy value is greater than the predetermined threshold. If yes, proceed to step S505; otherwise, proceed to step S506.
- a threshold (i.e. an audio energy difference threshold) may be predetermined. Specifically, the threshold may be set experimentally according to actual use; for example, it may be set to 486. If the difference value between the first audio energy value and the second audio energy value is greater than the audio energy difference threshold, the sound channel whose audio energy value is smaller is determined as the accompanying sound channel.
- Step S505: If the first audio energy value is less than the second audio energy value, determine the attribute of the first sound channel as the first attribute; if the second audio energy value is less than the first audio energy value, determine the attribute of the second sound channel as the first attribute.
- in other words, compare the first audio energy value and the second audio energy value. If the first audio energy value is less than the second audio energy value, determine the attribute of the first sound channel as the first attribute and the attribute of the second sound channel as the second attribute; that is, determine the first sound channel as the sound channel outputting accompanying audio and the second sound channel as the sound channel outputting original audio. If the second audio energy value is less than the first audio energy value, determine the attribute of the second sound channel as the first attribute and the attribute of the first sound channel as the second attribute; that is, determine the second sound channel as the sound channel outputting accompanying audio and the first sound channel as the sound channel outputting original audio.
- in other words, the audio subfile with the smaller audio energy value may be determined as the audio file that meets the particular attribute requirements, and the sound channel corresponding to that audio subfile as the sound channel that meets the particular requirements.
- here, the audio file that meets the particular attribute requirements is the accompanying audio file corresponding to the first audio file;
- the sound channel that meets the particular requirements is whichever of the first sound channel and the second sound channel outputs the accompanying audio of the first audio file.
- Step S506: Assign an attribute to the first sound channel and/or the second sound channel by using the predetermined Gaussian Mixture Model (GMM).
- the predetermined GMM is obtained through training in advance; the specific training process includes extracting the Perceptual Linear Predictive (PLP) characteristic parameters of multiple audio files and training the GMM by using the Expectation Maximization (EM) algorithm.
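- As a hedged sketch of such a classifier (the patent gives no implementation), scikit-learn's GaussianMixture, whose fit() runs the EM algorithm, could be trained on PLP feature rows; the component count is an assumed hyper-parameter, and the PLP front-end producing the feature arrays is taken as given:

```python
import numpy as np
from sklearn.mixture import GaussianMixture  # fit() uses the EM algorithm

def train_channel_gmms(accomp_feats: np.ndarray, vocal_feats: np.ndarray):
    """Train one GMM per class on PLP features of shape (n_frames, n_dims)."""
    gmm_accomp = GaussianMixture(n_components=32).fit(accomp_feats)
    gmm_vocal = GaussianMixture(n_components=32).fit(vocal_feats)
    return gmm_accomp, gmm_vocal

def classify_by_gmm(feats: np.ndarray, gmm_accomp, gmm_vocal) -> str:
    """Label a channel by which model gives the higher average log-likelihood."""
    return "accompaniment" if gmm_accomp.score(feats) > gmm_vocal.score(feats) else "vocal"
```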
- the determined sound channel is the sound channel that preliminarily meets the particular attribute requirements.
- Step S507: Compare the first audio energy value and the second audio energy value. If the first attribute is assigned to the first sound channel and the first audio energy value is less than the second audio energy value, or the first attribute is assigned to the second sound channel and the second audio energy value is less than the first audio energy value, proceed to step S508; otherwise proceed to step S509.
- that is, determine whether the audio energy value corresponding to the sound channel that preliminarily meets the particular attribute requirements is less than the audio energy value corresponding to the other sound channel. If yes, proceed to step S508; otherwise proceed to step S509.
- the audio energy value corresponding to the sound channel that preliminarily meets the particular attribute requirements is exactly the audio energy value of the audio file outputted by the sound channel.
- Step S508: If the first attribute is assigned to the first sound channel and the first audio energy value is less than the second audio energy value, determine the attribute of the first sound channel as the first attribute and the attribute of the second sound channel as the second attribute; that is, determine the first sound channel as the sound channel outputting accompanying audio and the second sound channel as the sound channel outputting original audio. If the first attribute is assigned to the second sound channel and the second audio energy value is less than the first audio energy value, determine the attribute of the second sound channel as the first attribute and the attribute of the first sound channel as the second attribute; that is, determine the second sound channel as the sound channel outputting accompanying audio and the first sound channel as the sound channel outputting original audio.
- in other words, the sound channel that preliminarily meets the particular attribute requirements may be determined as the sound channel that meets the particular attribute requirements, which is the sound channel outputting accompanying audio.
- the method may further include the following steps after Step S508:
- the sound channel that meets the particular attribute requirements may be the sound channel outputting accompanying audio.
- when the sound channel outputting accompanying audio (such as the first sound channel) is determined, that sound channel is labeled as the accompanying audio sound channel;
- a user may then switch between accompaniments and originals based on the labeled sound channel when singing karaoke;
- Step S509: Output a prompt message.
- the prompt message may be used to prompt the user that the sound channel outputting the accompanying audio of the first audio file cannot be distinguished, so that the user can manually confirm which sound channel outputs accompanying audio.
- in this case, the attributes of the first sound channel and the second sound channel need to be confirmed manually.
- in this way, the human-voice component is extracted from the music by using the trained DNN model, and the final classification result is then obtained through comparison of the dual-channel human-voice energy.
- the accuracy of the final classification may reach 99% or above.
- FIG. 7 is a flow diagram of an audio information processing method according to an exemplary embodiment. As shown in FIG. 7, the audio information processing method according to an exemplary embodiment may include the following steps:
- Step S701: Extract the dual-channel a cappella data (and/or human-voice accompanying data) of the music to be detected by using the DNN model trained in advance.
- a specific process of extracting the a cappella data is shown in FIG. 8.
- Step S702: Calculate the average audio energy value of the extracted dual-channel a cappella (and/or human-voice accompanying) data respectively.
- Step S703: Determine whether the audio energy difference value of the dual-channel a cappella (and/or human-voice accompanying) data is greater than the predetermined threshold. If yes, proceed to step S704; otherwise, proceed to step S705.
- Step S704: Determine the sound channel corresponding to the a cappella (and/or human-voice accompanying) data with the smaller average audio energy value as the accompanying sound channel.
- Step S705: Classify the music to be detected with dual-channel output by using the GMM trained in advance.
- Step S706: Determine whether the audio energy value corresponding to the sound channel classified as accompanying audio is the smaller one. If yes, proceed to step S707; otherwise, proceed to step S708.
- Step S707: Determine the sound channel with the smaller audio energy value as the accompanying sound channel.
- Step S708: Output the prompt message to request manual confirmation.
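- Pulling steps S702-S708 together, a minimal end-to-end sketch might look as follows; it reuses the illustrative helpers from the earlier sketches, and the threshold default is the experimentally chosen example given above:

```python
def detect_accompaniment_channel(left_vocals, right_vocals,
                                 left_feats, right_feats,
                                 gmm_accomp, gmm_vocal,
                                 threshold: float = 486.0):
    """Returns "left", "right", or None when manual confirmation is needed."""
    e_left = average_audio_energy(left_vocals)    # S702
    e_right = average_audio_energy(right_vocals)
    if abs(e_left - e_right) > threshold:          # S703 -> S704
        return "left" if e_left < e_right else "right"
    # S705: frequency-spectrum fallback with the pre-trained GMMs
    label = classify_by_gmm(left_feats, gmm_accomp, gmm_vocal)
    guess = "left" if label == "accompaniment" else "right"
    # S706 -> S707: accept only if the guessed channel also has lower energy
    if (guess == "left" and e_left < e_right) or (guess == "right" and e_right < e_left):
        return guess
    return None  # S708: output the prompt message for manual confirmation
```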
- in some exemplary embodiments, the dual-channel a cappella (and/or human-voice accompanying) data may be extracted while the accompanying audio sound channel is determined by using the GMM, and a regression function is then used to execute the above steps S703-S708.
- in this case the operations in step S705 have already been executed in advance, so they may be skipped when the regression function is used, as shown in FIG. 9.
- as shown in FIG. 9, dual-channel decoding is conducted on the music to be classified (i.e. the music to be detected);
- the a cappella training data is used to obtain the DNN model through training, and the accompanying human-voice training data is used to obtain the GMM through training.
- FIG. 10 is a structural diagram of the composition of the audio information processing apparatus according to an exemplary embodiment.
- the audio information processing apparatus according to an exemplary embodiment includes a decoding module 11, an extracting module 12, an acquisition module 13 and a processing module 14;
- the decoding module 11 being configured to decode the audio file (i.e. the first audio file) to acquire the first audio subfile outputted corresponding to the first sound channel and the second audio subfile outputted corresponding to the second sound channel;
- the extracting module 12 being configured to extract the first audio data from the first audio subfile and the second audio data from the second audio subfile;
- the acquisition module 13 being configured to acquire the first audio energy value of the first audio data and the second audio energy value of the second audio data; and
- the processing module 14 being configured to determine the attribute of at least one of the first sound channel and the second sound channel based on the first audio energy value and the second audio energy value.
- the first audio data and the second audio data may have the same attribute.
- the first audio data may correspond to the human-voice audio outputted by the first sound channel and the second audio data may correspond to the human-voice audio outputted by the second sound channel;
- the processing module 14 may be configured to determine which one of the first sound channel and the second sound channel is the sound channel outputting accompanying audio based on the first audio energy value of the human-voice audio outputted by the first sound channel and the second audio energy value of the human-voice audio outputted by the second sound channel.
- the apparatus may further comprise a first model training module 15 configured to extract the frequency spectrum features of the multiple predetermined audio files respectively;
- the extracting module 12 may be further configured to extract the first audio data from the first audio subfile and the second audio data from the second audio subfile respectively by using the DNN model.
- the processing module 14 may be configured to determine the difference value between the first audio energy value and the second audio energy value. If the difference value is greater than the threshold (i.e. the audio energy difference threshold) and the first audio energy value is less than the second audio energy value, the module determines the attribute of the first sound channel as the first attribute and the attribute of the second sound channel as the second attribute; that is, it determines the first sound channel as the sound channel outputting accompanying audio and the second sound channel as the sound channel outputting original audio.
- if the difference value between the first audio energy value and the second audio energy value is greater than the threshold and the second audio energy value is less than the first audio energy value, the module determines the attribute of the second sound channel as the first attribute and the attribute of the first sound channel as the second attribute; that is, it determines the second sound channel as the sound channel outputting accompanying audio and the first sound channel as the sound channel outputting original audio.
- when the processing module 14 detects that the difference value between the first audio energy value and the second audio energy value is greater than the audio energy difference threshold, the first audio subfile or the second audio subfile corresponding to the smaller of the two audio energy values is determined as the audio file that meets the particular attribute requirements, and the sound channel corresponding to that audio subfile as the sound channel that meets the particular requirements;
- otherwise, the classification method is used to assign an attribute to at least one of the first sound channel and the second sound channel, so as to preliminarily determine which of the first sound channel and the second sound channel is the sound channel that meets the particular attribute requirements.
- the apparatus may further comprise a second model training module 16 being configured to extract the Perceptual Linear Predictive (PLP) characteristic parameters of multiple audio files;
- the second model training module 16 may further be configured to train the Gaussian Mixture Model (GMM) by using the Expectation Maximization (EM) algorithm according to the extracted PLP characteristic parameters;
- the processing module 14 may be further configured to assign an attribute to at least one of the first sound channel and the second sound channel by using the GMM obtained through training, so as to determine the first sound channel or the second sound channel as the sound channel that preliminarily meets the particular attribute requirements.
- the processing module 14 may be configured to compare the first audio energy value and the second audio energy value, and to determine whether the first attribute is assigned to the first sound channel and the first audio energy value is less than the second audio energy value, or the first attribute is assigned to the second sound channel and the second audio energy value is less than the first audio energy value. This preliminarily determines whether the audio energy value corresponding to the sound channel that meets the particular attribute requirements is less than the audio energy value corresponding to the other sound channel;
- if the result shows that the audio energy value corresponding to the sound channel that preliminarily meets the particular attribute requirements is less than the audio energy value corresponding to the other sound channel, that sound channel is determined as the sound channel that meets the particular attribute requirements.
- the processing module 14 may be further configured to output a prompt message when the result shows that the audio energy value corresponding to the sound channel that preliminarily meets the particular attribute requirements is not less than the audio energy value corresponding to the other sound channel.
- the decoding module 11 , the extracting module 12 , the acquisition module 13 , the processing module 14 , the first model training module 15 and the second model training module 16 in the audio information processing apparatus may be achieved through a Central Processing Unit (CPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA) or an Application Specific Integrated Circuit (ASIC) in the apparatus.
- FIG. 11 is a structural diagram of the hardware composition of the audio information processing apparatus according to an exemplary embodiment.
- the apparatus S11 is shown in FIG. 11.
- the apparatus S11 may include a processor 111, a storage medium 112 and at least one external communication interface 113; the processor 111, the storage medium 112 and the external communication interface 113 may be connected through a bus 114.
- the audio information processing apparatus may be a mobile phone, a desktop computer, a PC or an all-in-one machine.
- the audio information processing method may also be achieved through the operations of a server.
- the audio information processing apparatus may be a terminal or a server.
- the audio information processing method according to an exemplary embodiment is not limited to being used in the terminal, instead, the audio information processing method may also be used in a server such as a web server or a server corresponding to music application software (e.g. WeSing software).
- the foregoing computer program code may be stored in a computer-readable storage medium, and when executed, the program performs the steps of the above exemplary embodiments; the foregoing storage medium may include a mobile storage device, a Random Access Memory (RAM), a Read-Only Memory (ROM), a magnetic disk, an optical disc or other media that can store program code.
- the software functional module(s) may also be stored in a computer-readable storage medium.
- the technical solution according to the exemplary embodiments, in essence or in the part contributing to the related technology, may be embodied in the form of a software product.
- the computer software product is stored in a storage medium and includes several instructions used to allow a computer device (which may be a personal computer, a server or a network device) to execute the whole or part of the method provided by each exemplary embodiment of the present application.
- the foregoing storage medium includes a mobile storage device, a RAM, a ROM, a magnetic disk, an optical disc or other media that can store program code.
Claims (20)
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610157251.XA CN105741835B (en) | 2016-03-18 | 2016-03-18 | A kind of audio-frequency information processing method and terminal |
CN201610157251.X | 2016-03-18 | ||
CN201610157251 | 2016-03-18 | ||
PCT/CN2017/076939 WO2017157319A1 (en) | 2016-03-18 | 2017-03-16 | Audio information processing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
US20180293969A1 (en) | 2018-10-11
US10410615B2 (en) | 2019-09-10
Family
ID=56251827
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/762,841 Active US10410615B2 (en) | 2016-03-18 | 2017-03-16 | Audio information processing method and apparatus |
Country Status (6)
Country | Link |
---|---|
US (1) | US10410615B2 (en) |
JP (1) | JP6732296B2 (en) |
KR (1) | KR102128926B1 (en) |
CN (1) | CN105741835B (en) |
MY (1) | MY185366A (en) |
WO (1) | WO2017157319A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180350392A1 (en) * | 2016-06-01 | 2018-12-06 | Tencent Technology (Shenzhen) Company Limited | Sound file sound quality identification method and apparatus |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105741835B (en) * | 2016-03-18 | 2019-04-16 | 腾讯科技(深圳)有限公司 | A kind of audio-frequency information processing method and terminal |
CN106448630B (en) * | 2016-09-09 | 2020-08-04 | 腾讯科技(深圳)有限公司 | Method and device for generating digital music score file of song |
CN106375780B (en) * | 2016-10-20 | 2019-06-04 | 腾讯音乐娱乐(深圳)有限公司 | A kind of multimedia file producting method and its equipment |
CN108461086B (en) * | 2016-12-13 | 2020-05-15 | 北京唱吧科技股份有限公司 | Real-time audio switching method and device |
CN110085216A (en) * | 2018-01-23 | 2019-08-02 | 中国科学院声学研究所 | A kind of vagitus detection method and device |
CN108231091B (en) * | 2018-01-24 | 2021-05-25 | 广州酷狗计算机科技有限公司 | Method and device for detecting whether left and right sound channels of audio are consistent |
US10522167B1 (en) * | 2018-02-13 | 2019-12-31 | Amazon Techonlogies, Inc. | Multichannel noise cancellation using deep neural network masking |
CN109102800A (en) * | 2018-07-26 | 2018-12-28 | 广州酷狗计算机科技有限公司 | A kind of method and apparatus that the determining lyrics show data |
CN111061909B (en) * | 2019-11-22 | 2023-11-28 | 腾讯音乐娱乐科技(深圳)有限公司 | Accompaniment classification method and accompaniment classification device |
CN113420771B (en) * | 2021-06-30 | 2024-04-19 | 扬州明晟新能源科技有限公司 | Colored glass detection method based on feature fusion |
CN113744708B (en) * | 2021-09-07 | 2024-05-14 | 腾讯音乐娱乐科技(深圳)有限公司 | Model training method, audio evaluation method, device and readable storage medium |
CN114615534A (en) * | 2022-01-27 | 2022-06-10 | 海信视像科技股份有限公司 | Display device and audio processing method |
2016
- 2016-03-18 CN CN201610157251.XA patent/CN105741835B/en active Active

2017
- 2017-03-16 KR KR1020187010355A patent/KR102128926B1/en active IP Right Grant
- 2017-03-16 MY MYPI2018701314A patent/MY185366A/en unknown
- 2017-03-16 WO PCT/CN2017/076939 patent/WO2017157319A1/en active Application Filing
- 2017-03-16 US US15/762,841 patent/US10410615B2/en active Active
- 2017-03-16 JP JP2018521411A patent/JP6732296B2/en active Active
Patent Citations (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5736943A (en) * | 1993-09-15 | 1998-04-07 | Fraunhofer-Gesellschaft Zur Forderung Der Angewandten Forschung E.V. | Method for determining the type of coding to be selected for coding at least two signals |
US7630500B1 (en) * | 1994-04-15 | 2009-12-08 | Bose Corporation | Spatial disassembly processor |
JPH0916189A (en) | 1995-04-18 | 1997-01-17 | Texas Instr Inc <Ti> | Karaoke marking method and karaoke device |
US5719344A (en) * | 1995-04-18 | 1998-02-17 | Texas Instruments Incorporated | Method and system for karaoke scoring |
US20040074378A1 (en) * | 2001-02-28 | 2004-04-22 | Eric Allamanche | Method and device for characterising a signal and method and device for producing an indexed signal |
US20040125961A1 (en) * | 2001-05-11 | 2004-07-01 | Stella Alessio | Silence detection |
US20040094019A1 (en) * | 2001-05-14 | 2004-05-20 | Jurgen Herre | Apparatus for analyzing an audio signal with regard to rhythm information of the audio signal by using an autocorrelation function |
JP2003330497A (en) | 2002-05-15 | 2003-11-19 | Matsushita Electric Ind Co Ltd | Method and device for encoding audio signal, encoding and decoding system, program for executing encoding, and recording medium with the program recorded thereon |
JP2005201966A (en) | 2004-01-13 | 2005-07-28 | Daiichikosho Co Ltd | Karaoke machine for automatically controlling background chorus sound volume |
US20080187153A1 (en) * | 2005-06-17 | 2008-08-07 | Han Lin | Restoring Corrupted Audio Signals |
US20070131095A1 (en) * | 2005-12-10 | 2007-06-14 | Samsung Electronics Co., Ltd. | Method of classifying music file and system therefor |
US20070180980A1 (en) * | 2006-02-07 | 2007-08-09 | Lg Electronics Inc. | Method and apparatus for estimating tempo based on inter-onset interval count |
US8378964B2 (en) * | 2006-04-13 | 2013-02-19 | Immersion Corporation | System and method for automatically producing haptic events from a digital audio signal |
CN101577117A (en) | 2009-03-12 | 2009-11-11 | Method and device for extracting accompaniment music |
US20130121511A1 (en) * | 2009-03-31 | 2013-05-16 | Paris Smaragdis | User-Guided Audio Selection from Complex Sound Mixtures |
US20110081024A1 (en) * | 2009-10-05 | 2011-04-07 | Harman International Industries, Incorporated | System for spatial extraction of audio signals |
CN101894559A (en) | 2010-08-05 | 2010-11-24 | 展讯通信(上海)有限公司 | Audio processing method and device thereof |
US8489403B1 (en) * | 2010-08-25 | 2013-07-16 | Foundation For Research and Technology—Institute of Computer Science ‘FORTH-ICS’ | Apparatuses, methods and systems for sparse sinusoidal audio processing and transmission |
US20160049162A1 (en) * | 2013-03-21 | 2016-02-18 | Intellectual Discovery Co., Ltd. | Audio signal size control method and device |
US20160254001A1 (en) * | 2013-11-27 | 2016-09-01 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Decoder, encoder, and method for informed loudness estimation in object-based audio coding systems |
CN105741835A (en) | 2016-03-18 | 2016-07-06 | 腾讯科技(深圳)有限公司 | Audio information processing method and terminal |
Non-Patent Citations (3)
Title |
---|
Communication dated Jun. 17, 2019, from the Japanese Patent Office in counterpart application No. 2018-521411. |
Eric's Memo Pad, "KTV Automatic Sound Channel Judgment", http://ericpeng1968.blogspot.com/2015/08/ktv_5.html, Aug. 5, 2015. |
International Search Report for PCT/CN2017/076939 dated Jun. 20, 2017. |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180350392A1 (en) * | 2016-06-01 | 2018-12-06 | Tencent Technology (Shenzhen) Company Limited | Sound file sound quality identification method and apparatus |
US10832700B2 (en) * | 2016-06-01 | 2020-11-10 | Tencent Technology (Shenzhen) Company Limited | Sound file sound quality identification method and apparatus |
Also Published As
Publication number | Publication date |
---|---|
CN105741835B (en) | 2019-04-16 |
JP6732296B2 (en) | 2020-07-29 |
US20180293969A1 (en) | 2018-10-11 |
KR102128926B1 (en) | 2020-07-01 |
JP2019502144A (en) | 2019-01-24 |
MY185366A (en) | 2021-05-11 |
CN105741835A (en) | 2016-07-06 |
KR20180053714A (en) | 2018-05-23 |
WO2017157319A1 (en) | 2017-09-21 |
Similar Documents
Publication | Title |
---|---|
US10410615B2 (en) | Audio information processing method and apparatus |
US10789290B2 (en) | Audio data processing method and apparatus, and computer storage medium |
CN109599093B (en) | Intelligent quality inspection keyword detection method, device and equipment and readable storage medium |
CN105244026B (en) | A speech processing method and device |
CN106486128B (en) | Method and device for processing double-sound-source audio data |
US9368116B2 (en) | Speaker separation in diarization |
US20150356967A1 (en) | Generating Narrative Audio Works Using Differentiable Text-to-Speech Voices |
WO2022203699A1 (en) | Unsupervised parallel Tacotron non-autoregressive and controllable text-to-speech |
CN112037764B (en) | Method, device, equipment and medium for determining music structure |
CN107680584B (en) | Method and device for segmenting audio |
EP4425482A2 (en) | Model training and tone conversion method and apparatus, device, and medium |
CN108764114B (en) | Signal identification method and device, storage medium and terminal thereof |
WO2024055752A9 (en) | Speech synthesis model training method, speech synthesis method, and related apparatuses |
US12093314B2 (en) | Accompaniment classification method and apparatus |
Petermann et al. | Tackling the cocktail fork problem for separation and transcription of real-world soundtracks |
Mandel et al. | Audio super-resolution using concatenative resynthesis |
CN112712793A (en) | ASR error correction method based on a pre-trained model in voice interaction, and related equipment |
JP6220733B2 (en) | Voice classification device, voice classification method, and program |
CN114783410A (en) | Speech synthesis method, system, electronic device and storage medium |
CN113825009B (en) | Audio and video playing method and device, electronic equipment and storage medium |
Reddy et al. | MusicNet: Compact Convolutional Neural Network for Real-time Background Music Detection |
CN114822492B (en) | Speech synthesis method and device, electronic equipment and computer readable storage medium |
Ramona et al. | Comparison of different strategies for a SVM-based audio segmentation |
US20240071367A1 (en) | Automatic Speech Generation and Intelligent and Robust Bias Detection in Automatic Speech Recognition Model |
CN116229989A (en) | Method, device and equipment for analyzing voice and readable storage medium |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED, CHINA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: ZHAO, WEIFENG; REEL/FRAME: 045332/0653. Effective date: 20180313 |
FEPP | Fee payment procedure | ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
STPP | Information on status: patent application and granting procedure in general | NON FINAL ACTION MAILED |
STPP | Information on status: patent application and granting procedure in general | RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
STPP | Information on status: patent application and granting procedure in general | NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
STPP | Information on status: patent application and granting procedure in general | PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED |
STPP | Information on status: patent application and granting procedure in general | PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
STCF | Information on status: patent grant | PATENTED CASE |
MAFP | Maintenance fee payment | PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 4 |