CN108962277A - Speech signal separation method, apparatus, computer equipment and storage medium - Google Patents
- Publication number
- CN108962277A CN108962277A CN201810802835.7A CN201810802835A CN108962277A CN 108962277 A CN108962277 A CN 108962277A CN 201810802835 A CN201810802835 A CN 201810802835A CN 108962277 A CN108962277 A CN 108962277A
- Authority
- CN
- China
- Prior art keywords
- frequency spectrum
- audio
- audio signal
- frame
- accompaniment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0232—Processing in the frequency domain
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/45—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window
Abstract
The invention discloses a speech signal separation method, apparatus, computer equipment and storage medium, belonging to the field of speech signal processing. The method includes: sampling the acoustic waveform of an audio file to be separated to obtain an audio signal; converting the audio signal from the time domain to the frequency domain to obtain a spectrum of the audio signal, where the spectrum represents only the amplitude of the audio signal and the amplitude is a real number; decomposing the spectrum of the audio signal to obtain an accompaniment spectrum and a vocal spectrum; and converting the accompaniment spectrum and the vocal spectrum from the frequency domain to the time domain to obtain accompaniment audio and vocal audio. Because the invention uses a transform algorithm that represents the amplitude of each audio frame with real numbers only, both the time-to-frequency and the frequency-to-time conversions leave phase untouched, so no phase information is lost. Separating accompaniment and vocals from an audio file in this way avoids the phase distortion that arises when decomposing a Fourier-transform spectrum.
Description
Technical field
The present invention relates to the field of speech signal processing, and in particular to a speech signal separation method, apparatus, computer equipment and storage medium.
Background technique
With the continuous development of speech processing technology, speech signal separation has found wide application in daily life. For example, a user of a karaoke application who wants to record a song over an accompaniment needs the accompaniment provided by the server, and the quality of that accompaniment directly affects the quality of the final recording. How to perform speech signal separation to obtain accompaniment audio and vocal audio is therefore crucial for improving the quality of the accompaniment audio.
Currently, speech signal separation typically involves converting the audio signal from the time domain to the frequency domain with a Fourier transform, a process that yields a complex spectrum. The complex spectrum can then be decomposed to separate out an accompaniment spectrum and a vocal spectrum, after which an inverse Fourier transform yields accompaniment audio and vocal audio.
In the course of implementing the present invention, the inventors found that the prior art has at least the following problem: because only the amplitude spectrum is used when the complex spectrum is decomposed, the separated accompaniment audio exhibits phase distortion.
Summary of the invention
Embodiments of the present invention provide a speech signal separation method, apparatus, computer equipment and storage medium that can solve the phase distortion problem in speech signal separation. The technical solution is as follows:
In one aspect, a speech signal separation method is provided, the method comprising:
sampling the acoustic waveform of an audio file to be separated to obtain an audio signal;
converting the audio signal from the time domain to the frequency domain to obtain a spectrum of the audio signal, the spectrum representing only the amplitude of the audio signal, the amplitude being a real number;
decomposing the spectrum of the audio signal to obtain an accompaniment spectrum and a vocal spectrum;
converting the accompaniment spectrum and the vocal spectrum from the frequency domain to the time domain to obtain accompaniment audio and vocal audio.
In one possible implementation, converting the audio signal from the time domain to the frequency domain to obtain the spectrum of the audio signal comprises:
performing framing processing on the audio signal to obtain multiple audio frames;
converting each of the multiple audio frames from the time domain to the frequency domain to obtain spectra of the multiple audio frames, the spectrum of each audio frame representing only the amplitude of the audio frame, the amplitude being a real number;
combining the spectra of the multiple audio frames to obtain the spectrum of the audio signal.
In one possible implementation, performing framing processing on the audio signal to obtain multiple audio frames comprises:
applying a preset window function to the audio signal to obtain multiple audio frames.
In one possible implementation, the length of the preset window function equals the number of sampling points of each audio frame.
In one possible implementation, the number of sampling points of each audio frame is twice the number of overlapping sampling points between frames.
In one possible implementation, decomposing the spectrum of the audio signal to obtain the accompaniment spectrum and the vocal spectrum comprises:
calling a preset decomposition model, the preset decomposition model being configured to perform spectrum separation based on a signal spectrum;
inputting the spectrum of the audio signal into the preset decomposition model and outputting the accompaniment spectrum and the vocal spectrum.
In one aspect, a speech signal separation device is provided, the device comprising:
a sampling module, configured to sample the acoustic waveform of an audio file to be separated to obtain an audio signal;
a first conversion module, configured to convert the audio signal from the time domain to the frequency domain to obtain a spectrum of the audio signal, the spectrum representing only the amplitude of the audio signal, the amplitude being a real number;
a decomposition module, configured to decompose the spectrum of the audio signal to obtain an accompaniment spectrum and a vocal spectrum;
a second conversion module, configured to convert the accompaniment spectrum and the vocal spectrum from the frequency domain to the time domain to obtain accompaniment audio and vocal audio.
In one possible implementation, the first conversion module comprises:
a framing unit, configured to perform framing processing on the audio signal to obtain multiple audio frames;
a time-frequency conversion unit, configured to convert each of the multiple audio frames from the time domain to the frequency domain to obtain spectra of the multiple audio frames, the spectrum of each audio frame representing only the amplitude of the audio frame, the amplitude being a real number;
a combining unit, configured to combine the spectra of the multiple audio frames to obtain the spectrum of the audio signal.
In one possible implementation, the framing unit is configured to:
apply a preset window function to the audio signal to obtain multiple audio frames.
In one possible implementation, the length of the preset window function equals the number of sampling points of each audio frame.
In one possible implementation, the number of sampling points of each audio frame is twice the number of overlapping sampling points between frames.
In one possible implementation, the decomposition module is configured to call a preset decomposition model, the preset decomposition model being configured to perform spectrum separation based on a signal spectrum, input the spectrum of the audio signal into the preset decomposition model, and output the accompaniment spectrum and the vocal spectrum.
In one aspect, a computer equipment is provided, the computer equipment comprising a processor and a memory, the memory storing at least one instruction that is loaded and executed by the processor to implement the operations performed by the speech signal separation method above.
In one aspect, a computer-readable storage medium is provided, the storage medium storing at least one instruction that is loaded and executed by a processor to implement the operations performed by the speech signal separation method above.
In the method provided by embodiments of the present invention, the conversions between the time domain and the frequency domain use a transform algorithm that represents the amplitude of each audio frame with real numbers only. Because neither the forward nor the inverse transform alters phase, no phase information is lost; separating accompaniment and vocals from an audio file in this way therefore avoids the phase distortion of Fourier-transform spectrum decomposition.
Detailed description of the invention
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a diagram of an implementation scenario of a speech signal separation method provided in an embodiment of the present invention;
Fig. 2 is a kind of flow chart of speech signal separation method provided in an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a speech signal separation device provided in an embodiment of the present invention;
Fig. 4 is a kind of structural schematic diagram of computer equipment provided in an embodiment of the present invention.
Specific embodiment
To make the objects, technical solutions and advantages of the present invention clearer, embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Fig. 1 is a diagram of an implementation scenario of a speech signal separation method provided in an embodiment of the present invention. Referring to Fig. 1, the scenario may include at least one terminal 101 and at least one server 102. The at least one terminal 101 can serve as a capture terminal for speech signals or a playback terminal for audio files, and the at least one server 102 provides audio services to the at least one terminal 101: it can supply audio files to be played, and it can also provide the signal separation function corresponding to the methods provided by embodiments of the present invention, so as to perform speech signal separation on audio files it provides or audio files chosen by a terminal. The server 102 can be provided as a computer equipment.
Fig. 2 is a flow chart of a speech signal separation method provided in an embodiment of the present invention. Referring to Fig. 2, the embodiment specifically includes:
201. The computer equipment samples the acoustic waveform of an audio file to be separated to obtain an audio signal.
The audio file to be separated can be an audio file uploaded by a terminal or an audio file stored in the computer equipment; the computer equipment can be a server or any terminal, which is not limited by the embodiments of the present invention. After obtaining the audio file to be processed, the computer equipment obtains the acoustic waveform of the audio file and samples the waveform at a preset sample rate to obtain the audio signal.
The preset sample rate can correspond to the format of the audio file, with different audio file formats corresponding to different preset sample rates. Sampling the acoustic waveform at the sample rate corresponding to its format helps ensure that the resulting audio signals are consistent.
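As an illustration, this sampling step can be sketched as follows. This is a minimal sketch under stated assumptions: the format-to-rate table `PRESET_SAMPLE_RATES` and the representation of the acoustic waveform as a callable of time are hypothetical choices, not specified by the embodiment.

```python
import numpy as np

# Hypothetical format-to-rate table; the embodiment only states that the
# preset sample rate "can correspond to the format of the audio file".
PRESET_SAMPLE_RATES = {"wav": 44100, "mp3": 44100, "flac": 48000}

def sample_waveform(waveform, duration_s, fmt="wav"):
    """Sample a continuous acoustic waveform (a callable of time in seconds)
    at the preset sample rate for the given file format."""
    sr = PRESET_SAMPLE_RATES[fmt]
    t = np.arange(int(duration_s * sr)) / sr
    return waveform(t), sr

# A 440 Hz tone sampled for one second at the "wav" preset rate.
audio_signal, sr = sample_waveform(lambda t: np.sin(2 * np.pi * 440 * t), 1.0)
assert sr == 44100 and len(audio_signal) == 44100
```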
202. The computer equipment applies a preset window function to the audio signal to obtain multiple audio frames.
The sampled audio signal can be divided into frames of a preset frame length to obtain multiple original audio frames. The preset frame length should be short enough — generally 20 to 50 milliseconds — because within a sufficiently short interval the audio signal can be approximated as a stationary periodic signal, which facilitates the subsequent steps.
When framing, the number of sampling points of each audio frame should be chosen within a reasonable range to improve the spectral resolution of the audio frames. In one possible implementation, consecutive original audio frames should partially overlap, so that each original audio frame contains content from the previous frame and no discontinuity appears between two original audio frames. Generally, the number of sampling points of each original audio frame can be chosen between 512 and 8192. For example, in embodiments of the present invention, each audio frame can contain 2048 sampling points, with a corresponding frame overlap of 1024 sampling points.
During framing, both the preset frame length and the number of sampling points of each audio frame can be considered so that the two jointly satisfy the above conditions and achieve the best framing effect.
In practice, framing can be performed by windowing, that is, multiplying each of the multiple original audio frames by a window function to obtain the multiple audio frames. Windowing lets the audio frames better satisfy the periodicity assumption of the time-frequency conversion in the subsequent step, reduces spectral leakage, and improves spectral resolution. For example, the preset window function can be a Hann window or a Hamming window. The length of the preset window function can equal the number of sampling points of each audio frame, and the number of sampling points of each audio frame is twice the number of overlapping sampling points.
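The framing and windowing described above can be sketched as follows, using the example values from the text (2048-point frames with a 1024-point overlap) and a periodic Hann window whose length equals the frame's sample count; the function name is ours.

```python
import numpy as np

def frame_signal(x, frame_len=2048, hop=1024):
    """Split the signal into 50%-overlapping frames and apply a periodic
    Hann window of the same length as each frame, as described above."""
    n = np.arange(frame_len)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / frame_len)  # periodic Hann
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)],
        axis=1,
    )  # shape: (samples per frame, number of frames)

x = np.random.default_rng(0).standard_normal(8192)
frames = frame_signal(x)
assert frames.shape == (2048, 7)  # 2048-point frames, 1024-point hop
```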
203. The computer equipment converts each of the multiple audio frames from the time domain to the frequency domain to obtain spectra of the multiple audio frames, the spectrum of each audio frame representing only the amplitude of the audio frame, the amplitude being a real number.
In embodiments of the present invention, the time-frequency conversion can convert each of the multiple audio frames from the time domain to the frequency domain with a Hartley transform to obtain the spectra of the multiple audio frames. Because the Hartley transform is a real-valued transform, the resulting spectra are real spectra: each spectrum represents only the amplitude of the frame and does not involve phase. Specifically, the Hartley transform can be realized with the following formula:
H_k = Σ_{n=0}^{N-1} x_n [cos(2πkn/N) + sin(2πkn/N)], k = 0, 1, 2, ..., N-1
where N is the number of sampling points of each audio frame, M is the number of overlapping sampling points (M = N/2), x_n is the sample amplitude of each frame (n = 0, 1, 2, ..., N-1), H_k is the spectrum after the Hartley transform, and k is the frequency bin index.
It should be noted that the embodiments of the present invention use the Hartley transform only as an example; in practice, any other conversion that does not damage phase can also be used, which is not limited by the embodiments of the present invention.
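A direct implementation of the Hartley formula above can be sketched as follows, checked against the known identity H = Re(FFT) − Im(FFT) for real input (a sketch only; an O(N log N) fast Hartley transform would be used in practice):

```python
import numpy as np

def dht(x):
    """Discrete Hartley transform: H_k = sum_n x_n * cas(2*pi*k*n/N),
    with cas(t) = cos(t) + sin(t). Real input yields a real spectrum."""
    N = len(x)
    arg = 2 * np.pi * np.outer(np.arange(N), np.arange(N)) / N
    return (np.cos(arg) + np.sin(arg)) @ x

x = np.random.default_rng(0).standard_normal(16)
H = dht(x)
F = np.fft.fft(x)
assert np.allclose(H, F.real - F.imag)  # DHT/FFT identity for real input
assert np.isrealobj(H)                  # the spectrum carries no phase
```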
204. The computer equipment combines the spectra of the multiple audio frames to obtain the spectrum of the audio signal.
After the spectrum of each audio frame is obtained, the spectra of the audio frames are concatenated head to tail in sequence to form a two-dimensional array of dimension N*L, where N equals the number of sampling points of each audio frame and L is the total number of frames.
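Combining the per-frame spectra head to tail into the N*L array can be sketched as follows (for brevity the per-frame Hartley spectra are computed through the FFT identity H = Re(F) − Im(F); the name `dht_cols` is ours):

```python
import numpy as np

def dht_cols(frames):
    """Hartley spectrum of each time-domain frame (one frame per column),
    via the real-input identity H = Re(FFT) - Im(FFT)."""
    F = np.fft.fft(frames, axis=0)
    return F.real - F.imag

frames = np.random.default_rng(1).standard_normal((2048, 5))  # N x L frames
spectrum = dht_cols(frames)  # the N x L two-dimensional spectrum above
assert spectrum.shape == (2048, 5)
assert np.isrealobj(spectrum)
```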
205. The computer equipment calls a preset decomposition model, the preset decomposition model being configured to perform spectrum separation based on a signal spectrum; the spectrum of the audio signal is input into the preset decomposition model, which outputs an accompaniment spectrum and a vocal spectrum.
The preset decomposition model can be trained in advance on the spectra of multiple audio signals together with the accompaniment spectra and vocal spectra of those audio signals. For example, the preset decomposition model can represent the separation law between accompaniment spectra and vocal spectra, so that the spectrum of the audio signal is decomposed based on that law.
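The patent does not disclose the internals of the preset decomposition model; one common hypothetical realization, shown only as a sketch, is a trained soft mask applied to the mixture spectrum:

```python
import numpy as np

def decompose(spectrum, vocal_mask):
    """Hypothetical decomposition: split the mixture spectrum with a soft
    mask in [0, 1] (assumed to come from a model trained on accompaniment
    and vocal spectra, as described above)."""
    vocal = vocal_mask * spectrum
    accompaniment = (1.0 - vocal_mask) * spectrum
    return accompaniment, vocal

S = np.random.default_rng(2).standard_normal((2048, 5))  # mixture spectrum
mask = np.full_like(S, 0.3)                              # stand-in mask
acc, voc = decompose(S, mask)
assert np.allclose(acc + voc, S)  # the two parts sum back to the mixture
```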
206. The computer equipment converts the accompaniment spectrum and the vocal spectrum from the frequency domain to the time domain to obtain accompaniment audio and vocal audio.
After the accompaniment spectrum and the vocal spectrum are obtained, the inverse Hartley transform can convert them from the frequency domain to the time domain to obtain the accompaniment audio and the vocal audio.
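Because the Hartley transform is its own inverse up to a factor of 1/N, the frequency-to-time conversion can reuse the forward transform; a sketch:

```python
import numpy as np

def dht(x):
    """Discrete Hartley transform (cas kernel)."""
    N = len(x)
    arg = 2 * np.pi * np.outer(np.arange(N), np.arange(N)) / N
    return (np.cos(arg) + np.sin(arg)) @ x

def idht(H):
    """Inverse: the DHT is involutive up to 1/N, so dht(dht(x)) = N * x."""
    return dht(H) / len(H)

x = np.array([0.5, -1.0, 2.0, 0.25])
assert np.allclose(idht(dht(x)), x)  # round trip restores the frame exactly
```

After each frame's spectrum is inverted, the time-domain frames can be reassembled by overlap-add; with the periodic Hann window and 50% overlap from step 202, overlapping windows sum to one, so the interior of the signal is reconstructed exactly.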
In the method provided by embodiments of the present invention, the conversions between the time domain and the frequency domain use a transform algorithm that represents the amplitude of each audio frame with real numbers only. Because the transformed spectrum is a real spectrum carrying no phase information, the inverse transform restores the original phase and no phase information is lost. Separating accompaniment and vocals from an audio file in this way therefore avoids the phase distortion of Fourier-transform spectrum decomposition.
Any combination of the above optional technical solutions may form optional embodiments of the present disclosure, which are not described in detail here.
Fig. 3 is a schematic structural diagram of a speech signal separation device provided in an embodiment of the present invention. Referring to Fig. 3, the device comprises:
a sampling module 301, configured to sample the acoustic waveform of an audio file to be separated to obtain an audio signal;
a first conversion module 302, configured to convert the audio signal from the time domain to the frequency domain to obtain a spectrum of the audio signal, the spectrum representing only the amplitude of the audio signal, the amplitude being a real number;
a decomposition module 303, configured to decompose the spectrum of the audio signal to obtain an accompaniment spectrum and a vocal spectrum;
a second conversion module 304, configured to convert the accompaniment spectrum and the vocal spectrum from the frequency domain to the time domain to obtain accompaniment audio and vocal audio.
In one possible embodiment, the first conversion module 302 comprises:
a framing unit, configured to perform framing processing on the audio signal to obtain multiple audio frames;
a time-frequency conversion unit, configured to convert each of the multiple audio frames from the time domain to the frequency domain to obtain spectra of the multiple audio frames, the spectrum of each audio frame representing only the amplitude of the audio frame, the amplitude being a real number;
a combining unit, configured to combine the spectra of the multiple audio frames to obtain the spectrum of the audio signal.
In one possible embodiment, the framing unit is configured to:
apply a preset window function to the audio signal to obtain multiple audio frames.
In one possible embodiment, the length of the preset window function equals the number of sampling points of each audio frame.
In one possible embodiment, the number of sampling points of each audio frame is twice the number of overlapping sampling points between frames.
In one possible embodiment, the decomposition module is configured to call a preset decomposition model, the preset decomposition model being configured to perform spectrum separation based on a signal spectrum, input the spectrum of the audio signal into the preset decomposition model, and output the accompaniment spectrum and the vocal spectrum.
It should be noted that when the speech signal separation device provided by the above embodiments performs speech signal separation, the division into the functional modules above is only an example; in practical applications, the functions can be assigned to different functional modules as needed, that is, the internal structure of the equipment can be divided into different functional modules to complete all or part of the functions described above. In addition, the speech signal separation device provided by the above embodiments belongs to the same concept as the speech signal separation method embodiments; its specific implementation process is detailed in the method embodiments and is not repeated here.
Fig. 4 is a schematic structural diagram of a computer equipment provided in an embodiment of the present invention. The computer equipment may vary considerably with configuration or performance, and may include one or more processors (central processing units, CPU) 401 and one or more memories 402, where the memory 402 stores at least one instruction that is loaded and executed by the processor 401 to implement the methods provided by each of the method embodiments above. Of course, the computer equipment can also have components such as a wired or wireless network interface, a keyboard and an input/output interface for input and output, and can also include other components for realizing the functions of the equipment, which are not described here.
In an exemplary embodiment, a computer-readable storage medium is also provided, such as a memory including instructions that can be executed by a processor in a terminal to complete the speech signal separation method in the embodiments above. For example, the computer-readable storage medium can be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above embodiments can be completed by hardware, or by a program instructing the relevant hardware; the program can be stored in a computer-readable storage medium, and the storage medium mentioned above can be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing is merely the preferred embodiments of the present invention and is not intended to limit the invention. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.
Claims (14)
1. a kind of speech signal separation method, which is characterized in that the described method includes:
The acoustic waveform of audio file to be separated is sampled, audio signal is obtained;
The audio signal is converted from time domain to frequency domain, the frequency spectrum of the audio signal is obtained, the frequency spectrum is only used for indicating
The amplitude of the audio signal and the amplitude are real number;
The frequency spectrum of the audio signal is decomposed, accompaniment frequency spectrum and voice frequency spectrum are obtained;
The accompaniment frequency spectrum and voice frequency spectrum are converted from frequency domain to time domain, audio accompaniment and voice audio are obtained.
2. the method according to claim 1, wherein described convert the audio signal to frequency domain from time domain,
Obtain the frequency spectrum of the audio signal, comprising:
The audio signal is subjected to sub-frame processing, obtains multiple audio frames;
The multiple audio frame is converted from time domain to frequency domain respectively, obtains the frequency spectrum of the multiple audio frame, each audio frame
Frequency spectrum be only used for indicating the amplitude of the audio frame and amplitude is real number;
The frequency spectrum of the multiple audio frame is combined, the frequency spectrum of the audio signal is obtained.
3. according to the method described in claim 2, it is characterized in that, it is described by the audio signal carry out sub-frame processing, obtain
Multiple audio frames, comprising:
Based on default window function, windowing process is carried out to the audio signal, obtains multiple audio frames.
4. according to the method described in claim 3, it is characterized in that, the length of the default window function and each audio frame
Sampling number it is identical.
5. according to the method described in claim 2, it is characterized in that, the sampling number of each audio frame is frame overlap sampling points
2 times.
6. being obtained the method according to claim 1, wherein the frequency spectrum by the audio signal decomposes
To accompaniment frequency spectrum and voice frequency spectrum, comprising:
Preset decomposition model is called, the preset decomposition model is used to carry out frequency spectrum separation based on signal spectrum;
The frequency spectrum of the audio signal is inputted into the preset decomposition model, output accompaniment frequency spectrum and voice frequency spectrum.
7. a kind of speech signal separation device, which is characterized in that described device includes:
Sampling module samples for the acoustic waveform to audio file to be separated, obtains audio signal;
First conversion module obtains the frequency spectrum of the audio signal, institute for converting the audio signal from time domain to frequency domain
Frequency spectrum is stated to be only used for indicating the amplitude of the audio signal and the amplitude for real number;
Decomposing module obtains accompaniment frequency spectrum and voice frequency spectrum for decomposing the frequency spectrum of the audio signal;
Second conversion module, for converting the accompaniment frequency spectrum and voice frequency spectrum from frequency domain to time domain, obtain audio accompaniment with
Voice audio.
8. The apparatus according to claim 7, wherein the first conversion module comprises:
a framing unit, configured to perform framing processing on the audio signal to obtain multiple audio frames;
a time-frequency conversion unit, configured to convert each of the multiple audio frames from the time domain to the frequency domain to obtain frequency spectra of the multiple audio frames, the frequency spectrum of each audio frame being used only to indicate the amplitude of that audio frame, the amplitude being a real number; and
a combining unit, configured to combine the frequency spectra of the multiple audio frames to obtain the frequency spectrum of the audio signal.
9. The apparatus according to claim 8, wherein the framing unit is configured to:
perform windowing processing on the audio signal based on a preset window function to obtain multiple audio frames.
10. The apparatus according to claim 9, wherein the length of the preset window function is identical to the number of sampling points of each audio frame.
11. The apparatus according to claim 8, wherein the number of sampling points of each audio frame is twice the number of frame-overlap sampling points.
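Claims 9 to 11 (and their method-side counterparts) describe windowed framing in which the window length equals the frame length and each frame is twice the overlap, i.e. a 50% hop. A minimal numpy sketch under those assumptions; the frame length of 1024 samples and the Hann window are illustrative choices, not taken from the claims:

```python
import numpy as np

frame_len = 1024               # sampling points per frame (illustrative)
hop = frame_len // 2           # claim 11: frame length = 2 x overlap, so 50% hop
window = np.hanning(frame_len) # claim 10: window length equals frame length

signal = np.random.randn(16000)  # stand-in for the sampled audio signal

# Split the signal into overlapping frames and apply the window (claim 9).
n_frames = 1 + (len(signal) - frame_len) // hop
frames = np.stack([
    signal[i * hop : i * hop + frame_len] * window
    for i in range(n_frames)
])

print(frames.shape)  # (30, 1024): 30 frames of 1024 windowed samples each
```

Each row of `frames` is one windowed audio frame, ready for a per-frame time-to-frequency transform.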
12. The apparatus according to claim 7, wherein the decomposition module is configured to: call a preset decomposition model, the preset decomposition model being configured to perform frequency-spectrum separation based on a signal frequency spectrum; input the frequency spectrum of the audio signal into the preset decomposition model; and output the accompaniment frequency spectrum and the vocal frequency spectrum.
13. A computer device, wherein the computer device comprises a processor and a memory, the memory storing at least one instruction, the instruction being loaded and executed by the processor to implement the operations performed in the speech signal separation method according to any one of claims 1 to 7.
14. A computer-readable storage medium, wherein at least one instruction is stored in the storage medium, the instruction being loaded and executed by a processor to implement the operations performed in the speech signal separation method according to any one of claims 1 to 7.
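Taken together, the claims describe a full pipeline: sample, frame, transform to a real-valued magnitude spectrum, decompose, and convert each part back to the time domain. Because the claimed spectrum carries magnitude only, some phase must be supplied for the inverse transform; the sketch below reuses the mixture's phase, which is a common engineering choice but an assumption here, not something the claims specify. The decomposition step is again a hypothetical fixed mask, and `scipy.signal` provides the STFT/ISTFT:

```python
import numpy as np
from scipy.signal import stft, istft

fs = 16000
mixture = np.random.randn(fs)  # stand-in for the sampled audio file

# Time -> frequency: Hann window, 50% overlap (cf. claims 3-5 / 9-11).
f, t, Z = stft(mixture, fs=fs, window='hann', nperseg=1024, noverlap=512)
magnitude, phase = np.abs(Z), np.angle(Z)  # real-valued spectrum per claim 1

# Hypothetical decomposition: a fixed 70/30 soft mask, illustration only.
acc_mag = 0.7 * magnitude
voc_mag = 0.3 * magnitude

# Frequency -> time (claim 1's final step); reusing the mixture phase is
# an assumption, since the claims keep only the amplitude.
_, accompaniment = istft(acc_mag * np.exp(1j * phase), fs=fs,
                         window='hann', nperseg=1024, noverlap=512)
_, vocal = istft(voc_mag * np.exp(1j * phase), fs=fs,
                 window='hann', nperseg=1024, noverlap=512)

# With a linear mask and shared phase, the two parts sum back to the mixture
# (the ISTFT output may carry a few padded samples past the original length).
assert np.allclose((accompaniment + vocal)[:len(mixture)], mixture, atol=1e-6)
```

A trained separation model would replace the fixed mask, but the time-frequency round trip is the same.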
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810802835.7A CN108962277A (en) | 2018-07-20 | 2018-07-20 | Speech signal separation method, apparatus, computer equipment and storage medium |
PCT/CN2018/118293 WO2020015270A1 (en) | 2018-07-20 | 2018-11-29 | Voice signal separation method and apparatus, computer device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810802835.7A CN108962277A (en) | 2018-07-20 | 2018-07-20 | Speech signal separation method, apparatus, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108962277A true CN108962277A (en) | 2018-12-07 |
Family
ID=64482037
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810802835.7A Pending CN108962277A (en) | 2018-07-20 | 2018-07-20 | Speech signal separation method, apparatus, computer equipment and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108962277A (en) |
WO (1) | WO2020015270A1 (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109767760A (en) * | 2019-02-23 | 2019-05-17 | 天津大学 | Far field audio recognition method based on the study of the multiple target of amplitude and phase information |
CN109801644A (en) * | 2018-12-20 | 2019-05-24 | 北京达佳互联信息技术有限公司 | Separation method, device, electronic equipment and the readable medium of mixed sound signal |
CN110085251A (en) * | 2019-04-26 | 2019-08-02 | 腾讯音乐娱乐科技(深圳)有限公司 | Voice extracting method, voice extraction element and Related product |
CN110277105A (en) * | 2019-07-05 | 2019-09-24 | 广州酷狗计算机科技有限公司 | Eliminate the methods, devices and systems of background audio data |
CN111192594A (en) * | 2020-01-10 | 2020-05-22 | 腾讯音乐娱乐科技(深圳)有限公司 | Method for separating voice and accompaniment and related product |
CN111429942A (en) * | 2020-03-19 | 2020-07-17 | 北京字节跳动网络技术有限公司 | Audio data processing method and device, electronic equipment and storage medium |
CN115240709A (en) * | 2022-07-25 | 2022-10-25 | 镁佳(北京)科技有限公司 | Sound field analysis method and device for audio file |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1945689A (en) * | 2006-10-24 | 2007-04-11 | 北京中星微电子有限公司 | Method and its device for extracting accompanying music from songs |
CN101944355A (en) * | 2009-07-03 | 2011-01-12 | 深圳Tcl新技术有限公司 | Obbligato music generation device and realization method thereof |
CN102402977A (en) * | 2010-09-14 | 2012-04-04 | 无锡中星微电子有限公司 | Method for extracting accompaniment and human voice from stereo music and device of method |
CN104053120A (en) * | 2014-06-13 | 2014-09-17 | 福建星网视易信息系统有限公司 | Method and device for processing stereo audio frequency |
CN106024005A (en) * | 2016-07-01 | 2016-10-12 | 腾讯科技(深圳)有限公司 | Processing method and apparatus for audio data |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8954175B2 (en) * | 2009-03-31 | 2015-02-10 | Adobe Systems Incorporated | User-guided audio selection from complex sound mixtures |
CN104078051B (en) * | 2013-03-29 | 2018-09-25 | 南京中兴软件有限责任公司 | A kind of voice extracting method, system and voice audio frequency playing method and device |
CN103943113B (en) * | 2014-04-15 | 2017-11-07 | 福建星网视易信息系统有限公司 | The method and apparatus that a kind of song goes accompaniment |
CN104134444B (en) * | 2014-07-11 | 2017-03-15 | 福建星网视易信息系统有限公司 | A kind of song based on MMSE removes method and apparatus of accompanying |
2018
- 2018-07-20 CN CN201810802835.7A patent/CN108962277A/en active Pending
- 2018-11-29 WO PCT/CN2018/118293 patent/WO2020015270A1/en active Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1945689A (en) * | 2006-10-24 | 2007-04-11 | 北京中星微电子有限公司 | Method and its device for extracting accompanying music from songs |
CN101944355A (en) * | 2009-07-03 | 2011-01-12 | 深圳Tcl新技术有限公司 | Obbligato music generation device and realization method thereof |
CN102402977A (en) * | 2010-09-14 | 2012-04-04 | 无锡中星微电子有限公司 | Method for extracting accompaniment and human voice from stereo music and device of method |
CN104053120A (en) * | 2014-06-13 | 2014-09-17 | 福建星网视易信息系统有限公司 | Method and device for processing stereo audio frequency |
CN106024005A (en) * | 2016-07-01 | 2016-10-12 | 腾讯科技(深圳)有限公司 | Processing method and apparatus for audio data |
Non-Patent Citations (2)
Title |
---|
吴本谷: "Research on Vocal Separation in Music", China Master's Theses Full-text Database (Information Science and Technology) *
栾正禧: "Encyclopedia of China Posts and Telecommunications", 30 September 1993 *
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109801644A (en) * | 2018-12-20 | 2019-05-24 | 北京达佳互联信息技术有限公司 | Separation method, device, electronic equipment and the readable medium of mixed sound signal |
US11430427B2 (en) | 2018-12-20 | 2022-08-30 | Beijing Dajia Internet Information Technology Co., Ltd. | Method and electronic device for separating mixed sound signal |
CN109767760A (en) * | 2019-02-23 | 2019-05-17 | 天津大学 | Far field audio recognition method based on the study of the multiple target of amplitude and phase information |
CN110085251B (en) * | 2019-04-26 | 2021-06-25 | 腾讯音乐娱乐科技(深圳)有限公司 | Human voice extraction method, human voice extraction device and related products |
CN110085251A (en) * | 2019-04-26 | 2019-08-02 | 腾讯音乐娱乐科技(深圳)有限公司 | Voice extracting method, voice extraction element and Related product |
CN110277105A (en) * | 2019-07-05 | 2019-09-24 | 广州酷狗计算机科技有限公司 | Eliminate the methods, devices and systems of background audio data |
CN110277105B (en) * | 2019-07-05 | 2021-08-13 | 广州酷狗计算机科技有限公司 | Method, device and system for eliminating background audio data |
CN111192594A (en) * | 2020-01-10 | 2020-05-22 | 腾讯音乐娱乐科技(深圳)有限公司 | Method for separating voice and accompaniment and related product |
CN111192594B (en) * | 2020-01-10 | 2022-12-09 | 腾讯音乐娱乐科技(深圳)有限公司 | Method for separating voice and accompaniment and related product |
CN111429942A (en) * | 2020-03-19 | 2020-07-17 | 北京字节跳动网络技术有限公司 | Audio data processing method and device, electronic equipment and storage medium |
CN111429942B (en) * | 2020-03-19 | 2023-07-14 | 北京火山引擎科技有限公司 | Audio data processing method and device, electronic equipment and storage medium |
CN115240709A (en) * | 2022-07-25 | 2022-10-25 | 镁佳(北京)科技有限公司 | Sound field analysis method and device for audio file |
CN115240709B (en) * | 2022-07-25 | 2023-09-19 | 镁佳(北京)科技有限公司 | Sound field analysis method and device for audio file |
Also Published As
Publication number | Publication date |
---|---|
WO2020015270A1 (en) | 2020-01-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108962277A (en) | Speech signal separation method, apparatus, computer equipment and storage medium | |
Li et al. | ICASSP 2021 deep noise suppression challenge: Decoupling magnitude and phase optimization with a two-stage deep network | |
CN109584903B (en) | Multi-user voice separation method based on deep learning | |
US20210193149A1 (en) | Method, apparatus and device for voiceprint recognition, and medium | |
CN103426437A (en) | Source separation using independent component analysis with mixed multi-variate probability density function | |
Ming et al. | Exemplar-based sparse representation of timbre and prosody for voice conversion | |
CN103426436A (en) | Source separation by independent component analysis in conjuction with optimization of acoustic echo cancellation | |
CN108492818B (en) | Text-to-speech conversion method and device and computer equipment | |
WO1993018505A1 (en) | Voice transformation system | |
WO2022166710A1 (en) | Speech enhancement method and apparatus, device, and storage medium | |
CN103426434A (en) | Source separation by independent component analysis in conjunction with source direction information | |
US10141008B1 (en) | Real-time voice masking in a computer network | |
Kumar | Comparative performance evaluation of MMSE-based speech enhancement techniques through simulation and real-time implementation | |
CN113921022B (en) | Audio signal separation method, device, storage medium and electronic equipment | |
US9484044B1 (en) | Voice enhancement and/or speech features extraction on noisy audio signals using successively refined transforms | |
US9530434B1 (en) | Reducing octave errors during pitch determination for noisy audio signals | |
CN114203163A (en) | Audio signal processing method and device | |
CN112185410A (en) | Audio processing method and device | |
Peer et al. | Phase-aware deep speech enhancement: It's all about the frame length | |
US9208794B1 (en) | Providing sound models of an input signal using continuous and/or linear fitting | |
CN113035207A (en) | Audio processing method and device | |
Li et al. | Filtering and refining: A collaborative-style framework for single-channel speech enhancement | |
CN112750444A (en) | Sound mixing method and device and electronic equipment | |
CN112151055B (en) | Audio processing method and device | |
CN113744715A (en) | Vocoder speech synthesis method, device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20181207 |
RJ01 | Rejection of invention patent application after publication | |