CN108962277A - Speech signal separation method, apparatus, computer equipment and storage medium - Google Patents


Info

Publication number
CN108962277A
Authority
CN
China
Prior art keywords
frequency spectrum
audio
audio signal
frame
accompaniment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810802835.7A
Other languages
Chinese (zh)
Inventor
张超钢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Kugou Computer Technology Co Ltd
Original Assignee
Guangzhou Kugou Computer Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-07-20
Filing date
2018-07-20
Publication date
2018-12-07
Application filed by Guangzhou Kugou Computer Technology Co Ltd
Priority to CN201810802835.7A
Priority to PCT/CN2018/118293 (WO2020015270A1)
Publication of CN108962277A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/45 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Reverberation, Karaoke And Other Acoustics (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

The invention discloses a speech signal separation method, apparatus, computer device and storage medium, belonging to the field of speech signal processing. The method includes: sampling the acoustic waveform of an audio file to be separated to obtain an audio signal; converting the audio signal from the time domain to the frequency domain to obtain a spectrum of the audio signal, where the spectrum represents only the amplitude of the audio signal and the amplitude is a real number; decomposing the spectrum of the audio signal to obtain an accompaniment spectrum and a vocal spectrum; and converting the accompaniment spectrum and the vocal spectrum from the frequency domain back to the time domain to obtain accompaniment audio and vocal audio. The invention performs the time-to-frequency and frequency-to-time conversions with a transform algorithm that represents the amplitude of an audio frame using real numbers only. Because the transform does not alter the phase, no phase information is lost; separating the accompaniment and vocals from an audio file on the basis of this transform therefore avoids the phase distortion problem of Fourier-transform spectral decomposition.

Description

Speech signal separation method, apparatus, computer equipment and storage medium
Technical field
The present invention relates to the field of speech signal processing, and in particular to a speech signal separation method, apparatus, computer device and storage medium.
Background
With the continuous development of speech processing technology, speech signal separation has come into wide use in everyday life. For example, a user of a karaoke (K-song) app who wants to record themselves singing over an accompaniment needs the song accompaniment provided by the server, and the quality of that accompaniment directly determines the quality of the final recording. How to separate a speech signal into accompaniment audio and vocal audio is therefore critical to improving the quality of the accompaniment audio.
Currently, speech signal separation involves converting the audio signal from the time domain to the frequency domain with the Fourier transform, which yields a complex spectrum. The complex spectrum is decomposed into separated accompaniment and vocal spectra, which are then converted back by the inverse Fourier transform to obtain accompaniment audio and vocal audio.
In the course of implementing the present invention, the inventor found that the prior art has at least the following problem: when the complex spectrum is decomposed, only the amplitude (magnitude) spectrum is used, so the separated accompaniment audio suffers from phase distortion.
Summary of the invention
Embodiments of the present invention provide a speech signal separation method, apparatus, computer device and storage medium that solve the problem of phase distortion in speech signal separation. The technical solutions are as follows:
In one aspect, a speech signal separation method is provided. The method includes:
Sampling the acoustic waveform of an audio file to be separated to obtain an audio signal;
Converting the audio signal from the time domain to the frequency domain to obtain a spectrum of the audio signal, where the spectrum represents only the amplitude of the audio signal and the amplitude is a real number;
Decomposing the spectrum of the audio signal to obtain an accompaniment spectrum and a vocal spectrum;
Converting the accompaniment spectrum and the vocal spectrum from the frequency domain to the time domain to obtain accompaniment audio and vocal audio.
In one possible implementation, converting the audio signal from the time domain to the frequency domain to obtain the spectrum of the audio signal includes:
Dividing the audio signal into frames to obtain a plurality of audio frames;
Converting each of the audio frames from the time domain to the frequency domain to obtain the spectra of the audio frames, where the spectrum of each audio frame represents only the amplitude of that frame and the amplitude is a real number;
Combining the spectra of the audio frames to obtain the spectrum of the audio signal.
In one possible implementation, dividing the audio signal into frames to obtain a plurality of audio frames includes:
Windowing the audio signal with a preset window function to obtain the plurality of audio frames.
In one possible implementation, the length of the preset window function equals the number of samples in each audio frame.
In one possible implementation, the number of samples in each audio frame is twice the number of overlapping samples between frames.
In one possible implementation, decomposing the spectrum of the audio signal to obtain the accompaniment spectrum and the vocal spectrum includes:
Calling a preset decomposition model, the preset decomposition model being used to separate spectra on the basis of a signal spectrum;
Inputting the spectrum of the audio signal into the preset decomposition model, which outputs the accompaniment spectrum and the vocal spectrum.
In another aspect, a speech signal separation apparatus is provided. The apparatus includes:
A sampling module, configured to sample the acoustic waveform of an audio file to be separated to obtain an audio signal;
A first conversion module, configured to convert the audio signal from the time domain to the frequency domain to obtain a spectrum of the audio signal, where the spectrum represents only the amplitude of the audio signal and the amplitude is a real number;
A decomposition module, configured to decompose the spectrum of the audio signal to obtain an accompaniment spectrum and a vocal spectrum;
A second conversion module, configured to convert the accompaniment spectrum and the vocal spectrum from the frequency domain to the time domain to obtain accompaniment audio and vocal audio.
In one possible implementation, the first conversion module includes:
A framing unit, configured to divide the audio signal into frames to obtain a plurality of audio frames;
A time-frequency conversion unit, configured to convert each of the audio frames from the time domain to the frequency domain to obtain the spectra of the audio frames, where the spectrum of each audio frame represents only the amplitude of that frame and the amplitude is a real number;
A combination unit, configured to combine the spectra of the audio frames to obtain the spectrum of the audio signal.
In one possible implementation, the framing unit is configured to:
Window the audio signal with a preset window function to obtain the plurality of audio frames.
In one possible implementation, the length of the preset window function equals the number of samples in each audio frame.
In one possible implementation, the number of samples in each audio frame is twice the number of overlapping samples between frames.
In one possible implementation, the decomposition module is configured to call a preset decomposition model, the preset decomposition model being used to separate spectra on the basis of a signal spectrum, and to input the spectrum of the audio signal into the preset decomposition model, which outputs the accompaniment spectrum and the vocal spectrum.
In another aspect, a computer device is provided. The computer device includes a processor and a memory storing at least one instruction, which is loaded and executed by the processor to perform the operations of the speech signal separation method described above.
In another aspect, a computer-readable storage medium is provided, storing at least one instruction that is loaded and executed by a processor to perform the operations of the speech signal separation method described above.
In the method provided by the embodiments of the present invention, the time-to-frequency and frequency-to-time conversions are performed with a transform algorithm that represents the amplitude of an audio frame using real numbers only. Because the transform does not alter the phase, no phase information is lost, so separating accompaniment and vocals from an audio file on the basis of this transform avoids the phase distortion problem of Fourier-transform spectral decomposition.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a diagram of an implementation scenario of a speech signal separation method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a speech signal separation method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a speech signal separation apparatus according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Fig. 1 is a diagram of an implementation scenario of a speech signal separation method according to an embodiment of the present invention. Referring to Fig. 1, the scenario may include at least one terminal 101 and at least one server 102. The terminal 101 may act as a device that captures speech signals or plays back audio files, and the server 102 provides audio services to the terminal 101: for example, it may supply audio files for playback, and it may also provide the signal separation function of the method described in the embodiments of the present invention, so as to perform speech signal separation on audio files provided or selected by the terminal. The server 102 may be implemented as a computer device.
Fig. 2 is a flowchart of a speech signal separation method according to an embodiment of the present invention. Referring to Fig. 2, the embodiment includes the following steps:
201. The computer device samples the acoustic waveform of an audio file to be separated to obtain an audio signal.
The audio file to be separated may be uploaded by a terminal or stored on the computer device; the computer device may be a server or any terminal, which is not limited in the embodiments of the present invention. After obtaining the audio file to be processed, the computer device obtains its acoustic waveform and samples the waveform at a preset sampling rate to obtain the audio signal.
The preset sampling rate may correspond to the format of the audio file, with different audio file formats corresponding to different preset sampling rates. Sampling the acoustic waveform at the rate corresponding to the file's format keeps the sampled audio signals consistent.
202. The computer device windows the audio signal with a preset window function to obtain a plurality of audio frames.
The sampled audio signal may be divided into frames of a preset frame length to obtain a plurality of original audio frames. The preset frame length should be short, generally 20 to 50 milliseconds; over so short an interval the audio signal can be treated as an approximately stationary, periodic signal, which the subsequent steps rely on.
During framing, the number of samples per audio frame should be chosen within a reasonable range to give the frames adequate spectral resolution. In one possible implementation, consecutive original audio frames overlap, so that each frame contains part of the previous frame and discontinuities between adjacent frames are avoided. Generally, each original audio frame may contain between 512 and 8192 samples. For example, in this embodiment each audio frame may contain 2048 samples, with an overlap of 1024 samples between consecutive frames.
During framing, both the preset frame length and the number of samples per audio frame may be taken into account so that both satisfy the above conditions and the best framing result is achieved.
In practice, framing may be performed by windowing, that is, each original audio frame is multiplied by a window function to obtain the audio frames. Windowing helps the frames better satisfy the periodicity assumption of the time-frequency conversion in the subsequent step, reduces spectral leakage, and improves spectral resolution. For example, the preset window function may be a Hann (Hanning) window or a Hamming window. The length of the preset window function may equal the number of samples in each audio frame, and the number of samples in each audio frame is twice the number of overlapping samples.
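The framing and windowing of step 202 can be sketched in Python with NumPy as below. This is an illustrative sketch under assumptions, not the patent's implementation: the function name `frame_signal` is hypothetical, and the defaults (2048-sample frames, 1024-sample overlap, periodic Hann window) are taken from the example values above. At an assumed sampling rate of 44.1 kHz, 2048 samples last about 46 ms, which falls inside the 20 to 50 ms range given earlier.

```python
import numpy as np

def frame_signal(x, frame_len=2048, overlap=1024):
    """Split a 1-D signal into overlapping, Hann-windowed frames.

    Returns an array of shape (L, frame_len): one windowed frame per row.
    Trailing samples that do not fill a whole frame are dropped.
    """
    x = np.asarray(x, dtype=float)
    if not 0 <= overlap < frame_len:
        raise ValueError("overlap must be in [0, frame_len)")
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    hop = frame_len - overlap                 # 1024 when overlap = frame_len / 2
    num_frames = 1 + (len(x) - frame_len) // hop
    # Periodic Hann window of frame length (an assumed choice); with 50% overlap
    # its shifted copies sum to 1, which later lets overlap-add rebuild the signal.
    n = np.arange(frame_len)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / frame_len)
    frames = np.stack([x[i * hop:i * hop + frame_len] for i in range(num_frames)])
    return frames * window                    # window broadcasts across rows


# Usage: 10 000 samples give 1 + (10000 - 2048) // 1024 = 8 frames.
frames = frame_signal(np.random.default_rng(0).standard_normal(10_000))
print(frames.shape)  # (8, 2048)
```

A periodic rather than symmetric Hann window is used here because it satisfies the constant overlap-add condition exactly at 50% overlap; a Hamming window, also mentioned above, would work for analysis but needs a rescaling factor for reconstruction.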
203. The computer device converts each of the audio frames from the time domain to the frequency domain to obtain the spectra of the audio frames, where the spectrum of each audio frame represents only the amplitude of that frame and the amplitude is a real number.
In this embodiment, the time-frequency conversion may use the Hartley transform to convert each audio frame from the time domain to the frequency domain. Because the Hartley transform is a real-valued transform, the resulting spectra of the audio frames are real spectra that represent only the amplitude of each audio frame and involve no phase. Specifically, the Hartley transform may be computed with the following formula:
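$$H_k = \sum_{n=0}^{N-1} x_n \left[\cos\left(\frac{2\pi nk}{N}\right) + \sin\left(\frac{2\pi nk}{N}\right)\right], \qquad k = 0, 1, \ldots, N-1$$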
where N is the number of samples in each audio frame, M is the number of overlapping samples between frames (M = N/2), x_n is the sample amplitude of the frame (n = 0, 1, 2, ..., N-1), H_k is the spectrum after the Hartley transform, and k is the frequency bin index (k = 0, 1, 2, ..., N-1).
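A short NumPy sketch of the Hartley transform formula follows. It illustrates the transform under assumptions and is not code from the patent: the names `dht` and `dht_direct` are hypothetical, and the fast version relies on the standard identity that, for real input with FFT coefficients X_k, H_k = Re(X_k) - Im(X_k).

```python
import numpy as np

def dht(x):
    """Discrete Hartley transform along the last axis, computed with the FFT.

    For real x, X_k = sum_n x_n [cos(2*pi*n*k/N) - i*sin(2*pi*n*k/N)], so
    H_k = Re(X_k) - Im(X_k) = sum_n x_n [cos(2*pi*n*k/N) + sin(2*pi*n*k/N)].
    """
    X = np.fft.fft(np.asarray(x, dtype=float), axis=-1)
    return X.real - X.imag

def dht_direct(x):
    """O(N^2) evaluation of the same sum, used only as a cross-check."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    n = np.arange(N)
    arg = 2 * np.pi * np.outer(n, n) / N   # arg[k, n] = 2*pi*k*n/N
    return (np.cos(arg) + np.sin(arg)) @ x


x = np.random.default_rng(1).standard_normal(16)
print(np.allclose(dht(x), dht_direct(x)))     # True
print(np.allclose(dht(dht(x)) / len(x), x))   # True: the DHT is its own inverse up to 1/N
```

The second check shows why the inverse transform in step 206 is inexpensive: applying the same transform again and dividing by N recovers the original frame.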
It should be noted that the Hartley transform is used here only as an example; other transforms that do not damage the phase may also be used, which is not limited in the embodiments of the present invention.
204. The computer device combines the spectra of the audio frames to obtain the spectrum of the audio signal.
Once the spectrum of each audio frame has been obtained, the frame spectra are concatenated end to end in time order to form an N×L two-dimensional array, where N is the number of samples in each audio frame and L is the total number of frames.
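Step 204 can be sketched as follows, assuming the windowed frames are held as an (L, N) NumPy array; the function name `hartley_spectrogram` is hypothetical. Each frame's Hartley spectrum becomes one column, so the columns stay in time order and the result has the N×L shape described above.

```python
import numpy as np

def hartley_spectrogram(frames):
    """Transform each frame with the DHT and stack the results as columns.

    frames: array of shape (L, N), one windowed audio frame per row.
    Returns a real array of shape (N, L); column j is the spectrum of frame j.
    """
    frames = np.asarray(frames, dtype=float)
    X = np.fft.fft(frames, axis=1)
    spectra = X.real - X.imag      # Hartley spectrum of every frame, shape (L, N)
    return spectra.T               # N x L, frames concatenated in time order


frames = np.random.default_rng(2).standard_normal((5, 8))   # L = 5 frames, N = 8
S = hartley_spectrogram(frames)
print(S.shape)  # (8, 5)
```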
205. The computer device calls a preset decomposition model, which is used to separate spectra on the basis of a signal spectrum, inputs the spectrum of the audio signal into the preset decomposition model, and obtains the output accompaniment spectrum and vocal spectrum.
The preset decomposition model may be trained in advance on the spectra of a plurality of audio signals together with the accompaniment spectra and vocal spectra of those audio signals. For example, the preset decomposition model may represent the rule by which accompaniment and vocal spectra separate, so that the spectrum of the audio signal is decomposed according to that rule.
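The patent does not disclose the structure of the preset decomposition model, so the sketch below only illustrates the input/output contract of step 205: an N×L real spectrum goes in, and an accompaniment spectrum and a vocal spectrum of the same shape come out. The class `MaskDecomposer` and its fixed soft mask are hypothetical placeholders; a trained model would estimate the mask (or the two spectra) from the input spectrum instead.

```python
import numpy as np

class MaskDecomposer:
    """Hypothetical stand-in for the preset decomposition model.

    Holds a fixed vocal mask with values in [0, 1] and the same N x L shape
    as the spectrum; a trained model would predict this mask from its input.
    """

    def __init__(self, vocal_mask):
        self.vocal_mask = np.clip(np.asarray(vocal_mask, dtype=float), 0.0, 1.0)

    def __call__(self, spectrum):
        spectrum = np.asarray(spectrum, dtype=float)
        if spectrum.shape != self.vocal_mask.shape:
            raise ValueError("mask and spectrum must have the same shape")
        vocal = self.vocal_mask * spectrum
        accompaniment = spectrum - vocal   # the two outputs sum to the input
        return accompaniment, vocal


S = np.random.default_rng(3).standard_normal((8, 5))   # N = 8 bins, L = 5 frames
model = MaskDecomposer(np.full(S.shape, 0.25))
accompaniment, vocal = model(S)
print(np.allclose(accompaniment + vocal, S))  # True
```

Because the Hartley transform is linear, splitting the real spectrum so that the two parts sum to the input also means the two inverse-transformed signals sum to the original mixture.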
206. The computer device converts the accompaniment spectrum and the vocal spectrum from the frequency domain to the time domain to obtain accompaniment audio and vocal audio.
Once the accompaniment spectrum and vocal spectrum have been obtained, they may be converted from the frequency domain to the time domain by the inverse Hartley transform to obtain accompaniment audio and vocal audio.
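Step 206 can be sketched as follows. The inverse discrete Hartley transform is the forward transform scaled by 1/N. The patent does not say how the time-domain frames are reassembled, so the overlap-add step, the function names `inverse_dht` and `overlap_add`, and the periodic Hann analysis window with 50% overlap are all assumptions. Under those assumptions the overlapping windows sum to one, so overlap-add restores every sample covered by two frames exactly.

```python
import numpy as np

def inverse_dht(H):
    """Inverse DHT along the last axis: x_n = (1/N) * sum_k H_k cas(2*pi*n*k/N)."""
    H = np.asarray(H, dtype=float)
    X = np.fft.fft(H, axis=-1)
    return (X.real - X.imag) / H.shape[-1]

def overlap_add(spectrum, hop):
    """Rebuild a time signal from an N x L Hartley spectrogram.

    Each column is inverse-transformed to a (windowed) frame, and the frames
    are summed back at multiples of `hop` samples.
    """
    frames = inverse_dht(np.asarray(spectrum, dtype=float).T)   # shape (L, N)
    L, N = frames.shape
    out = np.zeros(hop * (L - 1) + N)
    for i in range(L):
        out[i * hop:i * hop + N] += frames[i]
    return out


# Round trip with N = 16, hop = 8 and a periodic Hann window.
N, hop = 16, 8
x = np.random.default_rng(4).standard_normal(64)
window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N)
frames = np.stack([x[i:i + N] * window for i in range(0, len(x) - N + 1, hop)])
F = np.fft.fft(frames, axis=1)
y = overlap_add((F.real - F.imag).T, hop)
print(np.allclose(y[hop:-hop], x[hop:-hop]))  # True: interior samples restored
```

The first and last `hop` samples are covered by only one window, so they come back attenuated; in practice the signal would be padded before framing if those edges matter.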
The method provided by the embodiments of the present invention performs the time-to-frequency and frequency-to-time conversions with a transform algorithm that represents the amplitude of an audio frame using real numbers only. The transformed spectrum is a real spectrum carrying no separate phase term, yet the inverse transform restores the original phase, so no phase information is lost. Separating accompaniment and vocals from an audio file on the basis of this transform therefore avoids the phase distortion problem of Fourier-transform spectral decomposition.
All of the above optional technical solutions may be combined in any manner to form optional embodiments of the present disclosure, which are not described one by one here.
Fig. 3 is a schematic structural diagram of a speech signal separation apparatus according to an embodiment of the present invention. Referring to Fig. 3, the apparatus includes:
Sampling module 301, configured to sample the acoustic waveform of an audio file to be separated to obtain an audio signal;
First conversion module 302, configured to convert the audio signal from the time domain to the frequency domain to obtain a spectrum of the audio signal, where the spectrum represents only the amplitude of the audio signal and the amplitude is a real number;
Decomposition module 303, configured to decompose the spectrum of the audio signal to obtain an accompaniment spectrum and a vocal spectrum;
Second conversion module 304, configured to convert the accompaniment spectrum and the vocal spectrum from the frequency domain to the time domain to obtain accompaniment audio and vocal audio.
In one possible embodiment, the first conversion module 302 includes:
A framing unit, configured to divide the audio signal into frames to obtain a plurality of audio frames;
A time-frequency conversion unit, configured to convert each of the audio frames from the time domain to the frequency domain to obtain the spectra of the audio frames, where the spectrum of each audio frame represents only the amplitude of that frame and the amplitude is a real number;
A combination unit, configured to combine the spectra of the audio frames to obtain the spectrum of the audio signal.
In one possible embodiment, the framing unit is configured to:
Window the audio signal with a preset window function to obtain the plurality of audio frames.
In one possible embodiment, the length of the preset window function equals the number of samples in each audio frame.
In one possible embodiment, the number of samples in each audio frame is twice the number of overlapping samples between frames.
In one possible embodiment, the decomposition module is configured to call a preset decomposition model, the preset decomposition model being used to separate spectra on the basis of a signal spectrum, and to input the spectrum of the audio signal into the preset decomposition model, which outputs the accompaniment spectrum and the vocal spectrum.
It should be noted that when the speech signal separation apparatus provided in the above embodiment separates speech signals, the division into the functional modules described above is only an example. In practice, the functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or some of the functions described above. In addition, the speech signal separation apparatus provided in the above embodiment and the speech signal separation method embodiment belong to the same concept; for the specific implementation process, see the method embodiment, which is not repeated here.
Fig. 4 is a schematic structural diagram of a computer device according to an embodiment of the present invention. The computer device may vary considerably in configuration or performance, and may include one or more processors (central processing units, CPU) 401 and one or more memories 402. The memory 402 stores at least one instruction, which is loaded and executed by the processor 401 to implement the methods provided by the method embodiments above. The computer device may also have components such as a wired or wireless network interface, a keyboard and an input/output interface for input and output, and may include other components for implementing device functions, which are not described here.
In an exemplary embodiment, a computer-readable storage medium is also provided, for example a memory including instructions that can be executed by a processor in a terminal to complete the speech signal separation method in the above embodiments. For example, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk or an optical disc.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, improvement or the like made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (14)

1. a kind of speech signal separation method, which is characterized in that the described method includes:
The acoustic waveform of audio file to be separated is sampled, audio signal is obtained;
The audio signal is converted from time domain to frequency domain, the frequency spectrum of the audio signal is obtained, the frequency spectrum is only used for indicating The amplitude of the audio signal and the amplitude are real number;
The frequency spectrum of the audio signal is decomposed, accompaniment frequency spectrum and voice frequency spectrum are obtained;
The accompaniment frequency spectrum and voice frequency spectrum are converted from frequency domain to time domain, audio accompaniment and voice audio are obtained.
2. the method according to claim 1, wherein described convert the audio signal to frequency domain from time domain, Obtain the frequency spectrum of the audio signal, comprising:
The audio signal is subjected to sub-frame processing, obtains multiple audio frames;
The multiple audio frame is converted from time domain to frequency domain respectively, obtains the frequency spectrum of the multiple audio frame, each audio frame Frequency spectrum be only used for indicating the amplitude of the audio frame and amplitude is real number;
The frequency spectrum of the multiple audio frame is combined, the frequency spectrum of the audio signal is obtained.
3. according to the method described in claim 2, it is characterized in that, it is described by the audio signal carry out sub-frame processing, obtain Multiple audio frames, comprising:
Based on default window function, windowing process is carried out to the audio signal, obtains multiple audio frames.
4. according to the method described in claim 3, it is characterized in that, the length of the default window function and each audio frame Sampling number it is identical.
5. according to the method described in claim 2, it is characterized in that, the sampling number of each audio frame is frame overlap sampling points 2 times.
6. being obtained the method according to claim 1, wherein the frequency spectrum by the audio signal decomposes To accompaniment frequency spectrum and voice frequency spectrum, comprising:
Preset decomposition model is called, the preset decomposition model is used to carry out frequency spectrum separation based on signal spectrum;
The frequency spectrum of the audio signal is inputted into the preset decomposition model, output accompaniment frequency spectrum and voice frequency spectrum.
7. a kind of speech signal separation device, which is characterized in that described device includes:
Sampling module samples for the acoustic waveform to audio file to be separated, obtains audio signal;
First conversion module obtains the frequency spectrum of the audio signal, institute for converting the audio signal from time domain to frequency domain Frequency spectrum is stated to be only used for indicating the amplitude of the audio signal and the amplitude for real number;
Decomposing module obtains accompaniment frequency spectrum and voice frequency spectrum for decomposing the frequency spectrum of the audio signal;
Second conversion module, for converting the accompaniment frequency spectrum and voice frequency spectrum from frequency domain to time domain, obtain audio accompaniment with Voice audio.
8. device according to claim 7, which is characterized in that first conversion module includes:
Framing unit obtains multiple audio frames for the audio signal to be carried out sub-frame processing;
Time-frequency convert unit obtains the multiple audio frame for converting the multiple audio frame from time domain to frequency domain respectively Frequency spectrum, the frequency spectrum of each audio frame is only used for indicating the amplitude of the audio frame and amplitude is real number;
Assembled unit obtains the frequency spectrum of the audio signal for the frequency spectrum of the multiple audio frame to be combined.
9. device according to claim 8, which is characterized in that the framing unit is used for:
Based on default window function, windowing process is carried out to the audio signal, obtains multiple audio frames.
10. device according to claim 9, which is characterized in that the length of the default window function and each audio The sampling number of frame is identical.
11. device according to claim 8, which is characterized in that the sampling number of each audio frame is frame overlap sampling point Several 2 times.
12. The apparatus according to claim 7, characterized in that the decomposing module is configured to call a preset decomposition model, the preset decomposition model being used to perform spectrum separation based on a signal spectrum, and to input the spectrum of the audio signal into the preset decomposition model, which outputs the accompaniment spectrum and the vocal spectrum.
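Claim 12 leaves the preset decomposition model unspecified. One common way such a model can be wired in, assumed here and not taken from the claim, is for the model to predict a soft mask in [0, 1] for the vocal part, with the accompaniment taking the remainder. In the sketch, `MaskModel` is a hypothetical interface and `toy_band_model` is a hypothetical placeholder that marks a fixed band of bins as vocal; it stands in for a trained model only so the interface can be exercised.

```python
from typing import Callable, Tuple

import numpy as np

# Hypothetical model interface: magnitude spectrum -> vocal mask of the same shape.
MaskModel = Callable[[np.ndarray], np.ndarray]


def decompose(spec: np.ndarray, model: MaskModel) -> Tuple[np.ndarray, np.ndarray]:
    """Feed the mixture spectrum to the model; return (accompaniment, vocal) spectra."""
    mask = np.clip(model(spec), 0.0, 1.0)
    if mask.shape != spec.shape:
        raise ValueError("model must return a mask with the same shape as its input")
    vocal = mask * spec
    accompaniment = (1.0 - mask) * spec  # the two parts sum back to the mixture
    return accompaniment, vocal


def toy_band_model(spec: np.ndarray, low_bin: int = 5, high_bin: int = 60) -> np.ndarray:
    """Placeholder only: treat bins [low_bin, high_bin) as vocal. Not a trained model."""
    mask = np.zeros_like(spec)
    mask[:, low_bin:high_bin] = 1.0
    return mask
```

Because the two masks sum to one, the accompaniment and vocal spectra always add back to the input spectrum, whatever model supplies the mask.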
13. A computer device, characterized in that the computer device comprises a processor and a memory, the memory storing at least one instruction that is loaded and executed by the processor to implement the operations performed by the speech signal separation method according to any one of claims 1 to 7.
14. A computer-readable storage medium, characterized in that the storage medium stores at least one instruction that is loaded and executed by a processor to implement the operations performed by the speech signal separation method according to any one of claims 1 to 7.
CN201810802835.7A 2018-07-20 2018-07-20 Speech signal separation method, apparatus, computer equipment and storage medium Pending CN108962277A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810802835.7A CN108962277A (en) 2018-07-20 2018-07-20 Speech signal separation method, apparatus, computer equipment and storage medium
PCT/CN2018/118293 WO2020015270A1 (en) 2018-07-20 2018-11-29 Voice signal separation method and apparatus, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810802835.7A CN108962277A (en) 2018-07-20 2018-07-20 Speech signal separation method, apparatus, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN108962277A (en) 2018-12-07

Family

ID=64482037

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810802835.7A Pending CN108962277A (en) 2018-07-20 2018-07-20 Speech signal separation method, apparatus, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN108962277A (en)
WO (1) WO2020015270A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1945689A (en) * 2006-10-24 2007-04-11 北京中星微电子有限公司 Method and its device for extracting accompanying music from songs
CN101944355A (en) * 2009-07-03 2011-01-12 深圳Tcl新技术有限公司 Obbligato music generation device and realization method thereof
CN102402977A (en) * 2010-09-14 2012-04-04 无锡中星微电子有限公司 Method for extracting accompaniment and human voice from stereo music and device of method
CN104053120A (en) * 2014-06-13 2014-09-17 福建星网视易信息系统有限公司 Method and device for processing stereo audio frequency
CN106024005A (en) * 2016-07-01 2016-10-12 腾讯科技(深圳)有限公司 Processing method and apparatus for audio data

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US8954175B2 (en) * 2009-03-31 2015-02-10 Adobe Systems Incorporated User-guided audio selection from complex sound mixtures
CN104078051B (en) * 2013-03-29 2018-09-25 南京中兴软件有限责任公司 Voice extraction method and system, and voice audio playback method and device
CN103943113B (en) * 2014-04-15 2017-11-07 福建星网视易信息系统有限公司 Method and apparatus for removing the accompaniment from a song
CN104134444B (en) * 2014-07-11 2017-03-15 福建星网视易信息系统有限公司 MMSE-based method and apparatus for removing the accompaniment from a song

Non-Patent Citations (2)

Title
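Wu Bengu (吴本谷): "Research on Vocal Separation in Music" (音乐中人声分离研究), China Master's Theses Full-text Database (Information Science and Technology) *
Luan Zhengxi (栾正禧): China Encyclopedia of Posts and Telecommunications (《中国邮电百科全书》), 30 September 1993 *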

Cited By (13)

Publication number Priority date Publication date Assignee Title
CN109801644A (en) * 2018-12-20 2019-05-24 北京达佳互联信息技术有限公司 Method and device for separating a mixed sound signal, electronic equipment and readable medium
US11430427B2 (en) 2018-12-20 2022-08-30 Beijing Dajia Internet Information Technology Co., Ltd. Method and electronic device for separating mixed sound signal
CN109767760A (en) * 2019-02-23 2019-05-17 天津大学 Far-field speech recognition method based on multi-target learning of amplitude and phase information
CN110085251B (en) * 2019-04-26 2021-06-25 腾讯音乐娱乐科技(深圳)有限公司 Human voice extraction method, human voice extraction device and related products
CN110085251A (en) * 2019-04-26 2019-08-02 腾讯音乐娱乐科技(深圳)有限公司 Human voice extraction method, human voice extraction device and related products
CN110277105A (en) * 2019-07-05 2019-09-24 广州酷狗计算机科技有限公司 Method, device and system for eliminating background audio data
CN110277105B (en) * 2019-07-05 2021-08-13 广州酷狗计算机科技有限公司 Method, device and system for eliminating background audio data
CN111192594A (en) * 2020-01-10 2020-05-22 腾讯音乐娱乐科技(深圳)有限公司 Method for separating voice and accompaniment and related product
CN111192594B (en) * 2020-01-10 2022-12-09 腾讯音乐娱乐科技(深圳)有限公司 Method for separating voice and accompaniment and related product
CN111429942A (en) * 2020-03-19 2020-07-17 北京字节跳动网络技术有限公司 Audio data processing method and device, electronic equipment and storage medium
CN111429942B (en) * 2020-03-19 2023-07-14 北京火山引擎科技有限公司 Audio data processing method and device, electronic equipment and storage medium
CN115240709A (en) * 2022-07-25 2022-10-25 镁佳(北京)科技有限公司 Sound field analysis method and device for audio file
CN115240709B (en) * 2022-07-25 2023-09-19 镁佳(北京)科技有限公司 Sound field analysis method and device for audio file

Also Published As

Publication number Publication date
WO2020015270A1 (en) 2020-01-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181207
