WO2020097841A1 - Voice activity detection method and apparatus, storage medium and electronic device - Google Patents
- Publication number
- WO2020097841A1 (PCT/CN2018/115601)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- noise
- frame
- signal
- frequency domain
- noise reduction
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
Definitions
- the present application belongs to the technical field of terminals, and particularly relates to a voice endpoint detection method, device, storage medium, and electronic equipment.
- voice processing technologies such as voiceprint wake-up and voice recognition have become increasingly mature.
- speech endpoint detection technology greatly affects the performance of speech processing technology.
- voice endpoint detection techniques generally assume that the voice signal is a short-term stationary signal, which leads to low detection accuracy in various non-stationary noise environments.
- Embodiments of the present application provide a voice endpoint detection method, device, storage medium, and electronic equipment, which can improve the accuracy of voice endpoint detection.
- an embodiment of the present application provides a voice endpoint detection method, including:
- an embodiment of the present application provides a voice endpoint detection device, including:
- Acquisition module for acquiring noisy speech signals
- a noise reduction module configured to perform noise reduction processing on the noisy speech signal to obtain a noise-reduced speech signal
- a calculation module used to calculate the spectral entropy ratio of the noise-reduced speech signal, and calculate the short-term energy of the noise-reduced speech signal;
- the detection module is configured to perform speech endpoint detection based on the spectral entropy ratio of the noise-reduced speech signal and the short-term energy of the noise-reduced speech signal.
- an embodiment of the present application provides a storage medium on which a computer program is stored, wherein, when the computer program is executed on a computer, the computer is caused to execute the voice endpoint detection method provided in this embodiment.
- an embodiment of the present application provides an electronic device, including a memory and a processor, where a computer program is stored in the memory, and the processor is configured to execute the voice endpoint detection method provided in this embodiment by calling the computer program stored in the memory.
- in this embodiment, because noise in the noisy voice signal affects the accuracy of voice endpoint detection, the noisy voice signal is first subjected to noise reduction processing to obtain a noise-reduced voice signal. The spectral entropy ratio and short-term energy of the noise-reduced voice signal are then used to perform voice endpoint detection, which effectively improves the accuracy of voice endpoint detection.
- FIG. 1 is a first schematic flowchart of a voice endpoint detection method provided by an embodiment of the present application.
- FIG. 2 is a second schematic flowchart of a voice endpoint detection method provided by an embodiment of the present application.
- FIG. 3 is a third schematic flowchart of a voice endpoint detection method provided by an embodiment of the present application.
- FIG. 4 is a fourth schematic flowchart of a voice endpoint detection method provided by an embodiment of the present application.
- FIG. 5 is a schematic structural diagram of a voice endpoint detection device provided by an embodiment of the present application.
- FIG. 6 is a first schematic structural diagram of an electronic device provided by an embodiment of the present application.
- FIG. 7 is a second schematic structural diagram of an electronic device provided by an embodiment of the present application.
- FIG. 1 is a first schematic flowchart of a voice endpoint detection method provided by an embodiment of the present application.
- the process of the voice endpoint detection method may include:
- Voice endpoint detection refers to detecting the presence or absence of voice in a noisy environment. It is usually used in voice processing systems such as voice coding and voice enhancement, where it helps to reduce the voice coding rate, save communication bandwidth, reduce the energy consumption of mobile devices, and improve the recognition rate.
- the noisy speech signal may refer to a speech signal captured in any of various non-stationary noise environments.
- the noisy speech signal is subjected to noise reduction processing to obtain a noise-reduced speech signal.
- the electronic device may perform noise reduction processing on the noisy speech signal to obtain a noise-reduced speech signal.
- the frequency of the noise signal is lower than the frequency of the speech signal, and the lower the frequency of the noise signal, the smaller the impact on the accuracy of the detection of speech endpoints. Therefore, the electronic device can increase the frequency of the voice signal in the noisy voice signal and reduce the frequency of the noise signal in the noisy voice signal, so as to reduce the influence of the noise signal on the accuracy of voice endpoint detection.
- noise reduction processing of the noisy speech signal is not limited to the above method, but may be other methods as long as the purpose of reducing the noise of the noisy speech signal can be achieved.
- the spectral entropy ratio of the noise-reduced speech signal is calculated, and the short-term energy of the noise-reduced speech signal is calculated.
- the electronic device can calculate the spectral entropy ratio of the noise-reduced speech signal and the short-term energy of the noise-reduced speech signal, so that speech endpoint detection can be performed according to these two features of the noise-reduced speech signal.
- speech endpoint detection is performed according to the spectral entropy ratio of the noise-reduced speech signal and the short-term energy of the noise-reduced speech signal.
- the electronic device can use the spectral entropy ratio of the noise-reduced speech signal and the short-term energy value of the noise-reduced speech signal as threshold parameters to perform speech endpoint detection, that is, to detect the presence or absence of speech, thereby detecting a valid speech segment.
- the values of the spectral entropy ratio and the short-term energy can be set according to the actual situation. For example, it can be set that when the value of the spectral entropy ratio is A, it can be detected that there is speech, and when the value of the spectral entropy ratio is B, it can be detected that there is no speech. It can be set that when the short-term energy value is C, it can detect the presence of voice; when the short-term energy value is D, it can detect that there is no voice.
- in this embodiment, because noise in the noisy speech signal affects the accuracy of speech endpoint detection, the noisy speech signal is first subjected to noise reduction processing to obtain a noise-reduced speech signal. The spectral entropy ratio and short-term energy of the noise-reduced speech signal are then used to perform speech endpoint detection, which effectively improves the accuracy of speech endpoint detection.
- FIG. 2 is a second schematic flowchart of a voice endpoint detection method according to an embodiment of the present application.
- the process of the voice endpoint detection method may include:
- the electronic device acquires a noisy speech signal.
- a noisy speech signal may refer to a speech signal captured in any of various non-stationary noise environments.
- the noisy speech signal is a time-domain signal.
- the electronic device performs frame windowing processing on the noisy speech signal to obtain a multi-frame windowed time domain signal.
- the electronic device performs frame windowing on the noisy speech signal y (n) to obtain a multi-frame windowed time domain signal.
- the electronic device can usually take a frame length of 20ms and take a frame shift of 10ms to frame the noisy speech signal.
- the windowed time domain signal is a time domain signal, and the windowed time domain signal of each frame may include a noise signal part and a speech signal part.
- the noise signal and speech signal here are both time-domain signals.
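The framing and windowing step can be sketched as follows. This is a minimal illustration that assumes a Hamming window (the text above does not name the window function) and uses the 20 ms frame length and 10 ms frame shift mentioned above; the function name `frame_and_window` and the 8 kHz sample rate are hypothetical.

```python
import math

def frame_and_window(signal, sample_rate, frame_ms=20, shift_ms=10):
    """Split a time-domain signal into overlapping, windowed frames."""
    frame_len = int(sample_rate * frame_ms / 1000)
    shift = int(sample_rate * shift_ms / 1000)
    # Hamming window: an assumption, the source does not specify the window.
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    # Only full frames are kept; trailing samples that do not fill a frame are dropped.
    for start in range(0, len(signal) - frame_len + 1, shift):
        frames.append([signal[start + n] * window[n] for n in range(frame_len)])
    return frames

# 50 ms of a 1 kHz tone sampled at 8 kHz (400 samples).
sample_rate = 8000
y = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(400)]
frames = frame_and_window(y, sample_rate)
```

With these settings each frame holds 160 samples and consecutive frames overlap by 80 samples, so the 400-sample input yields 4 frames.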
- the electronic device performs Fourier transform on each frame of the windowed time domain signal of the multi-frame windowed time domain signal to obtain a multi-frame frequency domain signal.
- each frame frequency domain signal may include a noise signal part and a voice signal part.
- the noise signal and speech signal here are both frequency domain signals.
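To make the per-frame Fourier transform concrete, the sketch below applies a direct discrete Fourier transform to one frame. It is written for readability rather than speed (a real implementation would normally use an FFT), and the function name `dft` and the test tone are illustrative assumptions.

```python
import cmath
import math

def dft(frame):
    """Discrete Fourier transform of one windowed time-domain frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

# An 8-sample frame containing exactly two cycles of a cosine.
frame = [math.cos(2 * math.pi * 2 * t / 8) for t in range(8)]
spectrum = dft(frame)
magnitudes = [abs(c) for c in spectrum]
```

For this input the energy concentrates in bin 2 and its mirror bin 6, while the remaining bins are numerically close to zero.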
- f is the frequency component
- i is the frame index.
- the electronic device estimates the Fourier coefficients of the frequency domain signal for each frame.
- the Fourier coefficient of the frequency domain signal of the i-th frame can be estimated using the following formula:
- the formula uses the estimated prior signal-to-noise ratio and the estimated posterior signal-to-noise ratio of frequency component f in frame i
- p(f, i) represents the probability that speech is present
- q(f, i) represents the probability that speech is absent.
- the electronic device performs noise reduction processing on the frequency domain signal of each frame according to the Fourier coefficients of the frequency domain signal of each frame to obtain a multi-frame noise reduction frequency domain signal.
- in order to reduce the influence of the noise signal on voice endpoint detection, the electronic device can reduce the Fourier coefficients of the noise signal part in the frequency domain signal of each frame by choosing an appropriate G_0.
- the electronic device can also increase the Fourier coefficient of the speech signal part in the frequency domain signal of each frame.
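As a rough illustration of scaling Fourier coefficients bin by bin, the sketch below applies a gain to each coefficient of one frame. The gain rule is the geometric weighting known from OM-LSA-style speech enhancement, swapped in here as an assumption because the source's own formula is not reproduced in this text. The names `bin_gain`, `denoise_frame`, `speech_gain`, and `gain_floor` are hypothetical, and `gain_floor` only loosely plays the role that the text assigns to G_0.

```python
def bin_gain(speech_gain, presence_prob, gain_floor):
    """Blend the speech gain and the gain floor by speech presence probability."""
    # presence_prob = 1 gives speech_gain; presence_prob = 0 gives gain_floor.
    return (speech_gain ** presence_prob) * (gain_floor ** (1.0 - presence_prob))

def denoise_frame(spectrum, speech_gains, presence_probs, gain_floor=0.1):
    """Scale each Fourier coefficient of one frame by its per-bin gain."""
    return [
        bin_gain(g, p, gain_floor) * c
        for c, g, p in zip(spectrum, speech_gains, presence_probs)
    ]

# Bin 0 is treated as certain speech, bin 1 as certain noise.
spectrum = [4 + 0j, 2 + 0j]
cleaned = denoise_frame(spectrum, speech_gains=[1.0, 1.0], presence_probs=[1.0, 0.0])
```

In this toy input the speech bin passes through unchanged and the noise bin is attenuated to the gain floor, which mirrors the idea of reducing the noise part while preserving the speech part.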
- the electronic device calculates the energy spectrum of the noise-reduced frequency domain signal for each frame.
- the process 206 may be implemented by the processes 2061, 2062, 2063, and 2064, which may be:
- the electronic device acquires the frequency band information of the noise-reduced frequency domain signal for each frame.
- the electronic device can acquire the frequency range of the noise-reduced frequency domain signal of the i-th frame.
- the electronic device divides the noise reduction frequency domain signal of each frame according to the frequency band information to obtain multiple sub-noise reduction frequency domain signals corresponding to the noise reduction frequency domain signal of each frame.
- for example, suppose the frequency band of the noise-reduced frequency domain signal of the i-th frame is 500 Hz to 1400 Hz.
- the electronic device may divide the noise-reduced frequency domain signal of the i-th frame into multiple sub-noise reduction frequency domain signals according to this frequency band range. For example, if the signal of the i-th frame is divided into 3 sub-noise reduction frequency domain signals, the first may cover 500 Hz to 800 Hz, the second 800 Hz to 1100 Hz, and the third 1100 Hz to 1400 Hz.
- how the noise-reduced frequency domain signal of each frame is divided, and into how many sub-noise reduction frequency domain signals, can be determined according to actual needs; no specific limitation is made here.
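The band division in the example above can be sketched as a small helper. It assumes equal-width sub-bands, which matches the 500 Hz to 1400 Hz example but is not required by the text; the function name `split_band` is hypothetical.

```python
def split_band(low_hz, high_hz, num_sub_bands):
    """Divide [low_hz, high_hz] into equal-width sub-bands (an assumption)."""
    width = (high_hz - low_hz) / num_sub_bands
    return [
        (low_hz + w * width, low_hz + (w + 1) * width)
        for w in range(num_sub_bands)
    ]

sub_bands = split_band(500, 1400, 3)
# sub_bands == [(500.0, 800.0), (800.0, 1100.0), (1100.0, 1400.0)]
```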
- the electronic device calculates the energy spectrum of each sub-noise reduction frequency domain signal of the plurality of sub-noise reduction frequency domain signals.
- the formula for calculating the energy spectrum of the w-th sub-noise reduction frequency domain signal in the i-th frame can be:
- E(w, i) represents the energy spectrum of the w-th sub-noise reduction frequency domain signal in the i-th frame
- N_b represents the total number of sub-noise reduction frequency domain signals
- N can be set according to actual needs and is usually a power of 2. For example, N can be set to 256, 512, 1024, etc.
- the electronic device calculates the energy spectrum of the noise reduction frequency domain signal of each frame according to the energy spectrum of each sub-noise reduction frequency domain signal.
- the energy spectrum of the noise reduction frequency domain signal in the i-th frame is the sum of the energy spectrums of all sub-noise reduction frequency domain signals divided in the frame.
- the formula for calculating the energy spectrum of the noise-reduced frequency domain signal in the i-th frame can be: $E(i) = \sum_{w=1}^{N_b} E(w, i)$
- N_b is the total number of sub-noise reduction frequency domain signals
- E(i) is the energy spectrum of the noise-reduced frequency domain signal of the i-th frame
- E(w, i) is the energy spectrum of the w-th sub-noise reduction frequency domain signal in the i-th frame.
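The two energy steps can be sketched as follows. The sketch assumes that the energy of a sub-band is the sum of squared magnitudes of the Fourier coefficients in that sub-band, and that sub-bands are equal-sized groups of adjacent bins; both are assumptions, since the source's sub-band formula is not reproduced in this text. The frame energy is then the sum over sub-bands, as stated above. Function names are hypothetical.

```python
def sub_band_energies(spectrum, num_sub_bands):
    """Energy of each sub-band: sum of |coefficient|^2 over its bins (assumed)."""
    bins_per_band = len(spectrum) // num_sub_bands
    energies = []
    for w in range(num_sub_bands):
        band = spectrum[w * bins_per_band:(w + 1) * bins_per_band]
        energies.append(sum(abs(c) ** 2 for c in band))
    return energies

def frame_energy(energies):
    """Frame energy: the sum of the energies of all sub-bands in the frame."""
    return sum(energies)

# Six coefficients grouped into three sub-bands of two bins each.
spectrum = [1 + 0j, 2j, 3 + 0j, 0j, 1j, 2 + 0j]
e_sub = sub_band_energies(spectrum, 3)
e_frame = frame_energy(e_sub)
```

Here the three sub-band energies are 5, 9, and 5, so the frame energy is 19.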
- the electronic device calculates the spectral entropy of the noise-reduced frequency domain signal for each frame.
- the process 207 may be implemented through the process 2071 and the process 2072, which may be:
- the electronic device calculates the normalized probability density of each sub-noise reduction frequency domain signal according to the energy spectrum of each sub-noise reduction frequency domain signal and the energy spectrum of the noise-reduced frequency domain signal of each frame.
- the calculation formula of the normalized probability density of the w-th sub-noise reduction frequency domain signal in the i-th frame can be: $p(w, i) = E(w, i) / E(i)$, for $w = 1, \dots, N_b$
- w represents the w-th sub-noise reduction frequency domain signal
- N_b represents the total number of sub-noise reduction frequency domain signals
- p(w, i) represents the normalized probability density of the w-th sub-noise reduction frequency domain signal in the i-th frame
- E(w, i) represents the energy spectrum of the w-th sub-noise reduction frequency domain signal in the i-th frame
- E(i) represents the energy spectrum of the noise-reduced frequency domain signal of the i-th frame.
- the electronic device calculates the spectral entropy of the noise reduction frequency domain signal for each frame according to the normalized probability density of each sub-noise reduction frequency domain signal.
- the formula for calculating the spectral entropy of the noise-reduced frequency domain signal in the i-th frame can be: $H(i) = -\sum_{w=1}^{N_b} p(w, i) \log p(w, i)$
- w represents the w-th sub-noise reduction frequency domain signal
- N_b represents the total number of sub-noise reduction frequency domain signals
- p(w, i) represents the normalized probability density of the w-th sub-noise reduction frequency domain signal in the i-th frame
- H(i) represents the spectral entropy of the noise-reduced frequency domain signal of the i-th frame.
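The normalization and entropy steps can be sketched together. This assumes the standard Shannon form of spectral entropy with a natural logarithm (the base is an assumption) and skips zero-probability sub-bands, whose contribution is taken as zero; the function names are hypothetical.

```python
import math

def normalized_density(sub_energies):
    """p(w, i): each sub-band energy divided by the frame energy."""
    total = sum(sub_energies)
    if total == 0:
        return [0.0 for _ in sub_energies]
    return [e / total for e in sub_energies]

def spectral_entropy(sub_energies):
    """H(i): Shannon entropy of the normalized sub-band energies."""
    probs = normalized_density(sub_energies)
    # Sub-bands with zero probability contribute nothing to the sum.
    return -sum(p * math.log(p) for p in probs if p > 0)

densities = normalized_density([2.0, 2.0, 4.0])  # [0.25, 0.25, 0.5]
entropy = spectral_entropy([2.0, 2.0, 4.0])
```

A flat spectrum gives the largest entropy and a spectrum concentrated in one sub-band gives zero, which is why the entropy helps separate noise-like frames from speech-like frames.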
- the electronic device calculates the spectral entropy ratio of the noise-reduced frequency domain signal of each frame according to the energy spectrum and the spectral entropy of the noise-reduced frequency domain signal of that frame, to obtain the spectral entropy ratio of all frames of the noise-reduced speech signal.
- the formula for calculating the spectral entropy ratio of the noise-reduced frequency domain signal in the i-th frame can be:
- EER_TEO(i) represents the spectral entropy ratio of the noise-reduced frequency domain signal of the i-th frame
- E(i) represents the energy spectrum of the noise-reduced frequency domain signal of the i-th frame
- H(i) represents the spectral entropy of the noise-reduced frequency domain signal of the i-th frame.
- the electronic device can calculate the spectral entropy ratio of the noise-reduced frequency domain signal of each frame according to the above formula, so as to obtain the spectral entropy ratio of all frames of the noise-reduced speech signal.
- the electronic device performs an inverse Fourier transform on the noise-reduced frequency domain signal of each frame to obtain a multi-frame noise-reduced time domain signal.
- performing an inverse Fourier transform on a frequency domain signal converts it into a time domain signal. Therefore, the electronic device performs an inverse Fourier transform on the noise-reduced frequency domain signal of each frame to obtain a multi-frame noise-reduced time domain signal.
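A direct inverse discrete Fourier transform for one frame can be sketched as below. It is a readability-first illustration (an inverse FFT would normally be used) and assumes the underlying time-domain frame is real-valued, so only the real part is kept; the function name `idft` is hypothetical.

```python
import cmath

def idft(spectrum):
    """Inverse discrete Fourier transform of one frequency-domain frame."""
    n = len(spectrum)
    return [
        # The time-domain frame is assumed real, so the imaginary part is dropped.
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]

# A spectrum with energy only in bin 0 corresponds to a constant frame.
time_frame = idft([4 + 0j, 0j, 0j, 0j])
```

For this input every time-domain sample equals 1, since bin 0 carries the sum of the samples.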
- the electronic device calculates the short-term energy of the noise reduction time-domain signal of each frame to obtain the short-term energy of all frames of the noise reduction speech signal.
- the calculation formula of the short-term energy of the noise-reduced time domain signal of the i-th frame is:
- the formula expresses the short-term energy of the noise-reduced time domain signal of the i-th frame in terms of the noise-reduced time domain signal of the i-th frame.
- the electronic device can calculate the short-term energy of the noise-reduced time domain signal of each frame according to the above formula, so as to obtain the short-term energy of all frames of the noise-reduced speech signal.
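Short-term energy is commonly defined as the sum of squared samples within a frame. The sketch below uses that common definition as an assumption, because the source's own formula is not reproduced in this text; the function names are hypothetical.

```python
def short_term_energy(frame):
    """Sum of squared samples of one time-domain frame (common definition)."""
    return sum(x * x for x in frame)

def short_term_energies(frames):
    """Short-term energy of every frame of the noise-reduced signal."""
    return [short_term_energy(f) for f in frames]

energies = short_term_energies([[1, -2, 3], [0, 0, 0], [0.5, 0.5, 0.0]])
# energies == [14, 0, 0.5]
```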
- the electronic device determines the position of the voice starting point based on the spectral entropy ratio of all frames of the noise-reduced speech signal and the short-term energy of all frames of the noise-reduced speech signal; and/or the electronic device determines the position of the voice termination point based on the spectral entropy ratio of all frames of the noise-reduced speech signal and the short-term energy of all frames of the noise-reduced speech signal.
- the process 211 may be: if the electronic device detects that no speech exists according to the spectral entropy ratio and the short-term energy of a first number of frames of the noise-reduced speech signal, and detects that speech exists according to the spectral entropy ratio and the short-term energy of a second number of frames of the noise-reduced speech signal, the electronic device determines that the position of the first frame of the second number of frames is the position of the voice starting point.
- the process 212 may be: if the electronic device detects that speech exists according to the spectral entropy ratio and the short-term energy of a third number of frames of the noise-reduced speech signal, and detects that no speech exists according to the spectral entropy ratio and the short-term energy of a fourth number of frames of the noise-reduced speech signal, the electronic device determines that the position of the first frame of the fourth number of frames is the position of the voice termination point.
- that is, when no speech is detected in the preceding frames and speech is detected in the subsequent frames, the electronic device can determine that the position of the first frame of the subsequent frames is the position of the voice starting point.
- likewise, when speech is detected in the preceding frames and no speech is detected in the subsequent frames, the electronic device may determine that the position of the first frame of the subsequent frames is the position of the voice termination point.
- for example, suppose the noise-reduced speech signal is divided into 20 frames: the 1st frame, the 2nd frame, the 3rd frame, ..., the 19th frame, and the 20th frame.
- if the electronic device detects that no speech exists according to the spectral entropy ratio and the short-term energy of the 1st to 5th frames, and detects that speech exists according to the spectral entropy ratio and the short-term energy of the 6th to 10th frames, the electronic device determines that the position of the 6th frame is the position of the voice starting point.
- if the electronic device detects that speech exists according to the spectral entropy ratio and the short-term energy of the 11th to 15th frames, and detects that no speech exists according to the spectral entropy ratio and the short-term energy of the 16th to 20th frames, the electronic device determines that the position of the 16th frame is the position of the voice termination point.
- the values of the spectral entropy ratio and the short-term energy can be set according to the actual situation. For example, it can be set that when the value of the spectral entropy ratio is A, it can be detected that there is speech, and when the value of the spectral entropy ratio is B, it can be detected that there is no speech. It can be set that when the short-term energy value is C, it can detect the presence of voice, and when the short-term energy value is D, it can detect that there is no voice.
- the above is only one example of determining the position of the voice start point and the position of the voice end point proposed in this embodiment. It can be understood that, within the protection scope of the embodiments of the present application, the position of the voice start point and the position of the voice end point may also be determined in other ways, and no specific limitation is made here.
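The 20-frame example can be sketched as a small decision routine. It assumes that a frame counts as speech when both its spectral entropy ratio and its short-term energy exceed a threshold, and that groups of 5 frames are compared; the thresholds, the rule for combining the two features, the group size, and the function name `find_endpoints` are all hypothetical choices for illustration.

```python
def find_endpoints(eer, energy, eer_threshold, energy_threshold, group=5):
    """Return 1-based (start_frame, end_frame), or None where not found."""
    # Per-frame decision: both features must exceed their thresholds (assumed rule).
    speech = [r > eer_threshold and e > energy_threshold
              for r, e in zip(eer, energy)]
    start = end = None
    for i in range(group, len(speech) - group + 1):
        before = speech[i - group:i]
        after = speech[i:i + group]
        if start is None and not any(before) and all(after):
            start = i + 1  # first frame of the group in which speech is detected
        elif start is not None and end is None and all(before) and not any(after):
            end = i + 1    # first frame of the group in which no speech is detected
    return start, end

# Frames 1-5 silent, frames 6-15 speech, frames 16-20 silent.
eer = [0.1] * 5 + [2.0] * 10 + [0.1] * 5
energy = [0.01] * 5 + [1.0] * 10 + [0.01] * 5
endpoints = find_endpoints(eer, energy, eer_threshold=1.0, energy_threshold=0.5)
# endpoints == (6, 16)
```

The result matches the example above: the starting point is the 6th frame and the termination point is the 16th frame.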
- FIG. 5 is a schematic structural diagram of a voice endpoint detection device 300 according to an embodiment of the present application.
- the voice endpoint detection device 300 may include: an acquisition module 301, a noise reduction module 302, a calculation module 303, and a detection module 304.
- the obtaining module 301 is used to obtain a noisy speech signal.
- the noise reduction module 302 is configured to perform noise reduction processing on the noise-containing speech signal to obtain a noise-reduced speech signal.
- the calculation module 303 is configured to calculate the spectral entropy ratio of the noise-reduced speech signal and calculate the short-term energy of the noise-reduced speech signal.
- the detection module 304 is configured to perform speech endpoint detection according to the spectral entropy ratio of the noise-reduced speech signal and the short-term energy of the noise-reduced speech signal.
- the noise reduction module 302 may be used to: perform frame windowing processing on the noisy speech signal to obtain a multi-frame windowed time domain signal; perform a Fourier transform on the windowed time domain signal of each frame to obtain a multi-frame frequency domain signal; estimate the Fourier coefficients of the frequency domain signal of each frame; and, according to the Fourier coefficients of the frequency domain signal of each frame, perform noise reduction processing on the frequency domain signal of each frame to obtain a multi-frame noise-reduced frequency domain signal.
- the calculation module 303 may be used to: calculate the energy spectrum of the noise-reduced frequency domain signal of each frame; calculate the spectral entropy of the noise-reduced frequency domain signal of each frame; and calculate the spectral entropy ratio of the noise-reduced frequency domain signal of each frame according to the energy spectrum and the spectral entropy of the noise-reduced frequency domain signal of that frame, to obtain the spectral entropy ratio of all frames of the noise-reduced speech signal.
- the calculation module 303 may also be used to: obtain frequency band information of the noise-reduced frequency domain signal of each frame; divide the noise-reduced frequency domain signal of each frame according to the frequency band information to obtain a plurality of sub-noise reduction frequency domain signals corresponding to the noise-reduced frequency domain signal of each frame; calculate the energy spectrum of each sub-noise reduction frequency domain signal of the plurality of sub-noise reduction frequency domain signals; and calculate the energy spectrum of the noise-reduced frequency domain signal of each frame according to the energy spectrum of each sub-noise reduction frequency domain signal.
- the calculation module 303 may be further used to: calculate the normalized probability density of each sub-noise reduction frequency domain signal according to the energy spectrum of each sub-noise reduction frequency domain signal and the energy spectrum of the noise-reduced frequency domain signal of each frame; and calculate the spectral entropy of the noise-reduced frequency domain signal of each frame according to the normalized probability density of each sub-noise reduction frequency domain signal.
- the calculation module 303 can also be used to: perform an inverse Fourier transform on the noise-reduced frequency domain signal of each frame to obtain a multi-frame noise-reduced time domain signal; and calculate the short-term energy of the noise-reduced time domain signal of each frame to obtain the short-term energy of all frames of the noise-reduced speech signal.
- the detection module 304 may be used to: determine the position of the voice starting point according to the spectral entropy ratio of all frames of the noise-reduced speech signal and the short-term energy of all frames of the noise-reduced speech signal; and/or determine the position of the voice termination point according to the spectral entropy ratio of all frames of the noise-reduced speech signal and the short-term energy of all frames of the noise-reduced speech signal.
- the detection module 304 may be further configured to: if no speech is detected according to the spectral entropy ratio and the short-term energy of a first number of frames of the noise-reduced speech signal, and speech is detected according to the spectral entropy ratio and the short-term energy of a second number of frames of the noise-reduced speech signal, determine that the position of the first frame of the second number of frames is the position of the voice starting point.
- the detection module 304 may be further configured to: if speech is detected according to the spectral entropy ratio and the short-term energy of a third number of frames of the noise-reduced speech signal, and no speech is detected according to the spectral entropy ratio and the short-term energy of a fourth number of frames of the noise-reduced speech signal, determine that the position of the first frame of the fourth number of frames is the position of the voice termination point.
- An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed on a computer, the computer is caused to execute the process in the voice endpoint detection method provided in this embodiment.
- An embodiment of the present application also provides an electronic device, including a memory and a processor, where a computer program is stored in the memory, and the processor is configured to execute the process in the voice endpoint detection method provided in this embodiment by calling the computer program stored in the memory.
- the aforementioned electronic device may be a mobile terminal such as a tablet computer or a smart phone.
- FIG. 6 is a first schematic structural diagram of an electronic device provided by an embodiment of the present application.
- the mobile terminal 400 may include components such as a microphone 401, a memory 402, and a processor 403. Those skilled in the art may understand that the structure of the mobile terminal shown in FIG. 6 does not constitute a limitation on the mobile terminal, and may include more or fewer components than those illustrated, or combine certain components, or arrange different components.
- the microphone 401 can be used to pick up the voice uttered by the user and the like.
- the memory 402 may be used to store application programs and data.
- the application program stored in the memory 402 contains executable code.
- the application program can form various functional modules.
- the processor 403 executes application programs stored in the memory 402 to execute various functional applications and data processing.
- the processor 403 is the control center of the mobile terminal. It uses various interfaces and lines to connect the various parts of the entire mobile terminal, and performs the various functions of the mobile terminal and processes data by running or executing application programs stored in the memory 402 and calling data stored in the memory 402, thereby monitoring the mobile terminal as a whole.
- the processor 403 in the mobile terminal loads the executable code corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and the processor 403 runs the application programs stored in the memory 402, thereby implementing the process:
- FIG. 7 is a second schematic structural diagram of an electronic device according to an embodiment of the present application.
- the mobile terminal 500 may include components such as a microphone 501, a memory 502, a processor 503, an input unit 504, an output unit 505, a speaker 506, and the like.
- the microphone 501 can be used to pick up the voice uttered by the user and the like.
- the memory 502 may be used to store application programs and data.
- the application program stored in the memory 502 contains executable code.
- the application program can form various functional modules.
- the processor 503 executes application programs stored in the memory 502 to execute various functional applications and data processing.
- the processor 503 is the control center of the mobile terminal. It uses various interfaces and lines to connect the various parts of the entire mobile terminal, and performs the various functions of the mobile terminal and processes data by running or executing application programs stored in the memory 502 and calling data stored in the memory 502, thereby monitoring the mobile terminal as a whole.
- the input unit 504 may be used to receive input numbers, character information, or user characteristic information (such as fingerprints), and generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.
- the output unit 505 can be used to display information input by the user or provided to the user and various graphical user interfaces of the mobile terminal. These graphical user interfaces can be composed of graphics, text, icons, videos, and any combination thereof.
- the output unit may include a display panel.
- the processor 503 in the mobile terminal will load the executable code corresponding to the process of one or more application programs into the memory 502 according to the following instructions, and the processor 503 will run the stored code in the memory The application in 502, thereby implementing the process:
- when the processor 503 executes the process of performing noise reduction on the noisy speech signal to obtain a noise-reduced speech signal, it may perform: framing and windowing the noisy speech signal to obtain a multi-frame windowed time-domain signal; performing a Fourier transform on each frame of the multi-frame windowed time-domain signal to obtain a multi-frame frequency-domain signal; estimating the Fourier coefficients of each frame of the frequency-domain signal; and, according to the Fourier coefficients of each frame of the frequency-domain signal, performing noise reduction on each frame of the frequency-domain signal to obtain a multi-frame noise-reduced frequency-domain signal.
- when the processor 503 executes the process of calculating the spectral entropy ratio of the noise-reduced speech signal, it may perform: calculating the energy spectrum of each frame of the noise-reduced frequency-domain signal; calculating the spectral entropy of each frame of the noise-reduced frequency-domain signal; and calculating the spectral entropy ratio of each frame of the noise-reduced frequency-domain signal according to the energy spectrum and the spectral entropy of that frame, to obtain the spectral entropy ratios of all frames of the noise-reduced speech signal.
- when the processor 503 executes the process of calculating the energy spectrum of each frame of the noise-reduced frequency-domain signal, it may perform: acquiring frequency band information of each frame of the noise-reduced frequency-domain signal; dividing each frame of the noise-reduced frequency-domain signal according to the frequency band information to obtain a plurality of noise-reduced frequency-domain sub-signals corresponding to that frame; calculating the energy spectrum of each of the plurality of sub-signals; and calculating the energy spectrum of each frame of the noise-reduced frequency-domain signal according to the energy spectrum of each sub-signal.
- when the processor 503 executes the process of calculating the spectral entropy of each frame of the noise-reduced frequency-domain signal, it may perform: calculating the normalized probability density of each noise-reduced frequency-domain sub-signal according to the energy spectrum of each sub-signal and the energy spectrum of each frame of the noise-reduced frequency-domain signal; and calculating the spectral entropy of each frame of the noise-reduced frequency-domain signal according to the normalized probability density of each sub-signal.
- when the processor 503 executes the process of calculating the short-term energy of the noise-reduced speech signal, it may perform: performing an inverse Fourier transform on each frame of the noise-reduced frequency-domain signal to obtain a multi-frame noise-reduced time-domain signal; and calculating the short-term energy of each frame of the noise-reduced time-domain signal to obtain the short-term energies of all frames of the noise-reduced speech signal.
- when the processor 503 executes the process of performing voice endpoint detection according to the spectral entropy ratio of the noise-reduced speech signal and the short-term energy of the noise-reduced speech signal, it may perform: determining the position of the speech start point according to the spectral entropy ratios of all frames of the noise-reduced speech signal and the short-term energies of all frames of the noise-reduced speech signal; and/or determining the position of the speech termination point according to the spectral entropy ratios of all frames of the noise-reduced speech signal and the short-term energies of all frames of the noise-reduced speech signal.
- when the processor 503 executes the process of determining the position of the speech start point according to the spectral entropy ratios of all frames of the noise-reduced speech signal and the short-term energies of all frames of the noise-reduced speech signal, it may perform: if no speech is detected according to the spectral entropy ratio of a first number of frames of the noise-reduced speech signal and the short-term energy of the first number of frames of the noise-reduced speech signal, and speech is detected according to the spectral entropy ratio of a second number of frames of the noise-reduced speech signal and the spectral entropy ratio of the second number of frames of the noise-reduced speech signal, determining the position of the first frame of the second number of frames as the position of the speech start point.
- when the processor 503 executes the process of determining the position of the speech termination point according to the spectral entropy ratios of all frames of the noise-reduced speech signal and the short-term energies of all frames of the noise-reduced speech signal, it may perform: if speech is detected according to the spectral entropy ratio of a third number of frames of the noise-reduced speech signal and the short-term energy of the third number of frames of the noise-reduced speech signal, and no speech is detected according to the spectral entropy ratio of a fourth number of frames of the noise-reduced speech signal and the short-term energy of the fourth number of frames of the noise-reduced speech signal, determining the position of the first frame of the fourth number of frames as the position of the speech termination point.
- the voice endpoint detection device provided by the embodiments of the present application and the voice endpoint detection method in the above embodiments belong to the same concept, and any of the methods provided in the voice endpoint detection method embodiments can be run on the voice endpoint detection device.
- regarding the voice endpoint detection method described in the embodiments of the present application, a person of ordinary skill in the art can understand that all or part of the process of implementing the method can be carried out under the control of a computer program.
- the computer program may be stored in a computer-readable storage medium, for example in a memory, and executed by at least one processor; its execution may include the processes of the embodiments of the voice endpoint detection method.
- the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM, Read Only Memory), a random access memory (RAM, Random Access Memory), and so on.
- each functional module may be integrated into one processing chip, or each module may exist alone physically, or two or more modules may be integrated into one module.
- the above integrated modules may be implemented in the form of hardware or in the form of software function modules. If an integrated module is implemented in the form of a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disk.
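The per-frame feature steps described above (framing and windowing, Fourier transform, sub-band energy spectrum, normalized probability density, spectral entropy, short-term energy) can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The Hamming window, the equal-width band split, the naive DFT, and every function and parameter name are assumptions made for illustration. The noise-reduction step and the spectral entropy ratio are left out because this section does not give the estimator or the ratio formula, so the short-term energy here would be computed on whatever time-domain frame the caller supplies.

```python
import cmath
import math


def frame_and_window(signal, frame_len, hop):
    """Split a signal into overlapping frames and apply a Hamming window.

    The Hamming window is an assumed choice; frame_len must be > 1.
    """
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    return [[signal[start + n] * window[n] for n in range(frame_len)]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def dft(frame):
    """Naive O(N^2) discrete Fourier transform, kept simple on purpose."""
    size = len(frame)
    return [sum(frame[n] * cmath.exp(-2j * cmath.pi * k * n / size)
                for n in range(size))
            for k in range(size)]


def subband_energies(spectrum, num_bands):
    """Energy of each sub-band over the non-negative-frequency bins.

    Equal-width bands are an assumption; the text only says the frame is
    divided "according to the frequency band information".
    """
    half = spectrum[: len(spectrum) // 2 + 1]
    edges = [b * len(half) // num_bands for b in range(num_bands + 1)]
    return [sum(abs(x) ** 2 for x in half[edges[b]:edges[b + 1]])
            for b in range(num_bands)]


def spectral_entropy(band_energy):
    """Entropy of the normalized sub-band energy distribution.

    The frame energy is taken as the sum of the sub-band energies (assumed).
    """
    total = sum(band_energy)
    if total == 0:
        return 0.0
    probs = [e / total for e in band_energy]
    return -sum(p * math.log(p) for p in probs if p > 0)


def short_term_energy(frame):
    """Sum of squared samples of one time-domain frame."""
    return sum(x * x for x in frame)
```

A frame whose energy sits in one sub-band gives an entropy near zero, while a frame whose energy is spread evenly over `num_bands` sub-bands gives `log(num_bands)`, which is the property that makes spectral entropy useful for separating voiced frames from broadband noise.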
Claims (20)
- A voice endpoint detection method, comprising: acquiring a noisy speech signal; performing noise reduction on the noisy speech signal to obtain a noise-reduced speech signal; calculating a spectral entropy ratio of the noise-reduced speech signal, and calculating a short-term energy of the noise-reduced speech signal; and performing voice endpoint detection according to the spectral entropy ratio of the noise-reduced speech signal and the short-term energy of the noise-reduced speech signal.
- The voice endpoint detection method according to claim 1, wherein performing noise reduction on the noisy speech signal to obtain a noise-reduced speech signal comprises: framing and windowing the noisy speech signal to obtain a multi-frame windowed time-domain signal; performing a Fourier transform on each frame of the multi-frame windowed time-domain signal to obtain a multi-frame frequency-domain signal; estimating the Fourier coefficients of each frame of the frequency-domain signal; and, according to the Fourier coefficients of each frame of the frequency-domain signal, performing noise reduction on each frame of the frequency-domain signal to obtain a multi-frame noise-reduced frequency-domain signal.
- The voice endpoint detection method according to claim 2, wherein calculating the spectral entropy ratio of the noise-reduced speech signal comprises: calculating the energy spectrum of each frame of the noise-reduced frequency-domain signal; calculating the spectral entropy of each frame of the noise-reduced frequency-domain signal; and calculating the spectral entropy ratio of each frame of the noise-reduced frequency-domain signal according to the energy spectrum and the spectral entropy of that frame, to obtain the spectral entropy ratios of all frames of the noise-reduced speech signal.
- The voice endpoint detection method according to claim 3, wherein calculating the energy spectrum of each frame of the noise-reduced frequency-domain signal comprises: acquiring frequency band information of each frame of the noise-reduced frequency-domain signal; dividing each frame of the noise-reduced frequency-domain signal according to the frequency band information to obtain a plurality of noise-reduced frequency-domain sub-signals corresponding to that frame; calculating the energy spectrum of each of the plurality of sub-signals; and calculating the energy spectrum of each frame of the noise-reduced frequency-domain signal according to the energy spectrum of each sub-signal.
- The voice endpoint detection method according to claim 4, wherein calculating the spectral entropy of each frame of the noise-reduced frequency-domain signal comprises: calculating the normalized probability density of each noise-reduced frequency-domain sub-signal according to the energy spectrum of each sub-signal and the energy spectrum of each frame of the noise-reduced frequency-domain signal; and calculating the spectral entropy of each frame of the noise-reduced frequency-domain signal according to the normalized probability density of each sub-signal.
- The voice endpoint detection method according to claim 3, wherein calculating the short-term energy of the noise-reduced speech signal comprises: performing an inverse Fourier transform on each frame of the noise-reduced frequency-domain signal to obtain a multi-frame noise-reduced time-domain signal; and calculating the short-term energy of each frame of the noise-reduced time-domain signal to obtain the short-term energies of all frames of the noise-reduced speech signal.
- The voice endpoint detection method according to claim 6, wherein performing voice endpoint detection according to the spectral entropy ratio of the noise-reduced speech signal and the short-term energy of the noise-reduced speech signal comprises: determining the position of the speech start point according to the spectral entropy ratios of all frames of the noise-reduced speech signal and the short-term energies of all frames of the noise-reduced speech signal; and/or determining the position of the speech termination point according to the spectral entropy ratios of all frames of the noise-reduced speech signal and the short-term energies of all frames of the noise-reduced speech signal.
- The voice endpoint detection method according to claim 7, wherein determining the position of the speech start point according to the spectral entropy ratios of all frames of the noise-reduced speech signal and the short-term energies of all frames of the noise-reduced speech signal comprises: if no speech is detected according to the spectral entropy ratio of a first number of frames of the noise-reduced speech signal and the short-term energy of the first number of frames of the noise-reduced speech signal, and speech is detected according to the spectral entropy ratio of a second number of frames of the noise-reduced speech signal and the spectral entropy ratio of the second number of frames of the noise-reduced speech signal, determining the position of the first frame of the second number of frames as the position of the speech start point.
- The voice endpoint detection method according to claim 7, wherein determining the position of the speech termination point according to the spectral entropy ratios of all frames of the noise-reduced speech signal and the short-term energies of all frames of the noise-reduced speech signal comprises: if speech is detected according to the spectral entropy ratio of a third number of frames of the noise-reduced speech signal and the short-term energy of the third number of frames of the noise-reduced speech signal, and no speech is detected according to the spectral entropy ratio of a fourth number of frames of the noise-reduced speech signal and the short-term energy of the fourth number of frames of the noise-reduced speech signal, determining the position of the first frame of the fourth number of frames as the position of the speech termination point.
- A voice endpoint detection device, comprising: an acquisition module, configured to acquire a noisy speech signal; a noise reduction module, configured to perform noise reduction on the noisy speech signal to obtain a noise-reduced speech signal; a calculation module, configured to calculate a spectral entropy ratio of the noise-reduced speech signal and to calculate a short-term energy of the noise-reduced speech signal; and a detection module, configured to perform voice endpoint detection according to the spectral entropy ratio of the noise-reduced speech signal and the short-term energy of the noise-reduced speech signal.
- A storage medium, wherein a computer program is stored in the storage medium, and when the computer program is run on a computer, the computer is caused to execute the voice endpoint detection method according to any one of claims 1 to 9.
- An electronic device, wherein the electronic device comprises a processor and a memory, a computer program is stored in the memory, and the processor, by calling the computer program stored in the memory, is configured to execute: acquiring a noisy speech signal; performing noise reduction on the noisy speech signal to obtain a noise-reduced speech signal; calculating a spectral entropy ratio of the noise-reduced speech signal, and calculating a short-term energy of the noise-reduced speech signal; and performing voice endpoint detection according to the spectral entropy ratio of the noise-reduced speech signal and the short-term energy of the noise-reduced speech signal.
- The electronic device according to claim 12, wherein the processor is configured to execute: framing and windowing the noisy speech signal to obtain a multi-frame windowed time-domain signal; performing a Fourier transform on each frame of the multi-frame windowed time-domain signal to obtain a multi-frame frequency-domain signal; estimating the Fourier coefficients of each frame of the frequency-domain signal; and, according to the Fourier coefficients of each frame of the frequency-domain signal, performing noise reduction on each frame of the frequency-domain signal to obtain a multi-frame noise-reduced frequency-domain signal.
- The electronic device according to claim 13, wherein the processor is configured to execute: calculating the energy spectrum of each frame of the noise-reduced frequency-domain signal; calculating the spectral entropy of each frame of the noise-reduced frequency-domain signal; and calculating the spectral entropy ratio of each frame of the noise-reduced frequency-domain signal according to the energy spectrum and the spectral entropy of that frame, to obtain the spectral entropy ratios of all frames of the noise-reduced speech signal.
- The electronic device according to claim 14, wherein the processor is configured to execute: acquiring frequency band information of each frame of the noise-reduced frequency-domain signal; dividing each frame of the noise-reduced frequency-domain signal according to the frequency band information to obtain a plurality of noise-reduced frequency-domain sub-signals corresponding to that frame; calculating the energy spectrum of each of the plurality of sub-signals; and calculating the energy spectrum of each frame of the noise-reduced frequency-domain signal according to the energy spectrum of each sub-signal.
- The electronic device according to claim 15, wherein the processor is configured to execute: calculating the normalized probability density of each noise-reduced frequency-domain sub-signal according to the energy spectrum of each sub-signal and the energy spectrum of each frame of the noise-reduced frequency-domain signal; and calculating the spectral entropy of each frame of the noise-reduced frequency-domain signal according to the normalized probability density of each sub-signal.
- The electronic device according to claim 14, wherein the processor is configured to execute: performing an inverse Fourier transform on each frame of the noise-reduced frequency-domain signal to obtain a multi-frame noise-reduced time-domain signal; and calculating the short-term energy of each frame of the noise-reduced time-domain signal to obtain the short-term energies of all frames of the noise-reduced speech signal.
- The electronic device according to claim 17, wherein the processor is configured to execute: determining the position of the speech start point according to the spectral entropy ratios of all frames of the noise-reduced speech signal and the short-term energies of all frames of the noise-reduced speech signal; and/or determining the position of the speech termination point according to the spectral entropy ratios of all frames of the noise-reduced speech signal and the short-term energies of all frames of the noise-reduced speech signal.
- The electronic device according to claim 18, wherein the processor is configured to execute: if no speech is detected according to the spectral entropy ratio of a first number of frames of the noise-reduced speech signal and the short-term energy of the first number of frames of the noise-reduced speech signal, and speech is detected according to the spectral entropy ratio of a second number of frames of the noise-reduced speech signal and the spectral entropy ratio of the second number of frames of the noise-reduced speech signal, determining the position of the first frame of the second number of frames as the position of the speech start point.
- The electronic device according to claim 18, wherein the processor is configured to execute: if speech is detected according to the spectral entropy ratio of a third number of frames of the noise-reduced speech signal and the short-term energy of the third number of frames of the noise-reduced speech signal, and no speech is detected according to the spectral entropy ratio of a fourth number of frames of the noise-reduced speech signal and the short-term energy of the fourth number of frames of the noise-reduced speech signal, determining the position of the first frame of the fourth number of frames as the position of the speech termination point.
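The start-point and termination-point conditions recited in the claims above can be read as a scan over per-frame speech/non-speech decisions. The following Python sketch shows one such reading and is not the claimed implementation. The per-frame rule in `is_speech` (both features above a threshold), the threshold values, and all names are hypothetical, since the claims do not state how the spectral entropy ratio and short-term energy are combined into a decision.

```python
def is_speech(ratio, energy, ratio_thr, energy_thr):
    """Hypothetical per-frame rule: both features must exceed a threshold."""
    return ratio > ratio_thr and energy > energy_thr


def find_start(flags, n_silence, n_speech):
    """Return the first index i where the n_silence frames before i are all
    non-speech and the n_speech frames from i onward are all speech."""
    for i in range(n_silence, len(flags) - n_speech + 1):
        if not any(flags[i - n_silence:i]) and all(flags[i:i + n_speech]):
            return i
    return None


def find_end(flags, n_speech, n_silence, begin=0):
    """Return the first index j >= begin where the n_speech frames before j
    are all speech and the n_silence frames from j onward are all non-speech."""
    for j in range(max(begin, n_speech), len(flags) - n_silence + 1):
        if all(flags[j - n_speech:j]) and not any(flags[j:j + n_silence]):
            return j
    return None


# Made-up per-frame features, for illustration only.
ratios = [0.1, 0.2, 0.1, 0.9, 0.8, 0.9, 0.7, 0.1, 0.2, 0.1]
energies = [0.0, 0.1, 0.0, 1.2, 1.5, 1.1, 0.9, 0.1, 0.0, 0.0]
flags = [is_speech(r, e, 0.5, 0.5) for r, e in zip(ratios, energies)]

start = find_start(flags, n_silence=2, n_speech=3)           # frame 3
end = find_end(flags, n_speech=3, n_silence=2, begin=start)  # frame 7
```

Here `n_silence` and `n_speech` in `find_start` play the roles of the first and second numbers of frames, and `n_speech` and `n_silence` in `find_end` play the roles of the third and fourth. Requiring several consecutive frames on each side of the boundary keeps a single misclassified frame from being reported as an endpoint.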
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201880097699.4A CN112955951A (en) | 2018-11-15 | 2018-11-15 | Voice endpoint detection method and device, storage medium and electronic equipment |
PCT/CN2018/115601 WO2020097841A1 (en) | 2018-11-15 | 2018-11-15 | Voice activity detection method and apparatus, storage medium and electronic device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2018/115601 WO2020097841A1 (en) | 2018-11-15 | 2018-11-15 | Voice activity detection method and apparatus, storage medium and electronic device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020097841A1 (en) | 2020-05-22 |
Family
ID=70731178
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/115601 WO2020097841A1 (en) | 2018-11-15 | 2018-11-15 | Voice activity detection method and apparatus, storage medium and electronic device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112955951A (en) |
WO (1) | WO2020097841A1 (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050216261A1 (en) * | 2004-03-26 | 2005-09-29 | Canon Kabushiki Kaisha | Signal processing apparatus and method |
CN101599269A (en) * | 2009-07-02 | 2009-12-09 | 中国农业大学 | Sound end detecting method and device |
CN107731223A (en) * | 2017-11-22 | 2018-02-23 | 腾讯科技(深圳)有限公司 | Voice activity detection method, relevant apparatus and equipment |
CN107910017A (en) * | 2017-12-19 | 2018-04-13 | 河海大学 | A kind of method that threshold value is set in noisy speech end-point detection |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5732976B2 (en) * | 2011-03-31 | 2015-06-10 | 沖電気工業株式会社 | Speech segment determination device, speech segment determination method, and program |
CN104810024A (en) * | 2014-01-28 | 2015-07-29 | 上海力声特医学科技有限公司 | Double-path microphone speech noise reduction treatment method and system |
CN105023572A (en) * | 2014-04-16 | 2015-11-04 | 王景芳 | Noised voice end point robustness detection method |
CN105825871B (en) * | 2016-03-16 | 2019-07-30 | 大连理工大学 | A kind of end-point detecting method without leading mute section of voice |
CN106653062A (en) * | 2017-02-17 | 2017-05-10 | 重庆邮电大学 | Spectrum-entropy improvement based speech endpoint detection method in low signal-to-noise ratio environment |
CN108428456A (en) * | 2018-03-29 | 2018-08-21 | 浙江凯池电子科技有限公司 | Voice de-noising algorithm |
- 2018-11-15 CN CN201880097699.4A patent/CN112955951A/en active Pending
- 2018-11-15 WO PCT/CN2018/115601 patent/WO2020097841A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN112955951A (en) | 2021-06-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20210327448A1 (en) | Speech noise reduction method and apparatus, computing device, and computer-readable storage medium | |
WO2019101123A1 (en) | Voice activity detection method, related device, and apparatus | |
US10504539B2 (en) | Voice activity detection systems and methods | |
US20230298610A1 (en) | Noise suppression method and apparatus for quickly calculating speech presence probability, and storage medium and terminal | |
WO2021139327A1 (en) | Audio signal processing method, model training method, and related apparatus | |
WO2012158156A1 (en) | Noise supression method and apparatus using multiple feature modeling for speech/noise likelihood | |
US9374651B2 (en) | Sensitivity calibration method and audio device | |
WO2022105570A1 (en) | Speech endpoint detection method, apparatus and device, and computer readable storage medium | |
US10839820B2 (en) | Voice processing method, apparatus, device and storage medium | |
CN110648687B (en) | Activity voice detection method and system | |
CN110875049B (en) | Voice signal processing method and device | |
WO2022218254A1 (en) | Voice signal enhancement method and apparatus, and electronic device | |
CN110503973B (en) | Audio signal transient noise suppression method, system and storage medium | |
US11915718B2 (en) | Position detection method, apparatus, electronic device and computer readable storage medium | |
WO2024041512A1 (en) | Audio noise reduction method and apparatus, and electronic device and readable storage medium | |
WO2017128910A1 (en) | Method, apparatus and electronic device for determining speech presence probability | |
US20230223014A1 (en) | Adapting Automated Speech Recognition Parameters Based on Hotword Properties | |
WO2020097841A1 (en) | Voice activity detection method and apparatus, storage medium and electronic device | |
US11922933B2 (en) | Voice processing device and voice processing method | |
CN112216285A (en) | Multi-person session detection method, system, mobile terminal and storage medium | |
CN113470621B (en) | Voice detection method, device, medium and electronic equipment | |
TWI756817B (en) | Voice activity detection device and method | |
US20230046518A1 (en) | Howling suppression method and apparatus, computer device, and storage medium | |
CN116913306A (en) | Voice enhancement method and device and electronic equipment | |
CN116364106A (en) | Voice detection method, device, terminal equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18939865; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 18939865; Country of ref document: EP; Kind code of ref document: A1 |
| | 32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 15.09.2021) |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 18939865; Country of ref document: EP; Kind code of ref document: A1 |