WO2024016793A1 - Speech signal processing method, apparatus, device, and computer-readable storage medium - Google Patents

Speech signal processing method, apparatus, device, and computer-readable storage medium

Info

Publication number
WO2024016793A1
WO2024016793A1 (application PCT/CN2023/092935)
Authority
WO
WIPO (PCT)
Prior art keywords: signal, speech signal, sound source, frequency domain, original
Prior art date
Application number
PCT/CN2023/092935
Other languages
English (en)
French (fr)
Inventor
CHEN, Junbin (陈俊彬)
Original Assignee
Shenzhen TCL New Technology Co., Ltd. (深圳TCL新技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen TCL New Technology Co., Ltd.
Publication of WO2024016793A1 publication Critical patent/WO2024016793A1/zh

Links

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 — Speech recognition
    • G10L15/02 — Feature extraction for speech recognition; Selection of recognition unit
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 — Speech recognition
    • G10L15/26 — Speech to text systems
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 — Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 — Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 — Noise filtering
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 — Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 — Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 — Voice signal separating

Definitions

  • the present application relates to the field of speech processing technology, and specifically to a speech signal processing method, device, equipment and computer-readable storage medium.
  • Embodiments of the present application provide a speech signal processing method, device, equipment and computer-readable storage medium, which can improve the accuracy of speech recognition.
  • embodiments of the present application provide a speech signal processing method, including:
  • embodiments of the present application also provide a voice signal processing device, including:
  • the sound source positioning module is used to determine the first direction information of the target sound source based on the original speech signal collected by the microphone array;
  • a signal separation module configured to separate the first speech signal corresponding to the target sound source from the original speech signal according to the independent vector analysis algorithm and the first direction information;
  • a noise identification module configured to determine a noise signal from the original speech signal according to the first direction information and the parameter information of the microphone array
  • a voice noise reduction module, configured to perform noise reduction processing on the first speech signal according to the noise signal to obtain a second speech signal.
  • embodiments of the present application further provide a computer-readable storage medium.
  • a computer program is stored on the computer-readable storage medium.
  • when the computer program is executed by a processor, the steps in the speech signal processing method provided by any embodiment of the present application are implemented.
  • embodiments of the present application further provide an electronic device.
  • the electronic device includes a processor, a memory, and a computer program stored in the memory and executable on the processor.
  • when the processor executes the computer program, the steps in the speech signal processing method provided in any embodiment of the present application are implemented.
  • Figure 1 is a schematic flowchart of a first method for processing a speech signal provided by an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a speech signal processing device provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • an embodiment means that a particular feature, structure or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application.
  • the appearances of this phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those skilled in the art understand, both explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
  • Embodiments of the present application provide a voice signal processing method, including:
  • determining the first direction information of the target sound source based on the original speech signal collected by the microphone array includes:
  • the frequency domain signal is subjected to sound source positioning processing according to a sound source positioning algorithm to determine the first direction information of the target sound source.
  • performing time-frequency conversion on the original speech signal to obtain a frequency domain signal includes:
  • time-frequency conversion processing is performed on the speech signal to obtain a frequency domain signal corresponding to one frame.
  • the first direction information is a first direction vector; and separating the first speech signal corresponding to the target sound source from the original speech signal according to an independent vector analysis algorithm and the first direction information includes:
  • the frequency domain signal is separated and processed according to an independent vector analysis algorithm to obtain a first speech signal corresponding to the target sound source.
  • the frequency domain signal is separated and processed according to an independent vector analysis algorithm to obtain the first speech signal corresponding to the target sound source, including:
  • the first speech signal corresponding to the target sound source is calculated.
  • the parameter information includes an arrangement; and determining the noise signal from the original speech signal according to the first direction information and the parameter information of the microphone array includes:
  • the frequency domain signal is beamformed according to the second direction vector to obtain a noise signal.
  • performing noise reduction processing on the first speech signal according to the noise signal to obtain a second speech signal includes:
  • the adaptive filter determined according to the coefficient performs noise reduction processing on the first speech signal to obtain a second speech signal.
  • before performing time-frequency conversion on the original speech signal to obtain a frequency domain signal, the method further includes:
  • a step of performing time-frequency conversion on the original voice signal to obtain a frequency domain signal is performed.
  • the technical solution provided by the embodiment of this application first determines the first direction information of the target sound source based on the original speech signal collected by the microphone array, and then separates the target sound source from the original speech signal according to the independent vector analysis algorithm and the first direction information.
  • the corresponding first speech signal is then determined from the original speech signal based on the first direction information and the parameter information of the microphone array, and then the first speech signal is denoised according to the noise signal to obtain the second speech signal.
  • this scheme combines the direction information of the target source as a constraint, which enhances the stability and accuracy of the output of the independent vector analysis algorithm and avoids the output of pure noise signals.
  • the obtained noise signal is used to perform further noise reduction processing on the first speech signal to obtain a purer signal of the target sound source, thereby improving the accuracy of speech recognition.
  • Embodiments of the present application provide a voice signal processing method.
  • the execution subject of the voice signal processing method may be an electronic device.
  • electronic devices can be terminal devices with voice control functions such as smartphones and tablets; they can also be smart home appliances with voice control functions such as smart refrigerators and smart air conditioners; they can also be other devices with voice control functions such as wireless speakers and smart projectors.
  • FIG. 1 is a schematic flowchart of a first method for processing a speech signal provided by an embodiment of the present application.
  • the specific process of the speech signal processing method provided by the embodiment of the present application can be as follows:
  • the electronic device is equipped with two or more microphones. These microphones form a microphone array, and when the electronic device is in a running state, the voice signals around the electronic device are collected.
  • when an electronic device collects voice signals through a microphone array, in addition to the voice signal emitted by the user, it will also collect other interfering voice signals in the surrounding environment, such as environmental noise and the voices of other people; that is, the original speech signal collected by the microphone array is a mixed signal.
  • these signals affect the accuracy with which the electronic device recognizes the user's target voice signal. Therefore, after detecting the original voice signal, the electronic device needs to separate and process it to determine the voice signal of the target sound source, that is, to determine from this mixed signal the voice signal sent by the user for controlling the electronic device.
  • the electronic device stores the voice signals collected by the microphone array into the buffer y frame by frame in time sequence.
  • y_m(1,t) ← y_m(2,t); y_m(2,t) ← y_m(3,t); …; y_m(L-1,t) ← y_m(L,t); y_m(L,t) ← x_m(t); that is, the oldest frame is discarded and the newly collected frame x_m(t) is stored in the last slot.
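The frame-shift buffering above can be sketched in Python (the microphone count, frame count, and frame length below are illustrative, not taken from the patent):

```python
import numpy as np

def push_frame(buffer, new_frame):
    """Shift the per-microphone frame buffer left by one frame and append
    the newest frame: y_m(i,t) <- y_m(i+1,t), then y_m(L,t) <- x_m(t)."""
    # buffer: (M, L, T) -- M microphones, L frames of T samples each
    buffer[:, :-1, :] = buffer[:, 1:, :]   # discard the oldest frame
    buffer[:, -1, :] = new_frame           # store the newest frame x_m(t)
    return buffer

# illustrative sizes: M=2 microphones, L=4 frames, T=256 samples per frame
M, L, T = 2, 4, 256
buf = np.zeros((M, L, T))
buf = push_frame(buf, np.ones((M, T)))
```

After one push, only the last slot holds the new frame; repeated pushes keep the most recent L frames per microphone in time order.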
  • the electronic device acquires multiple frames of time domain signals in time sequence from the buffer y for processing.
  • the first direction information of the target sound source can be determined from it according to a sound source localization algorithm. For example, in one embodiment, determining the first direction information of the target sound source based on the original speech signal collected by the microphone array includes: obtaining the original speech signal collected by the microphone array, where the original speech signal is a time domain signal; performing time-frequency conversion on the original speech signal to obtain a frequency domain signal; and performing sound source positioning processing on the frequency domain signal according to the sound source positioning algorithm to determine the first direction information of the target sound source.
  • performing time-frequency conversion on the original speech signal to obtain the frequency domain signal includes: performing framing processing on the original speech signal to obtain continuous multi-frame speech signals arranged in the order of reception time; and, for each frame of speech signal, performing time-frequency conversion processing on the speech signal to obtain a frequency domain signal corresponding to that frame.
  • the time domain signal is reframed and windowed, for example, the frame length is T′ and the number of frames is L′.
  • the multi-frame time-domain speech signal is subjected to time-frequency conversion processing, for example an FFT (Fast Fourier Transform), to obtain the frequency domain signal.
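The framing, windowing, and FFT steps can be sketched as follows (the frame length, hop size, and Hann window are illustrative choices; the patent only specifies that an FFT is used):

```python
import numpy as np

def stft_frames(x, frame_len=512, hop=256):
    """Split a 1-D time-domain signal into windowed frames and apply an
    FFT to each frame, yielding one frequency-domain vector per frame."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    spec = np.empty((n_frames, frame_len // 2 + 1), dtype=complex)
    for l in range(n_frames):
        frame = x[l * hop : l * hop + frame_len] * window
        spec[l] = np.fft.rfft(frame)   # K = frame_len//2 + 1 frequency bins
    return spec

# a 1 kHz tone sampled at 16 kHz; its energy lands in bin 1000/16000*512 = 32
x = np.sin(2 * np.pi * 1000 * np.arange(16000) / 16000)
X = stft_frames(x)
```

Each row of `X` is the frequency-domain signal of one frame, matching the frame-by-frame processing described above.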
  • before performing time-frequency conversion on the original voice signal to obtain the frequency domain signal, the method further includes: detecting whether a preset wake-up word exists in the original voice signal; and, when the preset wake-up word is detected in the original voice signal, performing the step of time-frequency conversion on the original voice signal to obtain a frequency domain signal.
  • in a voice control scenario, users generally need to wake up the voice system with a wake-up word before the voice system can respond to subsequent voice commands.
  • the direction vector corresponding to the maximum peak value is determined as the first direction information of the target sound source, and is recorded as the first direction vector d speech .
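The peak-picking localization step can be illustrated with a minimal GCC-PHAT-style azimuth scan for a two-microphone array. The patent does not name a specific localization algorithm, so the scoring function, geometry, and scan resolution below are all illustrative assumptions:

```python
import numpy as np

def locate_source(X1, X2, mic_dist=0.1, fs=16000, n_fft=512):
    """Score candidate azimuths for a two-microphone array with a
    GCC-PHAT-style criterion; return the best-scoring angle in degrees."""
    freqs = np.fft.rfftfreq(n_fft, 1 / fs)
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + 1e-12                 # PHAT weighting
    c = 343.0                                      # speed of sound, m/s
    angles = np.arange(0, 181, 5)                  # candidate azimuths (deg)
    scores = []
    for a in angles:
        tau = mic_dist * np.cos(np.deg2rad(a)) / c # expected inter-mic delay
        steer = np.exp(-2j * np.pi * freqs * tau)
        scores.append(np.real(np.sum(cross * steer)))
    return float(angles[int(np.argmax(scores))])   # peak -> direction
```

The candidate direction whose steered score peaks plays the role of the first direction vector d_speech.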
  • the independent vector analysis algorithm and the first direction information separate the first speech signal corresponding to the target sound source from the original speech signal.
  • the traditional overIVA (overdetermined Independent Vector Analysis) algorithm can separate mixed speech signals, but it is difficult to determine whether the separated speech is the target speech or interfering speech.
  • the solution of the embodiment of the present application, after determining the first direction vector characterizing the direction of the target sound source, uses the first direction vector as a constraint condition of the independent vector analysis algorithm to perform signal separation processing.
  • the first direction information is a first direction vector; according to the independent vector analysis algorithm and the first direction information, the first speech signal corresponding to the target sound source is separated from the original speech signal, including: calculating the first direction The vector corresponds to the steering vector at each frequency point of the frequency domain signal; based on the steering vector, the frequency domain signal is separated and processed according to the independent vector analysis algorithm to obtain the first speech signal corresponding to the target sound source.
  • the frequency domain signal contains multiple frequency points, and the steering vector corresponding to each frequency point is calculated according to the first direction vector; then, in the process of separating the frequency domain signal according to the independent vector analysis algorithm, the steering vector is used to constrain the algorithm so that the final separated signal is the signal corresponding to the target sound source rather than other interference signals.
  • the first speech signal corresponding to the target sound source separated from the original speech signal is
  • separating and processing the frequency domain signal according to the independent vector analysis algorithm to obtain the first speech signal corresponding to the target sound source includes: calculating the first auxiliary parameter matrix of the independent vector analysis algorithm based on the frequency domain signal; modifying the first auxiliary parameter matrix according to the steering vector to obtain the second auxiliary parameter matrix; calculating the target sound source subspace according to the second auxiliary parameter matrix; determining the separation matrix of the target sound source according to the target sound source subspace; and calculating the first speech signal corresponding to the target sound source according to the frequency domain signal and the separation matrix.
  • the frequency domain signal includes K frequency points, and α(k,d_speech) is the steering vector corresponding to the first direction vector d_speech at the k-th frequency point among the K frequency points.
  • the steering vector α(k,d_speech) corresponding to d_speech at the k-th frequency point can thus be obtained.
  • (·)^H represents the conjugate transpose.
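A common far-field plane-wave form of the steering vector can be sketched as follows. The patent does not reproduce its exact formula, so the expression below is a standard array-processing assumption; the microphone positions and bin indexing are illustrative:

```python
import numpy as np

def steering_vector(k, d, mic_pos, fs=16000, n_fft=512, c=343.0):
    """Far-field plane-wave steering vector alpha(k, d) at frequency bin k
    for a unit direction vector d and microphone positions (M x 3)."""
    f_k = k * fs / n_fft                       # frequency of bin k in Hz
    delays = mic_pos @ d / c                   # per-microphone delay (s)
    return np.exp(-2j * np.pi * f_k * delays)  # M-dimensional complex vector

# illustrative 2-mic array 10 cm apart on the x-axis, source along +x
mics = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]])
alpha = steering_vector(32, np.array([1.0, 0.0, 0.0]), mics)
```

Each entry has unit magnitude; only the phase (propagation delay to each microphone) varies, which is what the IVA constraint exploits.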
  • W_bp(l,k) is the first row of the corresponding separation matrix, where W_bp(l,k) is a 1 × M matrix and the separation matrix is an M × M matrix; therefore, to calculate the separated first speech signal, that matrix needs to be calculated first.
  • A(l,k) is an M × M diagonal matrix whose diagonal elements are the diagonal elements obtained after the inversion.
  • W(l,k) is a 1 × M matrix, representing the target sound source subspace.
  • the first element of its initial value W(0,k) is 1, and the elements at other positions are zero.
  • the function of this matrix is to separate the target sound source from the M inputs.
  • U(l,k) = [U_1(l,k); U_2(l,k); …; U_{M-1}(l,k)].
  • U(l,k) is an (M-1) × M matrix.
  • the solution of the embodiment of the present application does not need to wait until the complete speech signal is collected before separation can begin: each time a new frame of speech signal is stored in the buffer y, the latest frame can be read from the buffer y and passed iteratively through the subsequent speech signal processing steps. For example, when separating signals according to the independent vector analysis algorithm, the data of the (l-1)-th frame can be combined when calculating the relevant data of the l-th frame.
  • the formula for calculating the first auxiliary parameter matrix above is an iterative operation, in which the diagonal elements of the initial value V(0,k) of V(l,k) are 1 and the elements at other positions are zero.
  • the forgetting factor in this iteration takes a value in the range 0–1.
  • X(l,k) is the frequency domain signal
  • (·)^H represents the conjugate transpose.
  • the first auxiliary parameter matrix is modified according to the steering vector calculated above to obtain the second auxiliary parameter matrix, so that the separation matrix calculated from the second auxiliary parameter matrix separates the signal of the target sound source from the mixed speech signal rather than other interfering signals.
  • the second auxiliary parameter matrix is D(l,k) = V(l,k) + δ·α(k,d_speech)·α^H(k,d_speech), where δ is a preset constant used to adjust the degree of participation of the steering vector in the independent vector analysis algorithm.
  • φ(l,k) = P^H(l,k)·D(l,k)·P(l,k)
  • ψ(l,k) = P^H(l,k)·D(l,k)·Q(l,k)
  • Q(l,k) = η·D^{-1}(l,k)·α(k,d_speech)
  • η is also a preset constant used to adjust the degree of participation of the steering vector in the independent vector analysis algorithm.
  • J(l,k) = (A_2·C(l,k)·W^H(l,k))·(A_1·C(l,k)·W^H(l,k))^{-1}.
  • C(l,k) = β·C(l-1,k) + (1-β)·X(l,k)·X^H(l,k), where β is the forgetting factor with a value between 0 and 1; for example, in one embodiment β is set to 0.95, and the initial value C(0,k) of C(l,k) can be set to a zero matrix.
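The recursive update of C(l,k) can be written directly in NumPy (β = 0.95 as in the description; the zero initial value follows C(0,k), and the 2-microphone frame below is illustrative):

```python
import numpy as np

def update_covariance(C_prev, X_frame, beta=0.95):
    """Recursive frame-by-frame covariance update
    C(l,k) = beta * C(l-1,k) + (1 - beta) * X(l,k) X(l,k)^H,
    with beta the forgetting factor."""
    outer = np.outer(X_frame, X_frame.conj())  # rank-1 term X X^H
    return beta * C_prev + (1.0 - beta) * outer

M = 2
C = np.zeros((M, M), dtype=complex)   # C(0,k) initialised to a zero matrix
X = np.array([1.0 + 0j, 1j])          # one frame at one frequency bin
C = update_covariance(C, X)
```

The result stays Hermitian at every step, which is what the later inversion and subspace computations rely on.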
  • after W_bp(l,k) is obtained, it can be normalized, thereby improving the stability of the output of the independent vector analysis algorithm and the noise reduction effect after the algorithm converges.
  • W_bp(l,k) = [W_bp,1(l,k), W_bp,2(l,k), …, W_bp,M(l,k)]
  • the normalized result is denoted P_bp(l,k).
  • the first speech signal of the target sound source is thus obtained. Due to environmental noise, some noise components may still remain in the first speech signal.
  • the noise signal can be further identified from the original speech signal, and noise reduction processing is then performed on the first speech signal based on the noise signal.
  • the parameter information includes an arrangement; determining the noise signal from the original speech signal according to the first direction information and the parameter information of the microphone array includes: determining the second direction vector of the noise signal according to the arrangement of the microphone array and the first direction information; and beamforming the frequency domain signal according to the second direction vector to obtain the noise signal.
  • the maximum azimuth angle and maximum pitch angle differ for different array arrangements.
  • the first direction vector of the target sound source has been calculated above. In spatial coordinates, each direction vector can be decomposed into an azimuth angle and a pitch angle; conversely, the direction vector can be deduced when the azimuth angle and pitch angle are known.
  • once the arrangement of the microphone array is determined, the component of the target sound source contained in the speech signal from the direction at the largest angle to the target sound source is the smallest. Based on this principle, the speech signal from the direction at the largest angle to the target sound source can be regarded as the noise signal.
  • the maximum azimuth angle and the maximum pitch angle between the noise signal and the speech signal of the target sound source are known.
  • the azimuth angle and pitch angle corresponding to the target sound source can also be calculated based on the first direction vector.
  • the second direction vector d noise corresponding to the noise signal can be calculated based on the maximum azimuth angle, the maximum pitch angle, and the first direction vector between the noise signal and the speech signal of the target sound source.
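The decomposition between direction vectors and azimuth/pitch angles can be sketched as follows. A common spherical-coordinate convention is assumed (the patent does not fix one), and the 90° offset used to form the noise direction is purely illustrative:

```python
import numpy as np

def angles_to_direction(azimuth_deg, pitch_deg):
    """Convert an azimuth/pitch (elevation) pair to a unit direction vector."""
    az, el = np.deg2rad(azimuth_deg), np.deg2rad(pitch_deg)
    return np.array([np.cos(el) * np.cos(az),
                     np.cos(el) * np.sin(az),
                     np.sin(el)])

def direction_to_angles(d):
    """Recover azimuth and pitch (degrees) from a unit direction vector."""
    az = np.rad2deg(np.arctan2(d[1], d[0]))
    el = np.rad2deg(np.arcsin(np.clip(d[2], -1.0, 1.0)))
    return az, el

# offset the target azimuth by the array's largest angle to point the
# noise beam away from the speaker (the 90-degree offset is illustrative)
d_speech = angles_to_direction(30.0, 10.0)
az, el = direction_to_angles(d_speech)
d_noise = angles_to_direction(az + 90.0, el)
```

Both conversions round-trip exactly, so d_noise can be derived from the maximum angles and the first direction vector as described above.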
  • α(k,d_noise) is the steering vector corresponding to d_noise at the k-th frequency point.
  • the original speech signal is beamformed according to this direction vector, and the speech signal in that direction is obtained as the noise signal.
  • super-directional beamforming may be used to obtain the beam output signal B(l,k).
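A standard minimum-variance (superdirective-style) beamformer at one frequency bin might look like this. The patent does not spell out its beamformer, so the MVDR weight formula below is an assumption, and the identity noise-coherence matrix is a placeholder:

```python
import numpy as np

def mvdr_beamform(x_vec, alpha, R):
    """MVDR beamformer at one frequency bin:
    w = R^{-1} alpha / (alpha^H R^{-1} alpha), output B = w^H x."""
    Ri_a = np.linalg.solve(R, alpha)       # R^{-1} alpha without explicit inverse
    w = Ri_a / (alpha.conj() @ Ri_a)       # distortionless constraint w^H alpha = 1
    return w.conj() @ x_vec                # beam output B(l,k)

M = 2
alpha = np.ones(M, dtype=complex)          # steering vector toward d_noise
R = np.eye(M, dtype=complex)               # noise-coherence placeholder
B = mvdr_beamform(np.array([1.0 + 0j, 1.0 + 0j]), alpha, R)
```

With a diffuse-noise coherence matrix in place of the identity, the same weights give the superdirective beam output B(l,k) used below.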
  • the noise signals corresponding to the multi-frame frequency domain signals are used as the input signal of an adaptive filter, and an iterative operation is performed according to the minimum mean square error algorithm to determine the coefficients of the adaptive filter; the adaptive filter determined by the coefficients then performs noise reduction processing on the first voice signal to obtain a second voice signal.
  • the first speech signal calculated above is Z(l,k)
  • l is the frame index
  • k is the frequency index
  • k = 1,2,…,K
  • K is the number of frequency points
  • E(l,k) is the adaptive filtered signal
  • B(l,k) = [B(l,k), B(l-1,k), …, B(l-ORD+1,k)] is the history cache of B(l,k), where ORD is the number of cached frames.
  • W NLMS (l,k) is the coefficient of the adaptive filter.
  • the coefficient is updated iteratively frame by frame according to the minimum mean square error criterion.
  • the step size adjustment factor controls the update rate of the coefficient.
  • (·)* indicates conjugation.
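The adaptive noise-cancellation step can be sketched with a standard NLMS update at a single frequency bin. The step size `mu` and regularizer `eps` are illustrative; the patent names only a step-size adjustment factor:

```python
import numpy as np

def nlms_step(w, b_hist, z, mu=0.1, eps=1e-8):
    """One NLMS update at one frequency bin: the beamformed noise history
    b_hist is the filter input, the separated signal z is the desired
    signal, and the output e = z - w^H b is the denoised result."""
    e = z - w.conj() @ b_hist                       # E(l,k), filtered output
    norm = np.real(b_hist.conj() @ b_hist) + eps    # input power normalisation
    w = w + (mu / norm) * b_hist * np.conj(e)       # coefficient iteration
    return w, e

ORD = 4
w = np.zeros(ORD, dtype=complex)                # W_NLMS(0,k)
b = np.ones(ORD, dtype=complex)                 # cached noise frames B(l,k)...
z = 2.0 + 0j                                    # first speech signal Z(l,k)
for _ in range(200):                            # repeated frames: e -> 0
    w, e = nlms_step(w, b, z)
```

Because the noise input here is fully correlated with the desired signal, the residual e converges toward zero; in practice e retains the speech component that is uncorrelated with the noise beam.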
  • the present application is not limited by the execution order of each described step. Certain steps may also be performed in other orders or at the same time if no conflict occurs.
  • the speech signal processing method first determines the first direction information of the target sound source based on the original speech signal collected by the microphone array, then separates the first speech signal corresponding to the target sound source from the original speech signal according to the independent vector analysis algorithm and the first direction information, then determines the noise signal from the original speech signal based on the first direction information and the parameter information of the microphone array, and finally performs noise reduction on the first speech signal based on the noise signal to obtain the second speech signal.
  • this scheme combines the direction information of the target source as a constraint, which enhances the stability and accuracy of the output of the independent vector analysis algorithm and avoids the output of pure noise signals.
  • the obtained noise signal is used to perform further noise reduction processing on the first speech signal to obtain a purer signal of the target sound source, thereby improving the accuracy of speech recognition.
  • a voice signal processing device is also provided.
  • FIG. 2 is a schematic structural diagram of a voice signal processing device 300 provided by an embodiment of the present application.
  • the voice signal processing device 300 is applied to electronic equipment.
  • the voice signal processing device 300 includes a sound source positioning module 301, a signal separation module 302, a noise identification module 303 and a voice noise reduction module 304, as follows:
  • the sound source localization module 301 is used to determine the first direction information of the target sound source based on the original speech signal collected by the microphone array;
  • the signal separation module 302 is used to separate the first speech signal corresponding to the target sound source from the original speech signal according to the independent vector analysis algorithm and the first direction information;
  • the noise identification module 303 is used to determine the noise signal from the original speech signal according to the first direction information and the parameter information of the microphone array;
  • the speech noise reduction module 304 is used to perform noise reduction processing on the first speech signal according to the noise signal to obtain a second speech signal.
  • the sound source localization module 301 is also used to: obtain the original speech signal collected by the microphone array, where the original speech signal is a time domain signal; perform time-frequency conversion on the original speech signal to obtain a frequency domain signal; and perform sound source positioning processing on the frequency domain signal according to the sound source positioning algorithm to determine the first direction information of the target sound source.
  • the sound source localization module 301 is also used to: perform framing processing on the original speech signal to obtain continuous multi-frame speech signals arranged in the order of reception time; and, for each frame of speech signal, perform time-frequency conversion processing on the speech signal to obtain a frequency domain signal corresponding to that frame.
  • the first direction information is a first direction vector, and the signal separation module 302 is also used to: calculate the steering vector corresponding to the first direction vector at each frequency point of the frequency domain signal; and, based on the steering vector, separate and process the frequency domain signal according to the independent vector analysis algorithm to obtain the first speech signal corresponding to the target sound source.
  • the signal separation module 302 is also configured to: calculate the first auxiliary parameter matrix of the independent vector analysis algorithm according to the frequency domain signal; modify the first auxiliary parameter matrix according to the steering vector to obtain the second auxiliary parameter matrix; calculate the target sound source subspace according to the second auxiliary parameter matrix; determine the separation matrix of the target sound source according to the target sound source subspace; and, according to the frequency domain signal and the separation matrix, calculate the first speech signal corresponding to the target sound source.
  • the noise identification module 303 is also used to: determine the second direction vector of the noise signal according to the arrangement of the microphone array and the first direction information; and beamform the frequency domain signal according to the second direction vector to obtain the noise signal.
  • the speech noise reduction module 304 is also used to: use the noise signals corresponding to the multi-frame frequency domain signals as the input signal of the adaptive filter, perform an iterative operation according to the minimum mean square error algorithm to determine the coefficients of the adaptive filter; and use the adaptive filter determined by the coefficients to perform noise reduction processing on the first speech signal to obtain a second speech signal.
  • the device further includes:
  • a speech recognition module, used to detect whether a preset wake-up word exists in the original speech signal;
  • the sound source localization module 301 is also used to perform time-frequency conversion on the original speech signal to obtain a frequency domain signal when detecting the presence of a preset wake-up word in the original speech signal.
  • the voice signal processing device provided in the embodiments of the present application belongs to the same concept as the voice signal processing method in the above embodiments, and the voice signal processing method provided in the embodiments can be implemented through the voice signal processing device. For details of its specific implementation process, refer to the speech signal processing method embodiments; they will not be described again here.
  • the speech signal processing device proposed in the embodiment of the present application first determines the first direction information of the target sound source based on the original speech signal collected by the microphone array, then separates the first speech signal corresponding to the target sound source from the original speech signal according to the independent vector analysis algorithm and the first direction information, then determines the noise signal from the original speech signal based on the first direction information and the parameter information of the microphone array, and finally performs noise reduction on the first speech signal based on the noise signal to obtain the second speech signal.
  • this scheme combines the direction information of the target source as a constraint, which enhances the stability and accuracy of the output of the independent vector analysis algorithm and avoids the output of pure noise signals.
  • the obtained noise signal is used to perform further noise reduction processing on the first speech signal to obtain a purer signal of the target sound source, thereby improving the accuracy of speech recognition.
  • Embodiments of the present application also provide an electronic device.
  • the electronic device can be a terminal device with a voice control function such as a smartphone or a tablet computer; it can also be a smart home appliance with a voice control function such as a smart refrigerator or a smart air conditioner; it can also be another device with a voice control function such as a wireless speaker or a smart projector.
  • FIG. 3 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device 400 includes a processor 401 with one or more processing cores, a memory 402 with one or more computer-readable storage media, and a computer program stored on the memory 402 and executable on the processor, where the processor 401 is electrically connected to the memory 402.
  • the structure of the electronic device shown in the figures does not constitute a limitation of the electronic device, which may include more or fewer components than shown in the figures, combine certain components, or arrange the components differently.
  • the processor 401 is the control center of the electronic device 400; it connects the various parts of the entire electronic device 400 using various interfaces and lines, and performs the various functions of the electronic device 400 and processes data by running or loading software programs and/or modules stored in the memory 402 and calling data stored in the memory 402, thereby monitoring the electronic device 400 as a whole.
  • the processor 401 in the electronic device 400 loads instructions corresponding to the processes of one or more application programs into the memory 402 according to the following steps, and the processor 401 runs the application programs stored in the memory 402 to implement various functions:
  • determining the first direction information of the target sound source based on the original speech signal collected by the microphone array; separating the first speech signal corresponding to the target sound source from the original speech signal according to the independent vector analysis algorithm and the first direction information; determining the noise signal from the original speech signal based on the first direction information and the parameter information of the microphone array; and performing noise reduction on the first speech signal according to the noise signal to obtain the second speech signal.
  • the electronic device 400 also includes: a touch display screen 403 , a radio frequency circuit 404 , a voice circuit 405 , an input unit 406 and a power supply 407 .
  • the processor 401 is electrically connected to the touch display screen 403, the radio frequency circuit 404, the voice circuit 405, the input unit 406 and the power supply 407 respectively.
  • the structure of the electronic device shown in FIG. 3 does not constitute a limitation on the electronic device, which may include more or fewer components than shown in the figure, combine certain components, or arrange the components differently.
  • the touch display screen 403 can be used to display a graphical user interface and receive operation instructions generated by the user acting on the graphical user interface.
  • the touch display screen 403 may include a display panel and a touch panel.
  • the display panel can be used to display information input by the user or information provided to the user as well as various graphical user interfaces of the electronic device. These graphical user interfaces can be composed of graphics, text, icons, videos, and any combination thereof.
  • the touch panel can be used to collect the user's touch operations on or near it (such as operations performed by the user on or near the touch panel using a finger, a stylus, or any other suitable object or accessory) and generate corresponding operation instructions, and the operation instructions execute the corresponding programs.
  • the touch panel may include two parts: a touch detection device and a touch controller.
  • the touch detection device detects the user's touch orientation, detects the signal brought by the touch operation, and transmits the signal to the touch controller;
  • the touch controller receives the touch information from the touch detection device, converts it into contact point coordinates, and then sends them to the processor 401;
  • the touch controller can also receive commands sent by the processor 401 and execute them.
  • the radio frequency circuit 404 can be used to send and receive radio frequency signals, so as to establish wireless communication with network devices or other electronic devices and exchange signals with them.
  • the voice circuit 405 can be used to provide a voice interface between the user and the electronic device through speakers and microphones.
  • on one hand, the voice circuit 405 can transmit the electrical signal converted from received voice data to the speaker, which converts it into a sound signal for output; on the other hand, the microphone converts a collected sound signal into an electrical signal, which the voice circuit 405 receives and converts into voice data.
  • the voice data is output to the processor 401 for processing and then sent, for example, to another electronic device via the radio frequency circuit 404, or the voice data is output to the memory 402 for further processing.
  • Voice circuit 405 may also include an earphone jack to provide communication between peripheral earphones and electronic devices.
  • the input unit 406 can be used to receive input numbers, character information, or user characteristic information (such as fingerprints, irises, and facial information), and to generate keyboard, mouse, joystick, optical, or trackball signal input related to user settings and function control.
  • the power supply 407 is used to power various components of the electronic device 400 .
  • the power supply 407 can be logically connected to the processor 401 through a power management system, so that functions such as charging, discharging, and power consumption management can be implemented through the power management system.
  • Power supply 407 may also include one or more DC or AC power supplies, recharging systems, power failure detection circuits, power converters or inverters, power status indicators, and other arbitrary components.
  • the electronic device 400 may also include a camera, a sensor, a wireless fidelity module, a Bluetooth module, etc., which will not be described again here.
  • the electronic device provided in this embodiment first determines the first direction information of the target sound source based on the original speech signal collected by the microphone array, and then separates the first speech signal corresponding to the target sound source from the original speech signal according to the independent vector analysis algorithm and the first direction information.
  • a noise signal is then determined from the original speech signal according to the first direction information and the parameter information of the microphone array, and the first speech signal is then denoised according to the noise signal to obtain the second speech signal.
  • when separating the original speech signal, this scheme uses the direction information of the target sound source as a constraint, which strengthens the stability and accuracy of the output of the independent vector analysis algorithm and prevents it from outputting a pure noise signal.
  • the obtained noise signal is used to perform further noise reduction processing on the first speech signal to obtain a purer signal of the target sound source, thereby improving the accuracy of speech recognition.
  • embodiments of the present application provide a computer-readable storage medium.
  • a computer program is stored on the computer-readable storage medium.
  • when the computer program is executed by a processor, the steps in any speech signal processing method provided by the embodiments of the present application are implemented.
  • the computer program can perform the following steps:
  • determining the first direction information of the target sound source based on the original speech signal collected by the microphone array; separating the first speech signal corresponding to the target sound source from the original speech signal according to the independent vector analysis algorithm and the first direction information; determining the noise signal from the original speech signal based on the first direction information and the parameter information of the microphone array; and performing noise reduction on the first speech signal according to the noise signal to obtain the second speech signal.
  • the storage medium may include: read-only memory (ROM, Read Only Memory), random access memory (RAM, Random Access Memory), magnetic disk or optical disk, etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

A speech signal processing method, apparatus, and device, and a computer-readable storage medium. The method includes: determining first direction information of a target sound source from an original speech signal collected by a microphone array (101); separating a first speech signal corresponding to the target sound source from the original speech signal according to an independent vector analysis algorithm and the first direction information (102); determining a noise signal from the original speech signal according to the first direction information and parameter information of the microphone array (103); and performing noise reduction on the first speech signal according to the noise signal (104).

Description

Speech signal processing method, apparatus, and device, and computer-readable storage medium
This application claims priority to Chinese patent application No. 202210863937.6, filed with the Chinese Patent Office on July 20, 2022 and entitled "Speech signal processing method, apparatus, and device, and computer-readable storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of speech processing, and in particular to a speech signal processing method, apparatus, and device, and a computer-readable storage medium.
Background
Many current smart home appliances, smart terminal devices, and the like offer voice control: a user can switch the device on and off and trigger control instructions by voice. Accurate voice control requires accurate speech recognition, but the accuracy of speech recognition in noisy environments is low.
Summary
Embodiments of the present application provide a speech signal processing method, apparatus, and device, and a computer-readable storage medium, which can improve the accuracy of speech recognition.
In a first aspect, an embodiment of the present application provides a speech signal processing method, including:
determining first direction information of a target sound source from an original speech signal collected by a microphone array;
separating a first speech signal corresponding to the target sound source from the original speech signal according to an independent vector analysis algorithm and the first direction information;
determining a noise signal from the original speech signal according to the first direction information and parameter information of the microphone array;
performing noise reduction on the first speech signal according to the noise signal to obtain a second speech signal.
In a second aspect, an embodiment of the present application further provides a speech signal processing apparatus, including:
a sound source localization module, configured to determine first direction information of a target sound source from an original speech signal collected by a microphone array;
a signal separation module, configured to separate a first speech signal corresponding to the target sound source from the original speech signal according to an independent vector analysis algorithm and the first direction information;
a noise identification module, configured to determine a noise signal from the original speech signal according to the first direction information and parameter information of the microphone array;
a speech noise reduction module, configured to perform noise reduction on the first speech signal according to the noise signal to obtain a second speech signal.
In a third aspect, an embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps in the speech signal processing method provided by any embodiment of the present application.
In a fourth aspect, an embodiment of the present application further provides an electronic device, including a processor, a memory, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the steps in the speech signal processing method provided by any embodiment of the present application.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the drawings needed in the description of the embodiments. Obviously, the drawings described below are only some embodiments of the present application, and those skilled in the art may derive other drawings from them without creative effort.
FIG. 1 is a first schematic flowchart of a speech signal processing method provided by an embodiment of the present application.
FIG. 2 is a schematic structural diagram of a speech signal processing apparatus provided by an embodiment of the present application.
FIG. 3 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.
Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
An embodiment of the present application provides a speech signal processing method, including:
determining first direction information of a target sound source from an original speech signal collected by a microphone array;
separating a first speech signal corresponding to the target sound source from the original speech signal according to an independent vector analysis algorithm and the first direction information;
determining a noise signal from the original speech signal according to the first direction information and parameter information of the microphone array;
performing noise reduction on the first speech signal according to the noise signal to obtain a second speech signal.
In some embodiments, determining the first direction information of the target sound source from the original speech signal collected by the microphone array includes:
obtaining the original speech signal collected by the microphone array, the original speech signal being a time-domain signal;
performing time-frequency conversion on the original speech signal to obtain a frequency-domain signal;
performing sound source localization on the frequency-domain signal according to a sound source localization algorithm to determine the first direction information of the target sound source.
In some embodiments, performing time-frequency conversion on the original speech signal to obtain a frequency-domain signal includes:
framing the original speech signal to obtain multiple consecutive frames of the speech signal arranged in order of reception time;
for each frame of the speech signal, performing time-frequency conversion on the speech signal to obtain a frequency-domain signal corresponding to the frame.
In some embodiments, the first direction information is a first direction vector, and separating the first speech signal corresponding to the target sound source from the original speech signal according to the independent vector analysis algorithm and the first direction information includes:
computing the steering vector of the first direction vector at each frequency bin of the frequency-domain signal;
based on the steering vector, separating the frequency-domain signal according to the independent vector analysis algorithm to obtain the first speech signal corresponding to the target sound source.
In some embodiments, based on the steering vector, separating the frequency-domain signal according to the independent vector analysis algorithm to obtain the first speech signal corresponding to the target sound source includes:
computing a first auxiliary parameter matrix of the independent vector analysis algorithm from the frequency-domain signal;
correcting the first auxiliary parameter matrix according to the steering vector to obtain a second auxiliary parameter matrix;
computing a target sound source subspace according to the second auxiliary parameter matrix;
determining a separation matrix of the target sound source according to the target sound source subspace;
computing the first speech signal corresponding to the target sound source from the frequency-domain signal and the separation matrix.
In some embodiments, the parameter information includes an arrangement of the array, and determining the noise signal from the original speech signal according to the first direction information and the parameter information of the microphone array includes:
determining a second direction vector of the noise signal according to the arrangement of the microphone array and the first direction information;
beamforming the frequency-domain signal according to the second direction vector to obtain the noise signal.
In some embodiments, performing noise reduction on the first speech signal according to the noise signal to obtain the second speech signal includes:
using the noise signals corresponding to the multiple frames of the frequency-domain signal as the input of an adaptive filter and iterating according to a least mean square error algorithm to determine the coefficients of the adaptive filter;
performing noise reduction on the first speech signal with the adaptive filter whose coefficients have been determined, to obtain the second speech signal.
In some embodiments, before performing time-frequency conversion on the original speech signal to obtain the frequency-domain signal, the method further includes:
detecting whether a preset wake-up word is present in the original speech signal;
when the preset wake-up word is detected in the original speech signal, performing the step of time-frequency conversion on the original speech signal to obtain the frequency-domain signal.
In the technical solution provided by the embodiments of the present application, the first direction information of the target sound source is first determined from the original speech signal collected by the microphone array; the first speech signal corresponding to the target sound source is then separated from the original speech signal according to the independent vector analysis algorithm and the first direction information; next, the noise signal is determined from the original speech signal according to the first direction information and the parameter information of the microphone array; and noise reduction is then performed on the first speech signal according to the noise signal to obtain the second speech signal. When separating the original speech signal, this solution uses the direction information of the target sound source as a constraint, which strengthens the stability and accuracy of the output of the independent vector analysis algorithm and prevents it from outputting a pure noise signal. In addition, after the first speech signal is obtained, the obtained noise signal is used to further denoise the first speech signal, yielding a purer signal of the target sound source and thereby improving the accuracy of speech recognition.
An embodiment of the present application provides a speech signal processing method that may be executed by an electronic device. The electronic device may be a terminal device with a voice control function such as a smartphone or a tablet computer; a smart home appliance with a voice control function such as a smart refrigerator or a smart air conditioner; or another device with a voice control function such as a wireless speaker or a smart projector.
Please refer to FIG. 1, which is a first schematic flowchart of the speech signal processing method provided by an embodiment of the present application. The specific flow of the method may be as follows:
101. Determine first direction information of a target sound source from an original speech signal collected by a microphone array.
The electronic device is provided with two or more microphones, which form a microphone array and collect the speech signals around the electronic device while it is running. When the electronic device collects speech signals through the microphone array, besides the speech signal uttered by the user, it also picks up other interfering signals from the surroundings, such as environmental noise and the voices of other people. In other words, the original speech signal collected by the microphone array is a mixed signal. These signals affect the accuracy with which the electronic device recognizes the user's target speech signal. Therefore, after detecting the original speech signal, the electronic device needs to separate it and determine the speech signal of the target sound source, i.e. determine, from this mixed speech signal, the speech signal uttered by the user to control the electronic device.
The speech signal collected by the microphone array is a time-domain signal, denoted xm(t), where m = 1, 2, ..., M, t = 1, 2, ..., T, M is the number of microphones in the array, and T is the length of one frame of the signal. The electronic device stores the speech signal collected by the microphone array frame by frame, in chronological order, in a buffer y.
Here y = {y1; y2; ...; yM}, ym = {ym(1), ym(2), ..., ym(L)}, ym(1) = {ym(1,1), ym(1,2), ..., ym(1,T)}, where L is the number of time-domain frames held in the buffer. Each time a new time-domain frame xm(t) arrives, the electronic device assigns ym as follows: ym(1,t) = ym(2,t), ym(2,t) = ym(3,t), ..., ym(L-1,t) = ym(L,t), ym(L,t) = xm(t).
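The shift-and-append buffer update above can be sketched as follows; the array shapes and the helper name `push_frame` are illustrative and not part of the source text.

```python
import numpy as np

def push_frame(y, x):
    """Shift the frame buffer one slot toward the past and append the
    newest frame, mirroring ym(l, t) = ym(l+1, t), ym(L, t) = xm(t).

    y: (M, L, T) buffer holding L frames of length T for M microphones.
    x: (M, T) newest time-domain frame xm(t)."""
    y[:, :-1, :] = y[:, 1:, :].copy()  # drop the oldest frame
    y[:, -1, :] = x                    # the newest frame goes into slot L
    return y

M, L, T = 2, 4, 8
buf = np.zeros((M, L, T))
buf = push_frame(buf, np.ones((M, T)))
```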
In subsequent signal processing, the electronic device fetches multiple time-domain frames from the buffer y in chronological order for processing.
For the original speech signal, the first direction information of the target sound source can be determined according to a sound source localization algorithm. For example, in one embodiment, determining the first direction information of the target sound source from the original speech signal collected by the microphone array includes: obtaining the original speech signal collected by the microphone array, the original speech signal being a time-domain signal; performing time-frequency conversion on the original speech signal to obtain a frequency-domain signal; and performing sound source localization on the frequency-domain signal according to a sound source localization algorithm to determine the first direction information of the target sound source.
The time-domain signals in the buffer are fetched and time-frequency converted into frequency-domain signals, on which the subsequent localization is based. In addition, a speech signal is a time-series signal, so the continuous speech signal can be framed along the time axis, and the subsequent operations performed on the multiple consecutive frames. For example, in one embodiment, performing time-frequency conversion on the original speech signal to obtain a frequency-domain signal includes: framing the original speech signal to obtain multiple consecutive frames arranged in order of reception time; and, for each frame of the speech signal, performing time-frequency conversion on it to obtain the frequency-domain signal corresponding to the frame.
In this embodiment, the time-domain signal is re-framed and windowed, for example with frame length T′ and frame count L′. Time-frequency conversion is then applied to the multiple time-domain frames to obtain the frequency-domain signal. For example, a Fast Fourier Transform (FFT) is applied, giving the frequency-domain signal Y(l′, k′), l′ = 1, 2, ..., L′, k′ = 1, 2, ..., K′, where K′ is the number of FFT points, and Y(l′, k′) = {Y1(l′, k′); ...; YM(l′, k′)}.
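As a concrete, purely illustrative sketch of the framing, windowing, and FFT step, assuming a Hann window and example frame parameters that the text leaves open:

```python
import numpy as np

def stft_frames(x, frame_len, hop, n_fft):
    """Split a 1-D time-domain signal into windowed frames and FFT each
    one, giving a frequency-domain signal Y(l', k') as described above.
    frame_len (T'), hop, and n_fft (K') are example choices."""
    win = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[l * hop : l * hop + frame_len] * win
                       for l in range(n_frames)])
    return np.fft.rfft(frames, n=n_fft, axis=1)  # one row per frame l'

# 1 s of a 440 Hz tone at 16 kHz as a stand-in for one microphone channel
x = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
Y = stft_frames(x, frame_len=512, hop=256, n_fft=512)
```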
In one embodiment, before performing time-frequency conversion on the original speech signal to obtain the frequency-domain signal, the method further includes: detecting whether a preset wake-up word is present in the original speech signal; and, when the preset wake-up word is detected in the original speech signal, performing the step of time-frequency conversion on the original speech signal to obtain the frequency-domain signal.
In a voice control scenario, the user generally needs to wake the voice system with a wake-up word before the system responds to subsequent voice commands. In this embodiment, after the original speech signal collected by the microphones is obtained, it can first be checked for the preset wake-up word; the subsequent speech processing operations are performed only when the wake-up word is present, and skipped otherwise.
Next, sound source localization is performed on the frequency-domain signal according to a sound source localization algorithm to determine the first direction information of the target sound source. For example, the frequency-domain signal Y′(l′, k′) is used for a preliminary measurement over several candidate source directions: N direction vectors dn, n = 1, 2, ..., N, are sampled uniformly in space over a preset angular range, and the SRP-PHAT (steered response power with phase transform) algorithm computes the SRP-PHAT value corresponding to each direction vector. The direction vector corresponding to the largest peak among these SRP-PHAT values is then determined and taken as the first direction information of the target sound source, denoted the first direction vector dspeec.
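The SRP-PHAT search described above can be sketched as follows: each candidate direction dn is scored by summing phase-transformed cross-spectra over microphone pairs, and the direction with the largest value is kept. The array layout, candidate grid, and frequencies below are illustrative assumptions.

```python
import numpy as np

C = 343.0  # assumed speed of sound (m/s)

def srp_phat_direction(Y, mic_pos, dirs, freqs):
    """Return the candidate direction vector with the largest SRP-PHAT value.

    Y: (M, K) one frame of frequency-domain microphone signals.
    mic_pos: (M, 3) microphone coordinates in metres.
    dirs: (N, 3) unit candidate direction vectors dn.
    freqs: (K,) bin frequencies in Hz."""
    M = Y.shape[0]
    scores = np.zeros(len(dirs))
    for n, d in enumerate(dirs):
        tau = mic_pos @ d / C                                # relative delays
        steer = np.exp(2j * np.pi * freqs[None, :] * tau[:, None])
        for i in range(M):
            for j in range(i + 1, M):
                cross = Y[i] * np.conj(Y[j])
                phat = cross / (np.abs(cross) + 1e-12)       # phase transform
                scores[n] += np.real(np.sum(phat * steer[i] * np.conj(steer[j])))
    return dirs[np.argmax(scores)]

# toy scene: a plane wave arriving from the +x direction at a 3-mic array
mic_pos = np.array([[0.05, 0.0, 0.0], [-0.05, 0.0, 0.0], [0.0, 0.05, 0.0]])
angles = np.linspace(0.0, 2 * np.pi, 8, endpoint=False)
dirs = np.stack([np.cos(angles), np.sin(angles), np.zeros(8)], axis=1)
freqs = np.linspace(200.0, 3000.0, 40)
tau_true = mic_pos @ dirs[0] / C
Y = np.exp(-2j * np.pi * freqs[None, :] * tau_true[:, None])
d_est = srp_phat_direction(Y, mic_pos, dirs, freqs)
```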
102. Separate a first speech signal corresponding to the target sound source from the original speech signal according to an independent vector analysis algorithm and the first direction information.
The conventional overIVA (overdetermined independent vector analysis) algorithm separates a mixed speech signal, but it is difficult to determine whether the separated output is the target speech or interfering speech. In the solution of the embodiments of the present application, after the first direction vector characterizing the direction information of the target sound source is determined, the first direction vector is used as a constraint of the independent vector analysis algorithm during signal separation.
In some embodiments, the first direction information is a first direction vector, and separating the first speech signal corresponding to the target sound source from the original speech signal according to the independent vector analysis algorithm and the first direction information includes: computing the steering vector of the first direction vector at each frequency bin of the frequency-domain signal; and, based on the steering vector, separating the frequency-domain signal according to the independent vector analysis algorithm to obtain the first speech signal corresponding to the target sound source.
The frequency-domain signal contains multiple frequency bins. The steering vector corresponding to each bin is computed from the first direction vector, and during separation of the frequency-domain signal by the independent vector analysis algorithm, this steering vector constrains the algorithm so that the finally separated signal is the signal corresponding to the target sound source rather than some other interfering signal.
The first speech signal corresponding to the target sound source separated from the original speech signal is Z(l, k) = Wbp(l, k)X(l, k).
In some implementations, based on the steering vector, separating the frequency-domain signal according to the independent vector analysis algorithm to obtain the first speech signal corresponding to the target sound source includes: computing a first auxiliary parameter matrix of the independent vector analysis algorithm from the frequency-domain signal; correcting the first auxiliary parameter matrix according to the steering vector to obtain a second auxiliary parameter matrix; computing a target sound source subspace according to the second auxiliary parameter matrix; determining a separation matrix of the target sound source according to the target sound source subspace; and computing the first speech signal corresponding to the target sound source from the frequency-domain signal and the separation matrix.
The frequency-domain signal contains K frequency bins, and α(k, dspeec) is the steering vector of the first direction vector dspeec at the k-th of the K bins. Specifically, first compute the relative delays of the M microphones for dspeec, i.e. the time, for direction dspeec, taken by the sound wave to travel from each microphone to the coordinate origin. Taking the origin as the start point and the coordinates of the m-th microphone as the end point, expressed in vector form as δm, the relative delay is τm = (δm · dspeec)/c, where the symbol "·" denotes the inner product and c is the speed of sound.
The steering vector of dspeec at the k-th frequency bin is then α(k, dspeec) = [e^(−j2πfkτ1), e^(−j2πfkτ2), ..., e^(−j2πfkτM)]^T, where fk is the frequency of the k-th bin.
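The relative-delay and steering-vector computation above amounts to a few lines of code; the speed of sound and the e^(−j2πfτ) sign convention are common assumptions rather than values fixed by the text.

```python
import numpy as np

C = 343.0  # assumed speed of sound (m/s)

def steering_vector(mic_pos, d, f_k):
    """Steering vector alpha(k, d) at one frequency bin: the relative delay
    of microphone m is tau_m = (delta_m . d) / c, the travel time from the
    microphone to the coordinate origin for direction d."""
    tau = mic_pos @ d / C              # (M,) relative delays
    return np.exp(-2j * np.pi * f_k * tau)

mic_pos = np.array([[0.04, 0.0, 0.0], [-0.04, 0.0, 0.0]])
alpha_broadside = steering_vector(mic_pos, np.array([0.0, 1.0, 0.0]), 1000.0)
alpha_endfire = steering_vector(mic_pos, np.array([1.0, 0.0, 0.0]), 1000.0)
```

For a broadside direction the delays vanish, so the steering vector is all ones; for an endfire direction the entries keep unit modulus but pick up opposite phases.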
After time-frequency conversion of the original (time-domain) speech signal xm, the frequency-domain signal X(l, k) is obtained, where l denotes the l-th frame and k the frequency index, k = 1, 2, ..., K, with K the number of FFT points. (·)^H denotes the conjugate transpose. Wbp(l, k) is the first row of A(l, k)W̃(l, k), where Wbp(l, k) is a 1×M matrix and A(l, k)W̃(l, k) is an M×M matrix. Therefore, to compute the separated first speech signal, A(l, k) and W̃(l, k) must be computed.
Here W̃(l, k) = [W(l, k); U(l, k)] is the separation matrix of the independent vector analysis algorithm, an M×M matrix. A(l, k) is an M×M diagonal matrix whose diagonal elements are the diagonal elements of the inverse W̃^(−1)(l, k).
W(l, k) is a 1×M matrix representing the target sound source subspace; the first element of its initial value W(0, k) is 1 and the elements in the other positions are zero. The role of this matrix is to separate the target sound source from the M inputs. U(l, k) = [U1(l, k); U2(l, k); ...; U(M−1)(l, k)] is an (M−1)×M matrix.
Next, the computation of W(l, k) and U(l, k) is described. First, the first auxiliary parameter matrix V(l, k) is computed recursively as V(l, k) = αV(l−1, k) + (1−α)X(l, k)X^H(l, k).
The solution of the embodiments of the present application does not need to wait for the complete speech signal to be collected before separating it: each time a new frame of the speech signal is stored in the buffer y, the latest frame can be read from the buffer and the subsequent series of processing carried out iteratively. For example, when separating the signal with the independent vector analysis algorithm, the data of frame l−1 can be combined when computing the data of frame l. The formula above for the first auxiliary parameter matrix is computed in this iterative manner: the diagonal elements of the initial value V(0, k) are 1 and the elements in the other positions are zero; α is a forgetting factor with a value between 0 and 1; X(l, k) is the frequency-domain signal; and (·)^H denotes the conjugate transpose.
Next, the first auxiliary parameter matrix is corrected with the steering vector computed above to obtain the second auxiliary parameter matrix, so that the separation matrix computed with the second auxiliary parameter matrix separates the signal of the target sound source, rather than other interfering signals, from the mixed speech signal.
Specifically, the second auxiliary parameter matrix is D(l, k) = V(l, k) + λα(k, dspeec)α^H(k, dspeec), where λ is a preset constant used to adjust the degree to which the steering vector participates in the independent vector analysis algorithm.
The formula for W(l, k) distinguishes two cases:
if Φ(l, k) equals 0, W(l, k) is given by a closed-form expression in P(l, k) and Ψ(l, k);
if Φ(l, k) is not equal to 0, W(l, k) is given by a closed-form expression that additionally involves Φ(l, k) and Q(l, k);
where Ψ(l, k) = P^H(l, k)D(l, k)P(l, k), Φ(l, k) = P^H(l, k)D(l, k)Q(l, k), and Q(l, k) = λσD^(−1)(l, k)α(k, dspeec). Here σ is likewise a preset constant used to adjust the degree to which the steering vector participates in the independent vector analysis algorithm.
U(l, k) is an (M−1)×M matrix, U(l, k) = [J(l, k), −I(M−1)].
Here J(l, k) = (A2C(l, k)W^H(l, k))(A1C(l, k)W^H(l, k))^(−1).
A1 = [1, O(1×(M−1))], A2 = [O((M−1)×1), I(M−1)], where I* denotes the identity matrix and O* the zero matrix of the indicated dimensions, and C(l, k) is an M×M square matrix.
C(l, k) = αC(l−1, k) + (1−α)X(l, k)X^H(l, k), where α is the forgetting factor, with a value between 0 and 1; for example, in one embodiment α is set to 0.95. The initial value of C(l, k) is C(0, k), which may be set to the zero matrix.
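The recursive covariance update of C(l, k) is a one-liner per frequency bin; the shapes and the α = 0.95 example follow the text, while the test vector below is arbitrary.

```python
import numpy as np

def update_cov(C_prev, X, alpha=0.95):
    """One step of C(l,k) = alpha*C(l-1,k) + (1-alpha)*X(l,k)X(l,k)^H
    for a single frequency bin; alpha is the forgetting factor."""
    return alpha * C_prev + (1 - alpha) * np.outer(X, np.conj(X))

X = np.array([1.0 + 0.0j, 1.0j, -1.0 + 0.0j])   # one frame, M = 3 channels
C0 = np.zeros((3, 3), dtype=complex)            # C(0, k) as a zero matrix
C1 = update_cov(C0, X)
```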
Computing W(l, k) and U(l, k) by the formulas above yields Wbp(l, k).
In one embodiment, after Wbp(l, k) is obtained, it may be normalized, which improves the stability of the output of the independent vector analysis algorithm and the noise reduction effect after the algorithm converges.
The normalization is as follows:
Wbp(l, k) = [Wbp,1(l, k), Wbp,2(l, k), ..., Wbp,M(l, k)]
Pbp(l, k) = |Wbp,1(l, k)| + |Wbp,2(l, k)| + ... + |Wbp,M(l, k)|
Wbp(l, k) ← Wbp(l, k)/Pbp(l, k)
103. Determine a noise signal from the original speech signal according to the first direction information and parameter information of the microphone array.
After the original speech signal is separated as above, the first speech signal of the target sound source is obtained. Because of environmental noise, some noise may remain in the first speech signal. To denoise this signal further, a noise signal can additionally be identified from the original speech signal, and the first speech signal then denoised based on that noise signal.
For example, in one embodiment, the parameter information includes the arrangement of the array, and determining the noise signal from the original speech signal according to the first direction information and the parameter information of the microphone array includes: determining a second direction vector of the noise signal according to the arrangement of the microphone array and the first direction information; and beamforming the frequency-domain signal according to the second direction vector to obtain the noise signal.
For a microphone array, the maximum azimuth separation and maximum elevation separation differ between array geometries. Conventional microphone arrays come in two geometries: linear and circular. If the array is linear, the maximum azimuth angle away from the target sound source is 90° and the maximum elevation angle away from it is 45°; if the array is circular, the maximum azimuth angle away from the target sound source is 180° and the maximum elevation angle away from it is 45°. The first direction vector of the target sound source has been computed above; in spatial coordinates, every direction vector can be decomposed into an azimuth and an elevation, and conversely the direction vector can be recovered from a known azimuth and elevation. For an electronic device, the arrangement of its microphone array is fixed, and the speech signal from the direction at the maximum angle from the target sound source contains the smallest component of the target sound source. On this principle, the speech signal from the direction at the maximum angle from the target sound source can be taken as the noise signal. The maximum azimuth separation and maximum elevation separation between this noise signal and the speech signal of the target sound source are known, and the azimuth and elevation corresponding to the target sound source can be computed from the first direction vector; therefore, the second direction vector dnoise corresponding to the noise signal can be computed from these maximum separations together with the first direction vector.
Once the second direction vector is determined, its steering vector is computed. α(k, dnoise) is the steering vector of dnoise at the k-th frequency bin. Specifically, first compute the relative delays of the M microphones for dnoise, i.e. the time, for direction dnoise, taken by the sound wave to travel from each microphone to the coordinate origin. Taking the origin as the start point and the coordinates of the m-th microphone as the end point, expressed in vector form as δm, the relative delay is τm = (δm · dnoise)/c, where the symbol "·" denotes the inner product. This gives α(k, dnoise) = [e^(−j2πfkτ1), e^(−j2πfkτ2), ..., e^(−j2πfkτM)]^T.
Then, the original speech signal is beamformed according to this direction vector, and the speech signal in that direction is obtained as the noise signal. In one embodiment, superdirective beamforming may be used to obtain the beam output signal B(l, k).
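A minimal fixed beamformer toward the noise direction can be sketched as below. A plain delay-and-sum combiner is used here as a stand-in for the superdirective design mentioned in the text, and the example phases are arbitrary.

```python
import numpy as np

def delay_and_sum(X, alpha):
    """Beam output for one (l, k): align the M channels with the steering
    vector of the noise direction and average them."""
    return np.conj(alpha) @ X / len(alpha)

alpha_noise = np.exp(-2j * np.pi * np.array([0.0, 0.25]))  # example steering phases
X_on_target = 3.0 * alpha_noise   # a signal arriving exactly from the steered direction
B = delay_and_sum(X_on_target, alpha_noise)
```

A signal arriving from the steered direction passes with unit gain (here B recovers the amplitude 3), while signals from other directions combine incoherently and are attenuated.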
104. Perform noise reduction on the first speech signal according to the noise signal to obtain a second speech signal.
After the noise signal is determined, the noise signals corresponding to the multiple frequency-domain frames are used as the input of an adaptive filter, which is iterated according to the least mean square error algorithm to determine the coefficients of the adaptive filter; the adaptive filter with the determined coefficients then performs noise reduction on the first speech signal to obtain the second speech signal.
The first speech signal computed above is Z(l, k); the second speech signal E(l, k) can be computed by the following formula:
E(l, k) = Z(l, k) − B̄(l, k)WNLMS(l, k)
where l is the frame index, k is the frequency index with k = 1, 2, ..., K and K the number of frequency bins, E(l, k) is the adaptively filtered signal, and B̄(l, k) is the history buffer of the beam output B(l, k):
B̄(l, k) = [B(l, k), B(l−1, k), ..., B(l−ORD+1, k)], where ORD is the number of buffered frames.
WNLMS(l, k) is the coefficient vector of the adaptive filter, updated iteratively as
WNLMS(l+1, k) = WNLMS(l, k) + μB̄*(l, k)E(l, k)/(B̄(l, k)B̄^H(l, k)), where μ is a step-size adjustment factor and (·)* denotes the conjugate.
The coefficients of the adaptive filter are determined by iterating according to the least mean square error algorithm, and the second speech signal E(l, k) is then computed by the formula above, thereby completing the further noise reduction of the speech signal of the target sound source.
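The per-bin adaptive cancellation can be sketched as follows. The exact coefficient update is not reproduced in the source above, so the normalised-LMS form used here (and the step size, filter order, and test signal) are assumptions.

```python
import numpy as np

def nlms_denoise(Z, B, mu=0.5, order=4, eps=1e-8):
    """E(l,k) = Z(l,k) - Bhist(l,k) @ W, with a normalised-LMS update of W.

    Z, B: length-L complex sequences for one frequency bin k.
    order plays the role of ORD, the number of buffered frames."""
    W = np.zeros(order, dtype=complex)
    hist = np.zeros(order, dtype=complex)        # [B(l), B(l-1), ...]
    E = np.zeros(len(Z), dtype=complex)
    for l in range(len(Z)):
        hist = np.roll(hist, 1)
        hist[0] = B[l]
        E[l] = Z[l] - hist @ W                   # adaptively filtered signal
        W = W + mu * np.conj(hist) * E[l] / (np.real(hist @ np.conj(hist)) + eps)
    return E

# toy check: if the "first speech signal" is nothing but leaked noise,
# the filter learns the leakage gain and drives the residual toward zero
rng = np.random.default_rng(0)
Bsig = rng.standard_normal(2000) + 1j * rng.standard_normal(2000)
Zsig = 0.8 * Bsig
Esig = nlms_denoise(Zsig, Bsig)
```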
In specific implementation, the present application is not limited by the described execution order of the steps; where no conflict arises, some steps may be performed in another order or simultaneously.
As can be seen from the above, the speech signal processing method provided by the embodiments of the present application first determines the first direction information of the target sound source from the original speech signal collected by the microphone array, then separates the first speech signal corresponding to the target sound source from the original speech signal according to the independent vector analysis algorithm and the first direction information, then determines the noise signal from the original speech signal according to the first direction information and the parameter information of the microphone array, and then performs noise reduction on the first speech signal according to the noise signal to obtain the second speech signal. When separating the original speech signal, this solution uses the direction information of the target sound source as a constraint, which strengthens the stability and accuracy of the output of the independent vector analysis algorithm and prevents it from outputting a pure noise signal. In addition, after the first speech signal is obtained, the obtained noise signal is used to further denoise the first speech signal, yielding a purer signal of the target sound source and thereby improving the accuracy of speech recognition.
An embodiment further provides a speech signal processing apparatus. Please refer to FIG. 2, which is a schematic structural diagram of the speech signal processing apparatus 300 provided by an embodiment of the present application. The speech signal processing apparatus 300 is applied to an electronic device and includes a sound source localization module 301, a signal separation module 302, a noise identification module 303, and a speech noise reduction module 304, as follows:
the sound source localization module 301 is configured to determine first direction information of a target sound source from an original speech signal collected by a microphone array;
the signal separation module 302 is configured to separate a first speech signal corresponding to the target sound source from the original speech signal according to an independent vector analysis algorithm and the first direction information;
the noise identification module 303 is configured to determine a noise signal from the original speech signal according to the first direction information and parameter information of the microphone array;
the speech noise reduction module 304 is configured to perform noise reduction on the first speech signal according to the noise signal to obtain a second speech signal.
In some embodiments, the sound source localization module 301 is further configured to: obtain the original speech signal collected by the microphone array, the original speech signal being a time-domain signal; perform time-frequency conversion on the original speech signal to obtain a frequency-domain signal; and perform sound source localization on the frequency-domain signal according to a sound source localization algorithm to determine the first direction information of the target sound source.
In some embodiments, the sound source localization module 301 is further configured to: frame the original speech signal to obtain multiple consecutive frames of the speech signal arranged in order of reception time; and, for each frame of the speech signal, perform time-frequency conversion on the speech signal to obtain the frequency-domain signal corresponding to the frame.
In some embodiments, the first direction information is a first direction vector; the sound source localization module 301 is further configured to: compute the steering vector of the first direction vector at each frequency bin of the frequency-domain signal; and, based on the steering vector, separate the frequency-domain signal according to the independent vector analysis algorithm to obtain the first speech signal corresponding to the target sound source.
In some embodiments, the signal separation module 302 is further configured to: compute a first auxiliary parameter matrix of the independent vector analysis algorithm from the frequency-domain signal; correct the first auxiliary parameter matrix according to the steering vector to obtain a second auxiliary parameter matrix; compute a target sound source subspace according to the second auxiliary parameter matrix; determine a separation matrix of the target sound source according to the target sound source subspace; and compute the first speech signal corresponding to the target sound source from the frequency-domain signal and the separation matrix.
In some embodiments, the noise identification module 303 is further configured to: determine a second direction vector of the noise signal according to the arrangement of the microphone array and the first direction information; and beamform the frequency-domain signal according to the second direction vector to obtain the noise signal.
In some embodiments, the speech noise reduction module 304 is further configured to: use the noise signals corresponding to the multiple frames of the frequency-domain signal as the input of the adaptive filter, iterating according to the least mean square error algorithm to determine the coefficients of the adaptive filter; and perform noise reduction on the first speech signal with the adaptive filter whose coefficients have been determined, to obtain the second speech signal.
In some embodiments, the apparatus further includes:
a speech recognition module, configured to detect whether a preset wake-up word is present in the original speech signal;
the sound source localization module 301 is further configured to: when the preset wake-up word is detected in the original speech signal, perform time-frequency conversion on the original speech signal to obtain the frequency-domain signal.
It should be noted that the speech signal processing apparatus provided by the embodiments of the present application shares the same concept as the speech signal processing method in the above embodiments; any method provided in the method embodiments can be carried out by this speech signal processing apparatus, and its specific implementation is detailed in the method embodiments and not repeated here.
As can be seen from the above, the speech signal processing apparatus proposed by the embodiments of the present application first determines the first direction information of the target sound source from the original speech signal collected by the microphone array, then separates the first speech signal corresponding to the target sound source from the original speech signal according to the independent vector analysis algorithm and the first direction information, then determines the noise signal from the original speech signal according to the first direction information and the parameter information of the microphone array, and then performs noise reduction on the first speech signal according to the noise signal to obtain the second speech signal. When separating the original speech signal, this solution uses the direction information of the target sound source as a constraint, which strengthens the stability and accuracy of the output of the independent vector analysis algorithm and prevents it from outputting a pure noise signal. In addition, after the first speech signal is obtained, the obtained noise signal is used to further denoise the first speech signal, yielding a purer signal of the target sound source and thereby improving the accuracy of speech recognition.
An embodiment of the present application further provides an electronic device, which may be a terminal device with a voice control function such as a smartphone or a tablet computer; a smart home appliance with a voice control function such as a smart refrigerator or a smart air conditioner; or another device with a voice control function such as a wireless speaker or a smart projector. Please refer to FIG. 3, which is a schematic structural diagram of the electronic device provided by an embodiment of the present application. The electronic device 400 includes a processor 401 with one or more processing cores, a memory 402 with one or more computer-readable storage media, and a computer program stored on the memory 402 and executable on the processor. The processor 401 is electrically connected to the memory 402. Those skilled in the art will understand that the electronic device structure shown in the figure does not limit the electronic device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
The processor 401 is the control center of the electronic device 400; it connects the various parts of the entire electronic device 400 using various interfaces and lines, and performs the various functions of the electronic device 400 and processes data by running or loading software programs and/or modules stored in the memory 402 and calling data stored in the memory 402, thereby monitoring the electronic device 400 as a whole.
In the embodiments of the present application, the processor 401 in the electronic device 400 loads instructions corresponding to the processes of one or more application programs into the memory 402 according to the following steps, and the processor 401 runs the application programs stored in the memory 402 to implement various functions:
determining first direction information of a target sound source from an original speech signal collected by a microphone array;
separating a first speech signal corresponding to the target sound source from the original speech signal according to an independent vector analysis algorithm and the first direction information;
determining a noise signal from the original speech signal according to the first direction information and parameter information of the microphone array;
performing noise reduction on the first speech signal according to the noise signal to obtain a second speech signal.
For the specific implementation of the above operations, reference may be made to the foregoing embodiments, which are not repeated here.
Optionally, as shown in FIG. 3, the electronic device 400 further includes a touch display screen 403, a radio frequency circuit 404, a voice circuit 405, an input unit 406, and a power supply 407. The processor 401 is electrically connected to the touch display screen 403, the radio frequency circuit 404, the voice circuit 405, the input unit 406, and the power supply 407, respectively. Those skilled in the art will understand that the electronic device structure shown in FIG. 3 does not limit the electronic device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
The touch display screen 403 can be used to display a graphical user interface and to receive operation instructions generated by the user acting on the graphical user interface. The touch display screen 403 may include a display panel and a touch panel. The display panel can be used to display information input by or provided to the user and the various graphical user interfaces of the electronic device, which may be composed of graphics, text, icons, video, and any combination thereof. The touch panel can be used to collect the user's touch operations on or near it (such as operations performed by the user on or near the touch panel with a finger, a stylus, or any other suitable object or accessory) and generate corresponding operation instructions, and the operation instructions execute the corresponding programs. Optionally, the touch panel may include two parts: a touch detection device and a touch controller. The touch detection device detects the user's touch orientation, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact point coordinates, and sends them to the processor 401, and it can also receive commands sent by the processor 401 and execute them.
The radio frequency circuit 404 can be used to transmit and receive radio frequency signals, so as to establish wireless communication with network devices or other electronic devices and exchange signals with them.
The voice circuit 405 can be used to provide a voice interface between the user and the electronic device through a speaker and a microphone. On one hand, the voice circuit 405 can transmit the electrical signal converted from received voice data to the speaker, which converts it into a sound signal for output; on the other hand, the microphone converts a collected sound signal into an electrical signal, which the voice circuit 405 receives and converts into voice data. The voice data is output to the processor 401 for processing and then sent via the radio frequency circuit 404 to, for example, another electronic device, or output to the memory 402 for further processing. The voice circuit 405 may also include an earphone jack to provide communication between a peripheral earphone and the electronic device.
The input unit 406 can be used to receive input numbers, character information, or user characteristic information (such as fingerprints, irises, and facial information), and to generate keyboard, mouse, joystick, optical, or trackball signal input related to user settings and function control.
The power supply 407 is used to supply power to the various components of the electronic device 400. Optionally, the power supply 407 may be logically connected to the processor 401 through a power management system, so that functions such as charging, discharging, and power consumption management are implemented through the power management system. The power supply 407 may also include one or more DC or AC power supplies, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, or any other components.
Although not shown in FIG. 3, the electronic device 400 may also include a camera, a sensor, a wireless fidelity module, a Bluetooth module, and the like, which are not described again here.
In the above embodiments, the description of each embodiment has its own emphasis; for a part not detailed in one embodiment, refer to the related descriptions of the other embodiments.
As can be seen from the above, the electronic device provided by this embodiment first determines the first direction information of the target sound source from the original speech signal collected by the microphone array, then separates the first speech signal corresponding to the target sound source from the original speech signal according to the independent vector analysis algorithm and the first direction information, then determines the noise signal from the original speech signal according to the first direction information and the parameter information of the microphone array, and then performs noise reduction on the first speech signal according to the noise signal to obtain the second speech signal. When separating the original speech signal, this solution uses the direction information of the target sound source as a constraint, which strengthens the stability and accuracy of the output of the independent vector analysis algorithm and prevents it from outputting a pure noise signal. In addition, after the first speech signal is obtained, the obtained noise signal is used to further denoise the first speech signal, yielding a purer signal of the target sound source and thereby improving the accuracy of speech recognition.
Those of ordinary skill in the art will understand that all or part of the steps in the various methods of the above embodiments can be completed by instructions, or by instructions controlling the relevant hardware; the instructions can be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps in any speech signal processing method provided by the embodiments of the present application. For example, the computer program can perform the following steps:
determining first direction information of a target sound source from an original speech signal collected by a microphone array;
separating a first speech signal corresponding to the target sound source from the original speech signal according to an independent vector analysis algorithm and the first direction information;
determining a noise signal from the original speech signal according to the first direction information and parameter information of the microphone array;
performing noise reduction on the first speech signal according to the noise signal to obtain a second speech signal.
For the specific implementation of the above operations, reference may be made to the foregoing embodiments, which are not repeated here.
The storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
Since the computer program stored in the storage medium can execute the steps in any speech signal processing method provided by the embodiments of the present application, it can achieve the beneficial effects achievable by any such method; see the foregoing embodiments for details, which are not repeated here.
The speech signal processing method, apparatus, and device, and the computer-readable storage medium provided by the embodiments of the present application have been introduced in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the descriptions of the above embodiments are only intended to help understand the method of the present application and its core idea. Meanwhile, those skilled in the art will make changes in the specific implementations and scope of application according to the idea of the present application. In summary, the contents of this specification should not be construed as limiting the present application.

Claims (20)

  1. A speech signal processing method, comprising:
    determining first direction information of a target sound source from an original speech signal collected by a microphone array;
    separating a first speech signal corresponding to the target sound source from the original speech signal according to an independent vector analysis algorithm and the first direction information;
    determining a noise signal from the original speech signal according to the first direction information and parameter information of the microphone array;
    performing noise reduction on the first speech signal according to the noise signal to obtain a second speech signal.
  2. The speech signal processing method of claim 1, wherein determining the first direction information of the target sound source from the original speech signal collected by the microphone array comprises:
    obtaining the original speech signal collected by the microphone array, the original speech signal being a time-domain signal;
    performing time-frequency conversion on the original speech signal to obtain a frequency-domain signal;
    performing sound source localization on the frequency-domain signal according to a sound source localization algorithm to determine the first direction information of the target sound source.
  3. The speech signal processing method of claim 2, wherein performing time-frequency conversion on the original speech signal to obtain the frequency-domain signal comprises:
    framing the original speech signal to obtain multiple consecutive frames of the speech signal arranged in order of reception time;
    for each frame of the speech signal, performing time-frequency conversion on the speech signal to obtain a frequency-domain signal corresponding to the frame.
  4. The speech signal processing method of claim 3, wherein the first direction information is a first direction vector, and separating the first speech signal corresponding to the target sound source from the original speech signal according to the independent vector analysis algorithm and the first direction information comprises:
    computing the steering vector of the first direction vector at each frequency bin of the frequency-domain signal;
    based on the steering vector, separating the frequency-domain signal according to the independent vector analysis algorithm to obtain the first speech signal corresponding to the target sound source.
  5. The speech signal processing method of claim 4, wherein, based on the steering vector, separating the frequency-domain signal according to the independent vector analysis algorithm to obtain the first speech signal corresponding to the target sound source comprises:
    computing a first auxiliary parameter matrix of the independent vector analysis algorithm from the frequency-domain signal;
    correcting the first auxiliary parameter matrix according to the steering vector to obtain a second auxiliary parameter matrix;
    computing a target sound source subspace according to the second auxiliary parameter matrix;
    determining a separation matrix of the target sound source according to the target sound source subspace;
    computing the first speech signal corresponding to the target sound source from the frequency-domain signal and the separation matrix.
  6. The speech signal processing method of claim 3, wherein the parameter information comprises an arrangement, and determining the noise signal from the original speech signal according to the first direction information and the parameter information of the microphone array comprises:
    determining a second direction vector of the noise signal according to the arrangement of the microphone array and the first direction information;
    beamforming the frequency-domain signal according to the second direction vector to obtain the noise signal.
  7. The method of claim 6, wherein performing noise reduction on the first speech signal according to the noise signal to obtain the second speech signal comprises:
    using the noise signals corresponding to the multiple frames of the frequency-domain signal as the input of an adaptive filter and iterating according to a least mean square error algorithm to determine coefficients of the adaptive filter;
    performing noise reduction on the first speech signal with the adaptive filter whose coefficients have been determined, to obtain the second speech signal.
  8. The method of claim 2, wherein, before performing time-frequency conversion on the original speech signal to obtain the frequency-domain signal, the method further comprises:
    detecting whether a preset wake-up word is present in the original speech signal;
    when the preset wake-up word is detected in the original speech signal, performing the step of time-frequency conversion on the original speech signal to obtain the frequency-domain signal.
  9. A speech signal processing apparatus, comprising:
    a sound source localization module, configured to determine first direction information of a target sound source from an original speech signal collected by a microphone array;
    a signal separation module, configured to separate a first speech signal corresponding to the target sound source from the original speech signal according to an independent vector analysis algorithm and the first direction information;
    a noise identification module, configured to determine a noise signal from the original speech signal according to the first direction information and parameter information of the microphone array;
    a speech noise reduction module, configured to perform noise reduction on the first speech signal according to the noise signal to obtain a second speech signal.
  10. The apparatus of claim 9, wherein the sound source localization module is further configured to:
    obtain the original speech signal collected by the microphone array, the original speech signal being a time-domain signal;
    perform time-frequency conversion on the original speech signal to obtain a frequency-domain signal;
    perform sound source localization on the frequency-domain signal according to a sound source localization algorithm to determine the first direction information of the target sound source.
  11. The apparatus of claim 10, wherein the sound source localization module is further configured to:
    frame the original speech signal to obtain multiple consecutive frames of the speech signal arranged in order of reception time;
    for each frame of the speech signal, perform time-frequency conversion on the speech signal to obtain a frequency-domain signal corresponding to the frame.
  12. The apparatus of claim 11, wherein the first direction information is a first direction vector, and the sound source localization module is further configured to:
    compute the steering vector of the first direction vector at each frequency bin of the frequency-domain signal;
    based on the steering vector, separate the frequency-domain signal according to the independent vector analysis algorithm to obtain the first speech signal corresponding to the target sound source.
  13. The apparatus of claim 12, wherein the signal separation module 302 is further configured to:
    compute a first auxiliary parameter matrix of the independent vector analysis algorithm from the frequency-domain signal;
    correct the first auxiliary parameter matrix according to the steering vector to obtain a second auxiliary parameter matrix;
    compute a target sound source subspace according to the second auxiliary parameter matrix;
    determine a separation matrix of the target sound source according to the target sound source subspace;
    compute the first speech signal corresponding to the target sound source from the frequency-domain signal and the separation matrix.
  14. The apparatus of claim 11, wherein the parameter information comprises an arrangement, and the noise identification module 303 is further configured to:
    determine a second direction vector of the noise signal according to the arrangement of the microphone array and the first direction information;
    beamform the frequency-domain signal according to the second direction vector to obtain the noise signal.
  15. The apparatus of claim 14, wherein the speech noise reduction module 304 is further configured to:
    use the noise signals corresponding to the multiple frames of the frequency-domain signal as the input of an adaptive filter and iterate according to a least mean square error algorithm to determine coefficients of the adaptive filter;
    perform noise reduction on the first speech signal with the adaptive filter whose coefficients have been determined, to obtain the second speech signal.
  16. The apparatus of claim 10, wherein the apparatus further comprises:
    a speech recognition module, configured to detect whether a preset wake-up word is present in the original speech signal;
    the sound source localization module is further configured to: when the preset wake-up word is detected in the original speech signal, perform time-frequency conversion on the original speech signal to obtain the frequency-domain signal.
  17. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps in the speech signal processing method of any one of claims 1 to 8 are implemented.
  18. An electronic device, wherein the electronic device comprises a processor, a memory, and a computer program stored in the memory and executable on the processor, and the processor, when executing the computer program, implements the steps in the speech signal processing method of claim 1.
  19. The electronic device of claim 18, wherein the processor, when executing the computer program, can further implement:
    obtaining the original speech signal collected by the microphone array, the original speech signal being a time-domain signal;
    performing time-frequency conversion on the original speech signal to obtain a frequency-domain signal;
    performing sound source localization on the frequency-domain signal according to a sound source localization algorithm to determine the first direction information of the target sound source.
  20. The electronic device of claim 19, wherein the processor, when executing the computer program, can further implement:
    framing the original speech signal to obtain multiple consecutive frames of the speech signal arranged in order of reception time;
    for each frame of the speech signal, performing time-frequency conversion on the speech signal to obtain a frequency-domain signal corresponding to the frame.
PCT/CN2023/092935 2022-07-20 2023-05-09 Speech signal processing method, apparatus, and device, and computer-readable storage medium WO2024016793A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210863937.6 2022-07-20
CN202210863937.6A CN117174078A (zh) 2022-07-20 2022-07-20 语音信号的处理方法、装置、设备及计算机可读存储介质

Publications (1)

Publication Number Publication Date
WO2024016793A1 true WO2024016793A1 (zh) 2024-01-25

Family

ID=88930498

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/092935 WO2024016793A1 (zh) 2022-07-20 2023-05-09 语音信号的处理方法、装置、设备及计算机可读存储介质

Country Status (2)

Country Link
CN (1) CN117174078A (zh)
WO (1) WO2024016793A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180350381A1 (en) * 2017-05-31 2018-12-06 Apple Inc. System and method of noise reduction for a mobile device
CN111883166A (zh) * 2020-07-17 2020-11-03 北京百度网讯科技有限公司 一种语音信号处理方法、装置、设备以及存储介质
CN113077808A (zh) * 2021-03-22 2021-07-06 北京搜狗科技发展有限公司 一种语音处理方法、装置和用于语音处理的装置
US20210217434A1 (en) * 2015-03-18 2021-07-15 Industry-University Cooperation Foundation Sogang University Online target-speech extraction method based on auxiliary function for robust automatic speech recognition
CN113889135A (zh) * 2020-07-03 2022-01-04 华为技术有限公司 一种估计声源波达方向的方法、电子设备及芯片系统


Also Published As

Publication number Publication date
CN117174078A (zh) 2023-12-05

Similar Documents

Publication Publication Date Title
CN107577449B (zh) 唤醒语音的拾取方法、装置、设备及存储介质
WO2020192721A1 (zh) 一种语音唤醒方法、装置、设备及介质
WO2020103703A1 (zh) 一种音频数据处理方法、装置、设备及存储介质
WO2020143652A1 (zh) 一种关键词的检测方法以及相关装置
WO2020088153A1 (zh) 语音处理方法、装置、存储介质和电子设备
WO2019101123A1 (zh) 语音活性检测方法、相关装置和设备
CN109756818B (zh) 双麦克风降噪方法、装置、存储介质及电子设备
TWI790236B (zh) 音量調節方法、裝置、電子設備及存儲介質
WO2020048431A1 (zh) 一种语音处理方法、电子设备和显示设备
CN111863020B (zh) 语音信号处理方法、装置、设备及存储介质
WO2022028083A1 (zh) 电子设备的降噪方法、装置、存储介质及电子设备
WO2020238203A1 (zh) 降噪方法、降噪装置及可实现降噪的设备
CN115497500B (zh) 音频处理方法、装置、存储介质及智能眼镜
CN110660404A (zh) 基于零陷滤波预处理的语音通信和交互应用系统、方法
US11164591B2 (en) Speech enhancement method and apparatus
WO2024016793A1 (zh) 语音信号的处理方法、装置、设备及计算机可读存储介质
WO2023155607A1 (zh) 终端设备和语音唤醒方法
WO2020107455A1 (zh) 语音处理方法、装置、存储介质及电子设备
US20220272442A1 (en) Voice processing method, electronic device and readable storage medium
WO2023020076A1 (zh) 设备的唤醒方法
CN113223552B (zh) 语音增强方法、装置、设备、存储介质及程序
US11783809B2 (en) User voice activity detection using dynamic classifier
CN116935883B (zh) 声源定位方法、装置、存储介质及电子设备
WO2024027246A1 (zh) 声音信号处理方法、装置、电子设备和存储介质
CN117012202B (zh) 语音通道识别方法、装置、存储介质及电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23841862

Country of ref document: EP

Kind code of ref document: A1