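WO2023060400A1 - Speech presence probability calculation method and system, speech enhancement method and system, and earphone - Google Patents

Speech presence probability calculation method and system, speech enhancement method and system, and earphone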

Info

Publication number
WO2023060400A1
Authority
WO
WIPO (PCT)
Prior art keywords
probability
model
entropy
speech
voice
Prior art date
Application number
PCT/CN2021/123111
Inventor
肖乐
张承乾
廖风云
齐心
Original Assignee
深圳市韶音科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市韶音科技有限公司
Priority to KR1020237020638A: KR20230109716A (ko)
Priority to JP2023542599A: JP2024506237A (ja)
Priority to CN202180077272.XA: CN116508328A (zh)
Priority to PCT/CN2021/123111: WO2023060400A1 (zh)
Priority to EP21960151.5A: EP4227941A4 (en)
Publication of WO2023060400A1: WO2023060400A1 (zh)
Priority to US18/305,398: US20230260529A1 (en)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00 Details of transducers, loudspeakers or microphones
    • H04R1/20 Arrangements for obtaining desired frequency or directional characteristics
    • H04R1/32 Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
    • H04R1/40 Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers
    • H04R1/406 Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers microphones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R5/00 Stereophonic arrangements
    • H04R5/027 Spatial or constructional arrangements of microphones, e.g. in dummy heads
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00 Details of transducers, loudspeakers or microphones
    • H04R1/10 Earpieces; Attachments therefor; Earphones; Monophonic headphones
    • H04R1/1058 Manufacture or assembly
    • H04R1/1075 Mountings of transducers in earphones or headphones

Definitions

  • This specification relates to the technical field of speech signal processing, and in particular to a speech presence probability calculation method and system, a speech enhancement method and system, and an earphone.
  • In MVDR-based speech enhancement, the noise covariance matrix is crucial.
  • The main approach in the prior art calculates the noise covariance matrix from the speech presence probability: for example, the speech presence probability is estimated by voice activity detection (VAD), and the noise covariance matrix is then calculated from it.
  • However, the prior art estimates the speech presence probability with insufficient accuracy, so the noise covariance matrix is also estimated with low accuracy, which in turn leads to a poor speech enhancement effect for the MVDR algorithm.
  • In addition, the MVDR algorithm in the prior art is mostly used in microphone array devices with many, widely spaced microphones, such as mobile phones and smart speakers; its speech enhancement effect is poor on devices with few, closely spaced microphones, such as earphones.
  • This specification provides a speech presence probability calculation method and system, a speech enhancement method and system, and an earphone with higher accuracy.
  • In a first aspect, this specification provides a speech presence probability calculation method for M microphones distributed in a preset array shape, where M is an integer greater than 1. The method includes: acquiring the microphone signals output by the M microphones, where the microphone signals satisfy a first model or a second model of Gaussian distribution, one of the first model and the second model being a speech presence model and the other a speech absence model; iteratively optimizing the first model and the second model, based on maximum likelihood estimation and the expectation-maximization algorithm, until convergence, and, during the iteration, determining whether the speech presence model is the first model or the second model based on the entropy of a first probability that the microphone signals follow the first model and the entropy of a second probability that the microphone signals follow the second model, the first probability being complementary to the second probability; and, when the maximum likelihood estimation and expectation-maximization algorithm converge, outputting the probability that the microphone signals follow the speech presence model as the speech presence probability of the microphone signals.
  • In some embodiments, the first variance of the Gaussian distribution corresponding to the first model includes the product of a first parameter and a first spatial covariance matrix, and the second variance of the Gaussian distribution corresponding to the second model includes the product of a second parameter and a second spatial covariance matrix.
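  • The two-model description above can be written compactly as follows. This is only an illustrative sketch: the symbols are assumed notation rather than the source's own, with y the stacked vector of the M microphone signals at one time-frequency point, φ_i the first or second parameter, R_i the corresponding spatial covariance matrix, and λ_i the probability that the signals follow model i. The zero mean and the complex-valued form are also assumptions.

```latex
\mathbf{y} \mid \text{model } i \;\sim\; \mathcal{N}_c\!\left(\mathbf{0},\; \phi_i \mathbf{R}_i\right),
\quad i \in \{1, 2\},
\qquad
\lambda_1 + \lambda_2 = 1
```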
  • In some embodiments, iteratively optimizing the first model and the second model based on maximum likelihood estimation and the expectation-maximization algorithm includes: constructing an objective function based on maximum likelihood estimation and the expectation-maximization algorithm; determining optimization parameters, the optimization parameters including the first spatial covariance matrix and the second spatial covariance matrix; determining initial values of the optimization parameters; performing multiple iterations on the optimization parameters, based on the objective function and the initial values of the optimization parameters, until the objective function converges, including determining, during the multiple iterations, whether the speech presence model is the first model or the second model based on the entropy of the first probability and the entropy of the second probability; and outputting the converged values of the optimization parameters and the corresponding first probability and second probability.
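  • The iteration described above might be sketched as follows. This is a minimal illustration, not the method of this specification: the update formulas follow the common complex Gaussian mixture practice, and the equal model priors, the random initialization, the fixed iteration count standing in for a convergence test, and every name (`em_two_model`, `lam`, `phi`, `R`) are assumptions. The entropy-based steps are omitted here.

```python
import numpy as np

def em_two_model(Y, n_iter=20, eps=1e-6, seed=0):
    """Fit two zero-mean complex Gaussian models with covariance phi * R.

    Y: complex array (T, M), T observations of the M microphone signals.
    Returns R (2, M, M) and lam (T, 2), the first and second probability.
    """
    T, M = Y.shape
    rng = np.random.default_rng(seed)
    # Assumed initial values: random complementary probabilities, R = I.
    first = rng.uniform(0.3, 0.7, size=T)
    lam = np.stack([first, 1.0 - first], axis=1)
    outer = Y[:, :, None] * Y[:, None, :].conj()      # y y^H per observation
    R = np.stack([np.eye(M, dtype=complex)] * 2)
    phi = np.ones((T, 2))
    for _ in range(n_iter):
        # M-step: update the scale parameters, then the spatial covariances.
        for i in range(2):
            Rinv = np.linalg.inv(R[i])
            quad = np.einsum('tm,mn,tn->t', Y.conj(), Rinv, Y).real
            phi[:, i] = np.maximum(quad / M, eps)
            w = lam[:, i] / phi[:, i]
            R[i] = np.einsum('t,tmn->mn', w, outer) / max(lam[:, i].sum(), eps)
            R[i] += eps * np.eye(M)                   # keep R invertible
        # E-step: recompute the two probabilities (equal priors assumed).
        loglik = np.empty((T, 2))
        for i in range(2):
            Rinv = np.linalg.inv(R[i])
            quad = np.einsum('tm,mn,tn->t', Y.conj(), Rinv, Y).real
            _, logdet = np.linalg.slogdet(R[i])
            loglik[:, i] = -M * np.log(phi[:, i]) - logdet - quad / phi[:, i]
        loglik -= loglik.max(axis=1, keepdims=True)   # numerical stability
        lam = np.exp(loglik)
        lam /= lam.sum(axis=1, keepdims=True)
    return R, lam
```

  • The two columns of `lam` sum to one for every observation, which mirrors the complementary first and second probabilities.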
  • In some embodiments, determining, during the multiple iterations, whether the speech presence model is the first model or the second model based on the entropy of the first probability and the entropy of the second probability includes: in any iteration of the multiple iterations, calculating the entropy of the first probability and the entropy of the second probability, and determining whether the speech presence model is the first model or the second model, including: when the entropy of the first probability is greater than the entropy of the second probability, determining that the speech presence model is the second model; or, when the entropy of the first probability is less than the entropy of the second probability, determining that the speech presence model is the first model.
  • In some embodiments, determining, during the multiple iterations, whether the speech presence model is the first model or the second model based on the entropy of the first probability and the entropy of the second probability includes: in the first iteration of the multiple iterations, calculating the entropy of the first probability and the entropy of the second probability, and determining whether the speech presence model is the first model or the second model, including: when the entropy of the first probability is greater than the entropy of the second probability, determining that the speech presence model is the second model; or, when the entropy of the first probability is less than the entropy of the second probability, determining that the speech presence model is the first model.
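  • The entropy comparison above could be sketched as follows. The text here does not give the entropy formula, so this example assumes one: each probability sequence is normalized to sum to one and its Shannon entropy is taken. The function names and the tie-breaking choice (ties go to the first model) are likewise hypothetical.

```python
import numpy as np

def sequence_entropy(p, eps=1e-12):
    """Shannon entropy of a probability sequence normalized to sum to 1 (assumed definition)."""
    p = np.clip(np.asarray(p, dtype=float), eps, None)
    q = p / p.sum()
    return float(-np.sum(q * np.log(q)))

def speech_presence_model(first_prob, second_prob):
    """Return 1 or 2: the model whose probability has the lower entropy."""
    h1 = sequence_entropy(first_prob)
    h2 = sequence_entropy(second_prob)
    # Entropy of the first probability greater -> speech presence model is the second model.
    return 2 if h1 > h2 else 1
```

  • Under this assumed definition, a probability that is high at only a few points has low entropy, so the model that is active only occasionally is the one labeled as the speech presence model.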
  • In some embodiments, the multiple iterations on the optimization parameters further include, in each iteration of the multiple iterations: correcting the first probability and the second probability based on the entropy of the first probability and the entropy of the second probability, including: when the first model is the speech presence model and the entropy of the first probability is greater than the entropy of the second probability, swapping the value of the first probability with the value of the second probability; or, when the second model is the speech presence model and the entropy of the second probability is greater than the entropy of the first probability, swapping the value of the first probability with the value of the second probability; and updating the optimization parameters based on the corrected first probability and second probability.
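  • A minimal sketch of this per-iteration correction is given below. The entropy definition (Shannon entropy of the normalized probability sequence) and all names are assumptions made for illustration; only the swap rule itself comes from the text above.

```python
import numpy as np

def sequence_entropy(p, eps=1e-12):
    """Shannon entropy of a probability sequence normalized to sum to 1 (assumed definition)."""
    p = np.clip(np.asarray(p, dtype=float), eps, None)
    q = p / p.sum()
    return float(-np.sum(q * np.log(q)))

def correct_probabilities(first_prob, second_prob, speech_model):
    """Swap the two probabilities when the speech presence model has the higher entropy.

    speech_model: 1 if the first model is the speech presence model, else 2.
    """
    h1 = sequence_entropy(first_prob)
    h2 = sequence_entropy(second_prob)
    swap = (speech_model == 1 and h1 > h2) or (speech_model == 2 and h2 > h1)
    if swap:
        return second_prob, first_prob
    return first_prob, second_prob
```

  • The corrected pair would then be used to update the optimization parameters in the same iteration.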
  • In some embodiments, the multiple iterations on the optimization parameters further include performing an invertibility correction on the optimization parameters in each iteration of the multiple iterations, including: when an optimization parameter is determined to be non-invertible, correcting the optimization parameter with a deviation matrix, the deviation matrix including one of an identity matrix or a random matrix following a normal distribution or a uniform distribution.
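  • The invertibility correction might look like the sketch below, which uses the identity matrix as the deviation matrix. The condition-number test, the threshold `cond_limit`, the scale of the deviation, and the function name are all assumptions; the text above only states that a non-invertible parameter is corrected by a deviation matrix.

```python
import numpy as np

def ensure_invertible(R, scale=1e-6, cond_limit=1e12):
    """Return R unchanged if well conditioned, else R plus a scaled identity matrix."""
    R = np.asarray(R, dtype=complex)
    if np.linalg.cond(R) < cond_limit:
        return R
    # Deviation matrix: identity, scaled by an assumed small constant.
    return R + scale * np.eye(R.shape[0])
```

  • For example, a rank-one covariance estimate built from a single observation would fail the test and receive the correction before being inverted.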
  • In a second aspect, this specification also provides a speech presence probability calculation system, including at least one storage medium and at least one processor. The at least one storage medium stores at least one instruction set for speech presence probability calculation; the at least one processor is communicatively connected with the at least one storage medium, wherein, when the speech presence probability calculation system is running, the at least one processor reads the at least one instruction set and implements the speech presence probability calculation method described in the first aspect of this specification.
  • In a third aspect, this specification also provides a speech enhancement method for M microphones distributed in a preset array shape, where M is an integer greater than 1, including: acquiring the microphone signals output by the M microphones; determining the speech presence probability of the microphone signals based on the speech presence probability calculation method according to any one of claims 1-7; determining the noise covariance matrix of the microphone signals based on the speech presence probability; determining filter coefficients corresponding to the microphone signals based on the MVDR method and the noise covariance matrix; and combining the microphone signals based on the filter coefficients to output a target audio signal.
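  • The last three steps of this method could be sketched as follows. This is an illustration under stated assumptions, not the implementation of this specification: the noise covariance is taken as an average weighted by one minus the speech presence probability, the standard MVDR closed form is used for the filter coefficients, and the steering vector `d` toward the target speaker is an extra input that the text above does not mention. All function names are hypothetical.

```python
import numpy as np

def noise_covariance(Y, spp, eps=1e-6):
    """Y: (T, M) complex microphone signals; spp: (T,) speech presence probability."""
    w = 1.0 - np.asarray(spp, dtype=float)            # weight of "noise only"
    outer = Y[:, :, None] * Y[:, None, :].conj()      # y y^H per observation
    Rn = np.einsum('t,tmn->mn', w, outer) / max(w.sum(), eps)
    return Rn + eps * np.eye(Y.shape[1])              # small loading for invertibility

def mvdr_weights(Rn, d):
    """Standard MVDR solution w = Rn^{-1} d / (d^H Rn^{-1} d)."""
    Rinv_d = np.linalg.solve(Rn, d)
    return Rinv_d / (d.conj() @ Rinv_d)

def combine(Y, w):
    """Apply w^H y to every observation to produce the target signal."""
    return Y @ w.conj()
```

  • Because the weights satisfy w^H d = 1, a signal arriving along the assumed steering vector passes through undistorted while the weighted noise power is minimized.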
  • In a fourth aspect, this specification also provides a speech enhancement system, including at least one storage medium and at least one processor. The at least one storage medium stores at least one instruction set for speech enhancement; the at least one processor is communicatively connected with the at least one storage medium, wherein, when the speech enhancement system is running, the at least one processor reads the at least one instruction set and executes the speech enhancement method described in the third aspect of this specification.
  • In a fifth aspect, this specification also provides an earphone, including a microphone array and a computing device. The microphone array includes M microphones distributed in a preset array shape, where M is an integer greater than 1; in operation, the computing device is communicatively connected with the microphone array and executes the speech enhancement method described in the third aspect of this specification.
  • In some embodiments, the M microphones are distributed linearly, M is not greater than 5, and the distance between adjacent microphones among the M microphones is between 20 mm and 40 mm.
  • In some embodiments, the earphone further includes a first housing and a second housing. The microphone array is mounted on the first housing, which includes a first interface provided with a first magnetic device; the computing device is mounted on the second housing, which includes a second interface provided with a second magnetic device. The attraction between the first magnetic device and the second magnetic device detachably connects the first housing to the second housing.
  • In some embodiments, the first housing further includes contacts disposed at the first interface and communicatively connected with the microphone array, and the second housing further includes a guide rail disposed at the second interface and communicatively connected with the computing device. When the first housing is connected to the second housing, the contacts touch the guide rail, so that the microphone array is communicatively connected with the computing device.
  • Each microphone in the microphone array can collect audio from multiple sound sources in the space and output a corresponding microphone signal.
  • The audio signal of each sound source satisfies a Gaussian distribution.
  • The multiple microphone signals output by the microphone array satisfy a joint Gaussian distribution.
  • The speech presence probability calculation method and system, speech enhancement method and system, and earphone provided in this specification can obtain, for the multiple microphone signals, a speech presence model for when speech is present and a speech absence model for when speech is absent. The models are optimized over multiple iterations based on maximum likelihood estimation and the expectation-maximization algorithm; during the iteration, the speech presence probability and the speech absence probability are corrected according to the entropy of the speech presence probability and the entropy of the speech absence probability, so as to determine the model parameters of the speech presence model and of the speech absence model. When the maximum likelihood estimation and expectation-maximization algorithm converge, the speech presence probability corresponding to the speech presence model is obtained.
  • By comparing the entropy of the speech presence probability with the entropy of the speech absence probability, the method, system, and earphone correct the speech presence probability and the speech absence probability during the iteration. This yields faster convergence and better convergence results, so that the speech presence probability and the noise covariance matrix are estimated more accurately and the speech enhancement effect of MVDR is improved.
  • Fig. 1 shows a hardware schematic diagram of a speech presence probability calculation system provided according to an embodiment of this specification;
  • Fig. 2A shows a schematic diagram of an exploded structure of an electronic device provided according to an embodiment of this specification;
  • Fig. 2B shows a front view of a first housing provided according to an embodiment of this specification;
  • Fig. 2C shows a top view of a first housing provided according to an embodiment of this specification;
  • Fig. 2D shows a front view of a second housing provided according to an embodiment of this specification;
  • Fig. 2E shows a bottom view of a second housing provided according to an embodiment of this specification;
  • Fig. 3 shows a flow chart of a speech presence probability calculation method provided according to an embodiment of this specification;
  • Fig. 4 shows a flow chart of iterative optimization provided according to an embodiment of this specification;
  • Fig. 5 shows a flow chart of multiple iterations provided according to an embodiment of this specification;
  • Fig. 6 shows another flow chart of multiple iterations provided according to an embodiment of this specification; and
  • Fig. 7 shows a flow chart of a speech enhancement method provided according to an embodiment of this specification.
  • Minimum Variance Distortionless Response (MVDR): an adaptive beamforming algorithm based on the maximum signal-to-interference-plus-noise ratio (SINR) criterion. The MVDR algorithm adaptively minimizes the array output power while keeping the response in the desired direction undistorted, which maximizes the SINR. Its goal is to minimize the variance of the recorded signal. If the noise signal and the desired signal are uncorrelated, the variance of the recorded signal is the sum of the variances of the desired signal and the noise signal. The MVDR solution therefore seeks to minimize this sum, thereby mitigating the effect of the noise signal. Its principle is to select appropriate filter coefficients, under the constraint that the desired signal is not distorted, so that the average output power of the array is minimized.
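  • The principle just stated corresponds to the standard textbook MVDR formulation, reproduced here as a worked equation. The symbols are assumed notation, not taken from this specification: w holds the filter coefficients, R is the covariance matrix of the recorded signal (in practice often replaced by the noise covariance matrix), and d is the steering vector of the desired direction.

```latex
\min_{\mathbf{w}} \; \mathbf{w}^{\mathsf H} \mathbf{R}\, \mathbf{w}
\quad \text{s.t.} \quad \mathbf{w}^{\mathsf H} \mathbf{d} = 1
\qquad \Longrightarrow \qquad
\mathbf{w} = \frac{\mathbf{R}^{-1} \mathbf{d}}{\mathbf{d}^{\mathsf H} \mathbf{R}^{-1} \mathbf{d}}
```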
  • Speech presence probability: the probability that the target speech signal exists in the current audio signal.
  • Fig. 1 shows a hardware schematic diagram of a speech presence probability calculation system provided according to an embodiment of this specification.
  • The speech presence probability calculation system can be applied to the electronic device 200.
  • The electronic device 200 may be a wireless earphone, a wired earphone, or a smart wearable device with an audio processing function, such as smart glasses, a smart helmet, or a smart watch.
  • The electronic device 200 may also be a mobile device, a tablet computer, a laptop computer, a vehicle built-in device, or the like, or any combination thereof.
  • A mobile device may include a smart home device, a smart mobile device, or the like, or any combination thereof.
  • The smart mobile device may include a mobile phone, a personal digital assistant, a game device, a navigation device, an ultra-mobile personal computer (UMPC), etc., or any combination thereof.
  • The smart home device may include a smart TV, a desktop computer, etc., or any combination thereof.
  • Built-in devices in a motor vehicle may include an on-board computer, an on-board television, and the like.
  • For ease of description, the following takes the electronic device 200 being an earphone as an example.
  • The earphone can be a wireless earphone or a wired earphone.
  • The electronic device 200 may include a microphone array 220 and a computing device 240.
  • The microphone array 220 may be an audio collection device of the electronic device 200.
  • The microphone array 220 may be configured to acquire local audio and output a microphone signal, that is, an electronic signal carrying audio information.
  • The microphone array 220 may include M microphones 222 distributed in a preset array shape, where M is an integer greater than 1.
  • The M microphones 222 may be distributed uniformly or non-uniformly.
  • The M microphones 222 can output M microphone signals, each microphone 222 corresponding to one microphone signal. For convenience, the M microphone signals are collectively referred to as the microphone signals.
  • In some embodiments, the M microphones 222 may be distributed linearly. In some embodiments, the M microphones 222 may also be distributed in arrays of other shapes, such as circular arrays or rectangular arrays. For convenience, the following description takes M linearly distributed microphones 222 as an example.
  • M can be any integer greater than 1, such as 2, 3, 4, 5, or more.
  • In some embodiments, for example in products such as earphones, M may be an integer greater than 1 and not greater than 5.
  • In some embodiments, the distance between adjacent microphones 222 among the M microphones 222 may be between 20 mm and 40 mm. In some embodiments, the distance between adjacent microphones 222 may be smaller, such as between 10 mm and 20 mm.
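  • To make the linear geometry concrete, the sketch below places M microphones on a line with a chosen spacing and computes a far-field steering vector for one frequency. The far-field plane-wave model, the speed of sound of 343 m/s, the angle convention (measured from the array axis), and the function names are assumptions added for illustration; none of them is stated in this specification.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def linear_array_positions(num_mics, spacing_m):
    """Positions (in meters) of uniformly spaced microphones along a line."""
    return np.arange(num_mics) * spacing_m

def far_field_steering_vector(positions_m, angle_rad, freq_hz):
    """Relative phase at each microphone for a plane wave from angle_rad."""
    delays = positions_m * np.cos(angle_rad) / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * freq_hz * delays)
```

  • For instance, `linear_array_positions(3, 0.03)` describes three microphones spaced 30 mm apart, which falls inside the 20 mm to 40 mm range mentioned above.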
  • The microphone 222 may be a bone conduction microphone that directly collects human body vibration signals.
  • The bone conduction microphone may include a vibration sensor, such as an optical vibration sensor or an acceleration sensor.
  • The vibration sensor can collect a mechanical vibration signal (for example, a signal generated by the vibration of the skin or bone when the user speaks) and convert the mechanical vibration signal into an electrical signal.
  • The mechanical vibration signal mentioned here mainly refers to vibration transmitted through a solid.
  • The bone conduction microphone contacts the user's skin or bones through the vibration sensor, or through a vibration component connected to the vibration sensor, thereby collecting the vibration signal generated by the bone or skin when the user makes a sound and converting the vibration signal into an electrical signal.
  • The vibration sensor may be a device that is sensitive to mechanical vibration but insensitive to air vibration (i.e., the vibration sensor responds more strongly to mechanical vibration than to air vibration). Since the bone conduction microphone directly picks up the vibration signal of the sound-emitting part, it can reduce the influence of environmental noise.
  • The microphone 222 may also be an air conduction microphone that directly collects air vibration signals.
  • The air conduction microphone collects the air vibration signal caused by the user when making a sound and converts the air vibration signal into an electrical signal.
  • In some embodiments, the M microphones 222 may be M bone conduction microphones. In some embodiments, the M microphones 222 may also be M air conduction microphones. In some embodiments, the M microphones 222 may include both bone conduction microphones and air conduction microphones. The microphone 222 may also be another type of microphone, such as an optical microphone or a microphone that receives myoelectric signals.
  • The computing device 240 may be communicatively connected with the microphone array 220.
  • The communication connection refers to any form of connection capable of receiving information directly or indirectly.
  • In some embodiments, the computing device 240 can exchange data with the microphone array 220 through a wireless communication connection; in some embodiments, the computing device 240 can also be directly connected with the microphone array 220 through wires to exchange data; in some embodiments, the computing device 240 can also establish an indirect connection with the microphone array 220 by connecting directly with other circuits through wires, so as to exchange data.
  • In this specification, the direct connection between the computing device 240 and the microphone array 220 is described as an example.
  • The computing device 240 may be a hardware device with a data information processing function.
  • In some embodiments, the speech presence probability calculation system may include the computing device 240.
  • In some embodiments, the speech presence probability calculation system may be applied to the computing device 240. That is, the speech presence probability calculation system can run on the computing device 240.
  • The speech presence probability calculation system may include a hardware device with a data information processing function and the programs necessary to drive the hardware device.
  • The speech presence probability calculation system may also be only a hardware device with data processing capability, or only a program running on the hardware device.
  • The speech presence probability calculation system may store data or instructions for executing the speech presence probability calculation method described in this specification, and may execute the data and/or instructions.
  • When the speech presence probability calculation system runs on the computing device 240, it can obtain the microphone signal from the microphone array 220 through the communication connection and execute the data or instructions of the speech presence probability calculation method described in this specification to calculate the speech presence probability of the microphone signal.
  • The speech presence probability calculation method is introduced in other parts of this specification, for example in the description of Figs. 3 to 6.
  • The computing device 240 may include at least one storage medium 243 and at least one processor 242.
  • The electronic device 200 may also include a communication port 245 and an internal communication bus 241.
  • The internal communication bus 241 may connect the various system components, including the storage medium 243, the processor 242, and the communication port 245.
  • The communication port 245 can be used for data communication between the computing device 240 and the outside world.
  • For example, the computing device 240 may acquire the microphone signals from the microphone array 220 via the communication port 245.
  • The at least one storage medium 243 may include a data storage device.
  • The data storage device may be a non-transitory storage medium or a transitory storage medium.
  • The data storage device may include one or more of a magnetic disk, a read-only storage medium (ROM), or a random-access storage medium (RAM).
  • The storage medium 243 may also include at least one instruction set stored in the data storage device for calculating the speech presence probability of the microphone signal.
  • The instructions are computer program code, which may include programs, routines, objects, components, data structures, procedures, modules, etc. for executing the speech presence probability calculation method provided in this specification.
  • The at least one processor 242 may be communicatively connected to the at least one storage medium 243 through the internal communication bus 241.
  • The communication connection refers to any form of connection capable of receiving information directly or indirectly.
  • The at least one processor 242 is configured to execute the at least one instruction set.
  • When the speech presence probability calculation system runs on the computing device 240, the at least one processor 242 reads the at least one instruction set and executes the speech presence probability calculation method provided in this specification according to the instructions of the at least one instruction set.
  • The processor 242 can execute all the steps included in the speech presence probability calculation method.
  • The processor 242 may be in the form of one or more processors.
  • In some embodiments, the processor 242 may include one or more hardware processors, such as a microcontroller, a microprocessor, a reduced instruction set computer (RISC), an application-specific integrated circuit (ASIC), an application-specific instruction set processor (ASIP), a central processing unit (CPU), a graphics processing unit (GPU), a physical processing unit (PPU), a microcontroller unit, a digital signal processor (DSP), a field-programmable gate array (FPGA), an advanced RISC machine (ARM), a programmable logic device (PLD), any circuit or processor capable of performing one or more functions, or any combination thereof.
  • The computing device 240 in this specification may also include multiple processors 242. Therefore, the operations and/or method steps disclosed in this specification may be executed by one processor, as described in this specification, or jointly by multiple processors.
  • For example, if the processor 242 of the computing device 240 executes step A and step B in this specification, it should be understood that step A and step B can also be executed jointly or separately by two different processors 242 (for example, a first processor executes step A and a second processor executes step B, or the first and second processors jointly execute steps A and B).
  • FIG. 2A shows a schematic diagram of an exploded structure of an electronic device 200 provided according to an embodiment of the present specification.
  • the electronic device 200 may include a microphone array 220 , a computing device 240 , a first housing 260 and a second housing 280 .
  • the first housing 260 may be an installation base of the microphone array 220 .
  • the microphone array 220 may be installed inside the first housing 260 .
  • The shape of the first housing 260 can be adapted to the distribution shape of the microphone array 220, which is not limited in this specification.
  • the second housing 280 may be a mounting base for the computing device 240 .
  • the computing device 240 may be installed inside the second housing 280 .
  • the shape of the second housing 280 can be adaptively designed according to the shape of the computing device 240 , which is not limited in this specification.
  • When the electronic device 200 is an earphone, the second housing 280 can be connected with the wearing part.
  • The second housing 280 may be connected with the first housing 260.
  • microphone array 220 may be electrically connected to computing device 240 .
  • the microphone array 220 may be electrically connected to the computing device 240 through the connection of the first housing 260 and the second housing 280 .
  • The first housing 260 can be fixedly connected with the second housing 280, for example by being integrally formed, welded, riveted, or glued.
  • the first housing 260 can be detachably connected to the second housing 280 .
  • The computing device 240 may be communicatively coupled with different microphone arrays 220. Specifically, different microphone arrays 220 may differ in the number of microphones 222, the array shape, the spacing between the microphones 222, the installation angle of the microphone array 220 in the first housing 260, the installation position in the first housing 260, and so on.
  • The user can swap in a microphone array 220 suited to the application scenario, so that the electronic device 200 is applicable to a wider range of scenarios. For example, when the distance between the user and the electronic device 200 is relatively short in the application scenario, the user can switch to a microphone array 220 with a smaller spacing. For another example, when the distance between the user and the electronic device 200 is relatively long in the application scenario, the user can switch to a microphone array 220 with a larger spacing and a larger number of microphones.
  • the detachable connection may be any form of physical connection, such as screw connection, buckle connection, magnetic suction connection, and so on.
  • the first housing 260 and the second housing 280 may be magnetically connected. That is, the detachable connection between the first housing 260 and the second housing 280 is achieved through the adsorption force of the magnetic device.
  • FIG. 2B shows a front view of a first housing 260 provided according to an embodiment of this specification
  • FIG. 2C shows a top view of a first housing 260 provided according to an embodiment of this specification.
  • the first housing 260 may include a first interface 262 .
  • the first housing 260 may also include contacts 266 .
  • the first housing 260 may further include an angle sensor (not shown in FIGS. 2B and 2C ).
  • the first interface 262 may be an installation interface of the first housing 260 and the second housing 280 .
  • first interface 262 may be circular.
  • the first interface 262 can be rotatably connected with the second housing 280 .
  • When the first housing 260 is installed on the second housing 280, the first housing 260 can rotate relative to the second housing 280 to adjust the angle of the first housing 260 relative to the second housing 280, thereby adjusting the angle of the microphone array 220.
  • a first magnetic device 263 may be disposed on the first interface 262 .
  • the first magnetic device 263 may be disposed at a position where the first interface 262 is close to the second housing 280 .
  • the first magnetic device 263 can generate a magnetic adsorption force, so as to achieve a detachable connection with the second housing 280 .
  • the first housing 260 is quickly connected to the second housing 280 through the suction force.
  • the first housing 260 can be rotated relative to the second housing 280 to adjust the angle of the microphone array 220 . Under the action of the adsorption force, when the first housing 260 rotates relative to the second housing 280 , the connection between the first housing 260 and the second housing 280 can still be maintained.
  • a first positioning device (not shown in FIG. 2B and FIG. 2C ) may also be provided on the first interface 262 .
  • the first positioning device may be a positioning step protruding outward, or a positioning hole extending inward.
  • the first positioning device can cooperate with the second housing 280 to realize quick installation of the first housing 260 and the second housing 280 .
  • the first housing 260 may further include contacts 266 .
  • Contacts 266 may be installed at the first interface 262 .
  • the contacts 266 can protrude outward from the first interface 262 .
  • the contacts 266 can be elastically connected to the first interface 262 .
  • Contacts 266 may be communicatively coupled with M microphones 222 in microphone array 220 .
  • the contacts 266 can be made of resilient metal for data transmission.
  • When the first housing 260 is connected to the second housing 280, the microphone array 220 can communicate with the computing device 240 through the contacts 266.
  • The contacts 266 may be distributed in a circle. After the first housing 260 is connected with the second housing 280, when the first housing 260 rotates relative to the second housing 280, the contacts 266 also rotate relative to the second housing 280 and maintain the communication connection with the computing device 240.
  • an angle sensor (not shown in FIG. 2B and FIG. 2C ) may also be provided on the first housing 260 .
  • the angle sensor may be communicatively coupled to contacts 266 enabling a communicative link with computing device 240 .
  • the angle sensor can collect angle data of the first housing 260, thereby determining the angle of the microphone array 220, and providing reference data for subsequent calculation of voice existence probability.
  • FIG. 2D shows a front view of a second housing 280 provided according to an embodiment of this specification.
  • FIG. 2E shows a bottom view of a second housing 280 provided according to an embodiment of this specification.
  • The second housing 280 may include a second interface 282.
  • The second housing 280 may further include a guide rail 286.
  • the second interface 282 may be an installation interface between the second housing 280 and the first housing 260 .
  • the second interface 282 may be circular.
  • the second interface 282 can be rotatably connected with the first interface 262 of the first housing 260 .
  • When the first housing 260 is installed on the second housing 280, the first housing 260 can rotate relative to the second housing 280 to adjust the angle of the first housing 260 relative to the second housing 280, thereby adjusting the angle of the microphone array 220.
  • a second magnetic device 283 may be disposed on the second interface 282 .
  • the second magnetic device 283 may be disposed at a position where the second interface 282 is close to the first housing 260 .
  • the second magnetic device 283 can generate a magnetic adsorption force, so as to achieve a detachable connection with the first interface 262 .
  • the second magnetic device 283 may cooperate with the first magnetic device 263 .
  • When the first housing 260 is close to the second housing 280, the first housing 260 is quickly installed on the second housing 280 through the adsorption force between the second magnetic device 283 and the first magnetic device 263.
  • The second magnetic device 283 is opposite to the first magnetic device 263.
  • The first housing 260 can be rotated relative to the second housing 280 to adjust the angle of the microphone array 220.
  • Under the action of the adsorption force, the connection between the first housing 260 and the second housing 280 can still be maintained while the first housing 260 rotates relative to the second housing 280.
  • a second positioning device (not shown in FIG. 2D and FIG. 2E ) may also be provided on the second interface 282 .
  • the second positioning device may be a positioning step protruding outward, or a positioning hole extending inward.
  • the second positioning device can cooperate with the first positioning device of the first housing 260 to realize quick installation of the first housing 260 and the second housing 280 .
  • When the first positioning device is the positioning step, the second positioning device may be the positioning hole.
  • When the first positioning device is the positioning hole, the second positioning device may be the positioning step.
  • the second housing 280 may further include a guide rail 286 .
  • a guide rail 286 may be installed at the second interface 282 .
  • The guide rail 286 may be communicatively coupled with the computing device 240.
  • The guide rail 286 can be made of metal to realize data transmission.
  • The contacts 266 can contact the guide rail 286 to form a communication connection between the microphone array 220 and the computing device 240 for data transmission.
  • The contacts 266 can be elastically connected to the first interface 262.
  • The contacts 266 can therefore make full contact with the guide rail 286 to realize a reliable communication connection.
  • the rails 286 may be distributed in a circle.
  • Fig. 3 shows a flowchart of a method P100 for calculating a voice presence probability according to an embodiment of the present specification.
  • the method P100 may calculate a speech presence probability of the microphone signal.
  • the processor 242 may execute the method P100.
  • the method P100 may include:
  • S120 Acquire microphone signals output by M microphones 222 .
  • each microphone 222 can output a corresponding microphone signal.
  • the M microphones 222 correspond to the M microphone signals.
  • When the method P100 calculates the voice presence probability, the calculation may be based on all of the M microphone signals or on only part of them. Therefore, the microphone signals may include the M microphone signals corresponding to the M microphones 222, or part of those signals. In the following description of this specification, the case where the microphone signals include the M microphone signals corresponding to the M microphones 222 is taken as an example.
  • the microphone 222 can collect the noise in the surrounding environment, and can also collect the target voice of the target user.
  • Assume there are N signal sources around the microphones 222, denoted s_1(t), ..., s_N(t).
  • s_v(t) is a signal source vector composed of the N signal sources s_1(t), ..., s_N(t).
  • The sound field mode of the N signal sources s_v(t) is a far-field mode.
  • The N signal sources s_v(t) can be regarded as plane waves.
  • We denote the microphone signal at time t as x(t).
  • The microphone signal x(t) may be a signal vector composed of the M microphone signals. At this point, the microphone signal x(t) can be expressed as the following formula:
  • a_v(θ) is the steering vector of the N signal sources s_v(t).
  • θ_1, ..., θ_N are the incident angles between the N signal sources s_1(t), ..., s_N(t) and the microphones 222, respectively.
  • a_v(θ) may be a function of θ_1, ..., θ_N and the distances d_1, ..., d_{M-1} between the M microphones 222.
  • The computing device 240 pre-stores the relative positional relationships of the M microphones 222, such as relative distances or relative coordinates. That is, d_1, ..., d_{M-1} are pre-stored in the computing device 240.
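  • As a minimal sketch of how a steering vector could be evaluated for one source at one frequency, assume a linear array whose microphones sit at distances d_1, ..., d_{M-1} from a reference microphone, and a plane wave arriving at angle θ measured from broadside. The function name `steering_vector`, the speed of sound, and the sign and angle conventions are illustrative assumptions and are not taken from this specification.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def steering_vector(theta_rad, distances_m, freq_hz, c=SPEED_OF_SOUND):
    """Far-field steering vector for one plane wave on a linear array.

    distances_m holds d_1, ..., d_{M-1}: the distance of each remaining
    microphone from the reference microphone (assumed layout).
    """
    positions = [0.0] + list(distances_m)  # reference microphone at 0
    return [
        # Phase delay of the plane wave at each microphone position.
        cmath.exp(-2j * math.pi * freq_hz * d * math.sin(theta_rad) / c)
        for d in positions
    ]


# Three microphones spaced 2 cm apart, source at 30 degrees, 1 kHz bin.
a = steering_vector(math.radians(30.0), [0.02, 0.04], 1000.0)
```

  • Every entry has unit magnitude, and a source at broadside (θ = 0) yields a vector of ones, because the plane wave reaches all microphones at the same time.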
  • the microphone signal x(t) is a time domain signal.
  • The computing device 240 may also perform spectrum analysis on the microphone signal x(t). Specifically, the computing device 240 may perform a Fourier transform on the time-domain microphone signal x(t) to obtain the frequency-domain microphone signal x_{f,t}.
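  • The transform from time-domain frames to the frequency-domain vector x_{f,t} can be sketched as follows. This is a naive discrete Fourier transform for illustration only. Framing, windowing, and overlap are omitted, and the helper names `dft` and `stft_bin` are hypothetical.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of one frame of samples."""
    n = len(frame)
    return [
        sum(frame[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
        for f in range(n)
    ]


def stft_bin(mic_frames, f):
    """Stack bin f of every microphone's frame into the vector x_{f,t}.

    mic_frames holds one time-domain frame per microphone, all taken
    at the same frame index t.
    """
    return [dft(frame)[f] for frame in mic_frames]
```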
  • In the following, the frequency-domain microphone signal x_{f,t} is used for description. At this time, the microphone signal x_{f,t} can be expressed as the following formula:
  • The N signal sources can satisfy a Gaussian distribution, which can be expressed as the following formula:
  • The Gaussian distribution can be a complex Gaussian distribution, characterized by its variance.
  • The microphone signal x_{f,t} also satisfies a Gaussian distribution.
  • The microphone signal x_{f,t} may follow either a speech presence model or a speech absence model, each satisfying a Gaussian distribution.
  • At this time, x_{f,t} can be expressed as the following formula:
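  • As a hedged illustration of the two-model description, the microphone signal could be written as a mixture of two complex Gaussian distributions. The zero mean and the symbols Σ^{(s)}_{f,t} and Σ^{(n)}_{f,t} (speech presence variance and speech absence variance) are assumptions introduced here for readability, not notation taken from this specification:

$$
x_{f,t} \sim
\begin{cases}
\mathcal{N}_c\!\left(0,\ \Sigma^{(\mathrm{s})}_{f,t}\right), & \text{speech present} \\
\mathcal{N}_c\!\left(0,\ \Sigma^{(\mathrm{n})}_{f,t}\right), & \text{speech absent}
\end{cases}
$$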
  • The speech presence probability corresponding to the microphone signal x_{f,t} may be the probability that the microphone signal x_{f,t} belongs to the speech presence model.
  • We define a speech presence probability and a speech absence probability corresponding to the microphone signal x_{f,t}.
  • We also define the distribution probability of the microphone signal x_{f,t} under the speech presence model and its distribution probability under the speech absence model. The speech presence probability corresponding to the microphone signal x_{f,t} can then be expressed as the following formula:
  • The computing device 240 needs to determine the speech presence variance corresponding to the speech presence model and the speech absence variance corresponding to the speech absence model. It is assumed that the microphone signal x_{f,t} follows either a first model or a second model, each satisfying a Gaussian distribution.
  • One of the first model and the second model is a speech presence model, and the other is a speech absence model.
  • The first variance is the variance of the Gaussian distribution corresponding to the first model, and the second variance is the variance of the Gaussian distribution corresponding to the second model.
  • We express the first variance as the product of the first parameter and the first spatial covariance matrix, and the second variance as the product of the second parameter and the second spatial covariance matrix.
  • The computing device 240 needs to determine which of the first model and the second model is the speech presence model and which is the speech absence model.
  • S140 Perform iterative optimization on the first model and the second model respectively, based on maximum likelihood estimation and the expectation maximization algorithm, until convergence.
  • The computing device 240 may use an iterative optimization method to iteratively optimize the first model and the second model respectively, so as to obtain the first variance of the first model and the second variance of the second model.
  • The computing device 240 may determine whether the speech presence model is the first model or the second model based on the entropy of the first probability that the microphone signal x_{f,t} is the first model and the entropy of the second probability that the microphone signal x_{f,t} is the second model.
  • The first probability can be the probability that the microphone signal x_{f,t} belongs to the first model, out of the first model and the second model.
  • The second probability can be the probability that the microphone signal x_{f,t} belongs to the second model, out of the first model and the second model.
  • The first probability and the second probability are complementary, that is, they sum to 1. We define the first distribution probability as the distribution probability of the microphone signal x_{f,t} under the first model, and the second distribution probability as its distribution probability under the second model. The first probability corresponding to the microphone signal x_{f,t} can be expressed as the following formula:
  • The second probability corresponding to the microphone signal x_{f,t} can be expressed as the following formula:
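  • The relation between the two distribution probabilities and the two complementary probabilities can be sketched as follows. The sketch assumes the first probability is the first distribution probability divided by the sum of both, with equal prior weights, which is consistent with the two probabilities summing to 1 but is not confirmed by this specification. The function name `model_probabilities` is hypothetical, and the computation is done in the log domain for numerical stability.

```python
import math


def model_probabilities(log_p1, log_p2):
    """Return (first probability, second probability) for one x_{f,t}.

    log_p1 and log_p2 are the logarithms of the first and second
    distribution probabilities; equal prior weights are assumed.
    """
    m = max(log_p1, log_p2)  # subtract the max to avoid underflow
    e1 = math.exp(log_p1 - m)
    e2 = math.exp(log_p2 - m)
    lam1 = e1 / (e1 + e2)
    return lam1, 1.0 - lam1  # complementary by construction
```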
  • step S140 may include:
  • The unknown parameters include the first variance of the first model and the second variance of the second model, and the hidden variables are the first probability that the microphone signal x_{f,t} belongs to the first model and the second probability that the microphone signal x_{f,t} belongs to the second model. Therefore, maximum likelihood estimation and the expectation maximization algorithm are used to iteratively optimize the first variance of the first model and the second variance of the second model.
  • the objective function is the maximum likelihood estimation function.
  • the maximum likelihood estimation function can be expressed as the following formula:
  • The optimization parameters may include the first spatial covariance matrix and the second spatial covariance matrix.
  • The initial value of the first spatial covariance matrix and the initial value of the second spatial covariance matrix can be the same or different.
  • The initial value of the first spatial covariance matrix and/or the initial value of the second spatial covariance matrix can be the identity matrix I_N.
  • The initial value of the first spatial covariance matrix and/or the initial value of the second spatial covariance matrix can also be calculated directly from several adjacent frames of microphone signals. In that case, the initial value can be expressed as the following formula:
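  • One plausible reading of an initial value calculated from several adjacent frames is the sample covariance, that is, the average of x_{f,t} x_{f,t}^H over those frames. The sketch below assumes that reading; the function names `outer` and `initial_spatial_covariance` are hypothetical.

```python
def outer(x):
    """Outer product x x^H of a complex vector, as a nested list."""
    return [[xi * xj.conjugate() for xj in x] for xi in x]


def initial_spatial_covariance(frames):
    """Average x x^H over several adjacent frames x_{f,t} of one bin."""
    m = len(frames[0])
    acc = [[0j] * m for _ in range(m)]
    for x in frames:
        xxh = outer(x)
        for i in range(m):
            for j in range(m):
                acc[i][j] += xxh[i][j]
    return [[v / len(frames) for v in row] for row in acc]
```

  • With fewer frames than microphones, this average is rank deficient, which is one situation where the invertibility correction in the iteration becomes relevant.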
  • During the multiple iterations, the computing device 240 may determine whether the speech presence model is the first model or the second model based on the first probability entropy and the second probability entropy.
  • The computing device 240 may make this determination in any one of the multiple iterations, as shown in FIG. 5.
  • FIG. 5 shows a flow chart of the multiple iterations provided according to an embodiment of this specification, corresponding to step S146. As shown in FIG. 5, in each iteration, step S146 may include:
  • Step S146-2 may be: when it is determined that an optimization parameter is non-invertible, correct the optimization parameter through a deviation matrix.
  • The deviation matrix may be an identity matrix, or a random matrix following a normal or uniform distribution.
  • The optimization parameters include the first spatial covariance matrix and the second spatial covariance matrix. According to formula (11) and formula (12), obtaining the first parameter and the second parameter requires the first spatial covariance matrix and the second spatial covariance matrix to be invertible. The larger the condition number of a matrix, the closer the matrix is to a singular (non-invertible) matrix.
  • When the first spatial covariance matrix or the second spatial covariance matrix is non-invertible (that is, its condition number is greater than a certain threshold η), a slight perturbation is added to that matrix to correct it and ensure its invertibility.
  • The computing device 240 can judge the invertibility of the first spatial covariance matrix and the second spatial covariance matrix. If the condition number of either matrix is greater than η, that matrix is non-invertible and invertibility correction is required.
  • η is the condition number threshold.
  • In some embodiments, η = 10000. In some embodiments, η can be larger or smaller.
  • When the first spatial covariance matrix or the second spatial covariance matrix is non-invertible, it can be corrected by the deviation matrix Q. At this point, the corrected first spatial covariance matrix or second spatial covariance matrix can be expressed as the following formula:
  • Q is the deviation matrix.
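  • The condition-number check and correction can be sketched for the two-microphone case, where the eigenvalues of a 2x2 Hermitian matrix have a closed form. The sketch assumes the correction adds a small multiple `mu` of the deviation matrix Q, with Q taken as the identity matrix. The scale `mu`, the restriction to 2x2 matrices, and all function names are illustrative assumptions.

```python
import math


def hermitian_2x2_eigenvalues(r):
    """Eigenvalues (largest first) of a 2x2 Hermitian matrix."""
    tr = (r[0][0] + r[1][1]).real
    det = (r[0][0] * r[1][1] - r[0][1] * r[1][0]).real
    disc = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    return (tr + disc) / 2.0, (tr - disc) / 2.0


def condition_number(r):
    """Ratio of largest to smallest eigenvalue; infinite if singular."""
    big, small = hermitian_2x2_eigenvalues(r)
    return math.inf if small <= 0.0 else big / small


def correct_if_ill_conditioned(r, eta=10000.0, mu=1e-3):
    """Add mu * Q (Q = identity here) when cond(R) exceeds eta."""
    if condition_number(r) <= eta:
        return r
    q = [[1.0, 0.0], [0.0, 1.0]]
    return [[r[i][j] + mu * q[i][j] for j in range(2)] for i in range(2)]
```

  • A single correction with a fixed `mu` is not guaranteed to bring every matrix under the threshold; a practical implementation might scale `mu` with the trace of the matrix or repeat the check.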
  • Step S146-6 may include:
  • step S146 may also include:
  • Step S146-9 may be performed during the iteration process or after the iteration ends. Based on the first probability and the second probability in any one of the multiple iterations, the first probability entropy and the second probability entropy are calculated, and it is thereby determined whether the speech presence model is the first model or the second model.
  • Entropy represents the degree of chaos, or disorder, of a system. The more disordered the system, the greater the entropy value; the more ordered the system, the smaller the entropy value. The noise signals among the N signal sources are more disordered than the speech signals among the N signal sources. Therefore, the entropy of the speech absence model is larger than that of the speech presence model.
  • The computing device 240 can obtain the first probability and the second probability, and calculate the first probability entropy and the second probability entropy. When the first probability entropy is greater than the second probability entropy, the computing device 240 may determine that the speech presence model is the second model and the speech absence model is the first model. When the first probability entropy is less than the second probability entropy, the computing device 240 may determine that the speech presence model is the first model and the speech absence model is the second model.
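  • The entropy comparison can be sketched as follows. The entropy formula is not given in this text, so the sketch assumes the Shannon entropy of each probability sequence across frames after normalizing the sequence to sum to 1. The function names and the handling of a tie (which the text does not define) are assumptions.

```python
import math


def probability_entropy(probs, eps=1e-12):
    """Shannon entropy of a probability sequence normalized to sum to 1."""
    total = sum(probs)
    if total <= 0.0:
        return 0.0
    h = 0.0
    for p in probs:
        q = p / total
        if q > eps:  # skip zero terms, whose contribution is zero
            h -= q * math.log(q)
    return h


def speech_presence_model(lam1, lam2):
    """Return 1 or 2: the model with the smaller probability entropy.

    A tie falls to model 2 here; that choice is arbitrary.
    """
    return 1 if probability_entropy(lam1) < probability_entropy(lam2) else 2
```

  • Under this assumption, a probability that is high in only a few frames, as expected for sparse speech, gives a concentrated sequence with low entropy, while a probability that is high in most frames gives a nearly uniform sequence with high entropy.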
  • The computing device 240 may, in a first iteration of the multiple iterations, determine whether the speech presence model is the first model or the second model based on the first probability entropy and the second probability entropy, and, in each subsequent iteration, correct the first probability and the second probability so as to correct misjudgment of the speech presence probability, as shown in FIG. 6.
  • FIG. 6 shows another flow chart of the multiple iterations provided according to an embodiment of this specification, corresponding to step S146. As shown in FIG. 6, step S146 may include:
  • In the first iteration, the computing device 240 may determine the first parameter and the second parameter based on formula (11) and formula (12), then determine the first probability and the second probability based on formula (8) and formula (9), and then calculate and compare the first probability entropy and the second probability entropy. When the first probability entropy is greater than the second probability entropy, the computing device 240 may determine that the speech presence model is the second model and the speech absence model is the first model. When the first probability entropy is less than the second probability entropy, the computing device 240 may determine that the speech presence model is the first model and the speech absence model is the second model.
  • In each iteration after the first iteration, step S146 may also include:
  • Step S146-11: Perform invertibility correction on the optimization parameters. This is the same as step S146-2 described above and is not repeated here.
  • Step S146-14 may be that the computing device 240 calculates and compares the first probability entropy and the second probability entropy.
  • When the speech presence model is the first model, if the first probability entropy is greater than the second probability entropy, the value of the first probability and the value of the second probability are swapped. That is, the value of the first probability is updated to the value of the second probability, and the value of the second probability is updated to the value of the first probability.
  • When the speech presence model is the first model, if the first probability entropy is less than the second probability entropy, the first probability and the second probability are not corrected.
  • When the speech presence model is the second model, if the first probability entropy is less than the second probability entropy, the value of the first probability and the value of the second probability are swapped. That is, the value of the first probability is updated to the value of the second probability, and the value of the second probability is updated to the value of the first probability.
  • When the speech presence model is the second model, if the first probability entropy is greater than the second probability entropy, the first probability and the second probability are not corrected.
  • Through step S146-14 and step S146-15, the entropy of the speech presence model can be kept smaller than the entropy of the speech absence model during each iteration, which ensures that each iteration converges toward the target direction and thereby speeds up convergence.
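  • The per-iteration correction can be sketched as a swap that is applied only when the entropy ordering contradicts the model chosen as the speech presence model in the first iteration. As before, the entropy is assumed to be the Shannon entropy of the normalized probability sequence across frames, and the function names are hypothetical.

```python
import math


def probability_entropy(probs):
    """Shannon entropy of a probability sequence normalized to sum to 1."""
    total = sum(probs)
    if total <= 0.0:
        return 0.0
    return -sum((p / total) * math.log(p / total) for p in probs if p > 0.0)


def correct_probabilities(lam1, lam2, presence_model):
    """Swap the two probability sequences when the speech presence model
    (1 or 2, fixed in the first iteration) has the larger entropy."""
    h1 = probability_entropy(lam1)
    h2 = probability_entropy(lam2)
    if presence_model == 1:
        presence_has_larger_entropy = h1 > h2
    else:
        presence_has_larger_entropy = h2 > h1
    if presence_has_larger_entropy:
        return lam2, lam1  # swap the values of the two probabilities
    return lam1, lam2      # ordering already consistent, no correction
```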
  • Step S146-16 may include:
  • step S140 may also include:
  • the calculation device 240 may output the value of the corresponding optimization parameter when the objective function converges as the convergence value of the optimization parameter.
  • The computing device 240 can output the first probability and the second probability corresponding to the convergence value of the optimization parameters.
  • The optimization parameters, namely the first spatial covariance matrix and the second spatial covariance matrix, are calculated based on the first probability and the second probability.
  • The computing device 240 can output the first probability and the second probability corresponding to the first spatial covariance matrix and the second spatial covariance matrix.
  • the method P100 may also include:
  • The computing device 240 may determine whether the speech presence model is the first model or the second model based on the first probability entropy and the second probability entropy.
  • When the speech presence model is the first model, the probability that the microphone signal x_{f,t} is the speech presence model can be the first probability that the microphone signal x_{f,t} is the first model.
  • In that case, the speech presence probability of the microphone signal x_{f,t} can be the first probability corresponding to the first spatial covariance matrix when the objective function converges.
  • When the speech presence model is the second model, the probability that the microphone signal x_{f,t} is the speech presence model can be the second probability that the microphone signal x_{f,t} is the second model.
  • In that case, the speech presence probability of the microphone signal x_{f,t} can be the second probability corresponding to the second spatial covariance matrix when the objective function converges.
  • The computing device 240 can output the speech presence probability to other computing modules, such as a speech enhancement module.
  • The computing device 240 can use the entropy of the first probability corresponding to the first model and the entropy of the second probability corresponding to the second model to determine which of the first model and the second model is the speech presence model and which is the speech absence model, so as to obtain the speech presence probability of the microphone signal x_{f,t}, correct misjudgment of the speech probability in the iterative process, and improve the calculation accuracy of the speech presence probability.
  • During the iterative process, the computing device 240 may also correct the first probability and the second probability according to the first probability entropy and the second probability entropy, so that the optimization parameters iterate toward the target direction, thereby speeding up convergence and further improving the calculation accuracy of the speech presence probability.
  • the speech enhancement system can also be applied to the electronic device 200 .
  • the speech enhancement system may include computing device 240 .
  • a speech enhancement system may be applied to computing device 240 . That is, the speech enhancement system may run on computing device 240 .
  • the speech enhancement system may include hardware devices with data information processing functions and necessary programs to drive the hardware devices to work.
  • the speech enhancement system may also be only a hardware device with data processing capability, or just a program running on the hardware device.
  • the speech enhancement system may store data or instructions for executing the speech enhancement method described in this specification, and may execute the data and/or instructions.
  • the speech enhancement system can obtain the microphone signal from the microphone array 220 based on the communication connection, and execute the data or instructions of the speech enhancement method described in this specification.
  • the speech enhancement method is described elsewhere in this specification. For example, the speech enhancement method is introduced in the description of FIG. 7 .
  • the storage medium 243 may also include at least one instruction set stored in the data storage device, for performing MVDR-based speech enhancement calculation on the microphone signal.
  • the instructions are computer program codes, and the computer program codes may include programs, routines, objects, components, data structures, procedures, modules, etc. for executing the speech enhancement method provided in this specification.
  • the processor 242 may read the at least one instruction set, and execute the speech enhancement method provided in this specification according to the instruction of the at least one instruction set. The processor 242 can execute all the steps included in the speech enhancement method.
  • Fig. 7 shows a flow chart of a speech enhancement method P200 provided according to an embodiment of this specification.
  • the method P200 may perform speech enhancement on the microphone signal based on the MVDR method.
  • the processor 242 may execute the method P200.
  • the method P200 may include:
  • S220 Acquire microphone signals x f,t output by the M microphones.
  • The noise covariance matrix can be expressed as the following formula:
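  • A common way to estimate a noise covariance matrix from a speech presence probability is to average x_{f,t} x_{f,t}^H over frames, weighting each frame by its speech absence probability. The sketch below assumes that form, which may differ from the formula in this specification; the function name `noise_covariance` is hypothetical.

```python
def noise_covariance(frames, presence_probs):
    """Weighted average of x x^H, weighting frame t by 1 - presence_probs[t].

    frames holds the vectors x_{f,t} of one frequency bin, and
    presence_probs holds the speech presence probability of each frame.
    """
    m = len(frames[0])
    acc = [[0j] * m for _ in range(m)]
    weight_sum = 0.0
    for x, lam in zip(frames, presence_probs):
        w = 1.0 - lam  # speech absence probability of this frame
        weight_sum += w
        for i in range(m):
            for j in range(m):
                acc[i][j] += w * x[i] * x[j].conjugate()
    if weight_sum == 0.0:
        raise ValueError("every frame is marked as speech")
    return [[v / weight_sum for v in row] for row in acc]
```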
  • The filter coefficient ω_{f,t} can be expressed as the following formula:
  • θ_s is the incident angle of the signal corresponding to the target direction.
  • When θ_s is known, the steering vector for the target direction is also known.
  • The computing device 240 may also perform subspace decomposition based on the noise covariance matrix to calculate the steering vector.
  • The filter coefficient ω_{f,t} can also be expressed as the following formula:
  • The matrix used in this formula is the convergence value corresponding to the speech absence model.
  • When the first model is the speech absence model, it is the convergence value corresponding to the first spatial covariance matrix.
  • When the second model is the speech absence model, it is the convergence value corresponding to the second spatial covariance matrix.
  • The target audio signal y_{f,t} can be expressed as the following formula:
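  • The filtering step can be sketched with the standard MVDR solution, in which the filter coefficient is R^{-1} a / (a^H R^{-1} a) for a noise covariance matrix R and a target steering vector a, and the target audio signal is the inner product of the conjugated filter with x_{f,t}. The sketch is restricted to two microphones so that the matrix inverse has a closed form, and all function names are hypothetical.

```python
def invert_2x2(r):
    """Closed-form inverse of a 2x2 matrix."""
    det = r[0][0] * r[1][1] - r[0][1] * r[1][0]
    if det == 0:
        raise ValueError("noise covariance matrix is singular")
    return [[r[1][1] / det, -r[0][1] / det],
            [-r[1][0] / det, r[0][0] / det]]


def mvdr_weights(noise_cov, a):
    """w = R^{-1} a / (a^H R^{-1} a) for a two-microphone array."""
    r_inv = invert_2x2(noise_cov)
    num = [r_inv[i][0] * a[0] + r_inv[i][1] * a[1] for i in range(2)]
    den = sum(a[i].conjugate() * num[i] for i in range(2))
    return [n / den for n in num]


def apply_filter(w, x):
    """Target signal y = w^H x for one time-frequency bin."""
    return sum(wi.conjugate() * xi for wi, xi in zip(w, x))


w = mvdr_weights([[2 + 0j, 0j], [0j, 1 + 0j]], [1 + 0j, 1j])
y = apply_filter(w, [1 + 0j, 1j])  # input equal to the steering vector
```

  • Feeding the steering vector itself through the filter returns a gain of 1, which is the distortionless constraint that MVDR imposes on the target direction.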
  • the computing device 240 can output the target audio signal y f,t to other electronic devices, such as remote communication devices.
  • The voice presence probability calculation system and method P100, the speech enhancement system and method P200, and the electronic device 200 provided in this specification are used with a microphone array 220 composed of a plurality of microphones 222.
  • They can respectively obtain a speech presence model for when speech is present in the plurality of microphone signals and a speech absence model for when speech is absent, and optimize both through multiple iterations based on maximum likelihood estimation and the expectation maximization algorithm. During the iterations, the speech presence probability and the speech absence probability are corrected according to the entropy of the speech presence probability and the entropy of the speech absence probability, so as to determine the model parameters of the speech presence model and of the speech absence model. When the maximum likelihood estimation and expectation maximization algorithm converge, the speech presence probability corresponding to the speech presence model is obtained.
  • By comparing the entropy of the speech presence probability with the entropy of the speech absence probability and correcting the two probabilities during the iterations, the voice presence probability calculation system and method P100, the speech enhancement system and method P200, and the electronic device 200 obtain a faster convergence speed and better convergence results, so that the estimation accuracy of the speech presence probability and the noise covariance matrix is higher, which in turn improves the speech enhancement effect of MVDR.
  • Another aspect of this specification provides a non-transitory storage medium that stores at least one set of executable instructions for speech presence probability calculation.
  • When the executable instructions are executed by a processor, they direct the processor to implement the steps of the speech presence probability calculation method P100 described in this specification.
  • Various aspects of this specification can also be implemented in the form of a program product that includes program code.
  • When the program product runs on a computing device (such as the computing device 240), the program code causes the computing device to execute the speech presence probability calculation steps described in this specification.
  • A program product for implementing the method described above may carry program code on a portable compact disc read-only memory (CD-ROM) and may be run on a computing device.
  • a readable storage medium may be any tangible medium containing or storing a program, and the program may be used by or in combination with an instruction execution system (such as the processor 242).
  • the program product may reside on any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • The readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof.
  • readable storage media include: electrical connections with one or more conductors, portable disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), fiber optics, portable compact disk read only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
  • The readable signal medium may include a data signal propagated in baseband or as part of a carrier wave and carrying readable program code. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • A readable signal medium may also be any readable medium, other than a readable storage medium, that can send, propagate, or transport a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • the program code contained on the readable storage medium may be transmitted by any suitable medium, including but not limited to wireless, cable, optical cable, RF, etc., or any suitable combination of the above.
  • Program code for carrying out the operations of this specification may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as C or similar languages.
  • the program code may execute entirely on the computing device, partly on the computing device, as a stand-alone software package, partly on the computing device and partly on a remote computing device or entirely on the remote computing device.
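The bullets above mention a noise covariance matrix, an MVDR filter coefficient, and a target audio signal y_{f,t}, but their formulas did not survive extraction. The following is a minimal sketch, assuming the textbook MVDR solution w = R⁻¹a / (aᴴR⁻¹a) and a noise covariance weighted by one minus the speech presence probability. The function names, the weighting, and the example steering vector are illustrative assumptions, not the patent's own definitions.

```python
import numpy as np

def noise_covariance(frames, presence_prob):
    """Estimate the noise covariance of one frequency bin (assumed weighting).

    frames        : complex array, shape (T, M), one row per time frame
    presence_prob : real array, shape (T,), speech presence probability per frame
    """
    # Frames that are probably noise receive the largest weight.
    w = 1.0 - presence_prob
    return (frames * w[:, None]).T @ frames.conj() / np.sum(w)

def mvdr_weights(R_noise, a):
    """Textbook MVDR solution w = R^{-1} a / (a^H R^{-1} a)."""
    Rinv_a = np.linalg.solve(R_noise, a)
    return Rinv_a / (a.conj() @ Rinv_a)

def apply_filter(w, x):
    """Combine the M microphone signals of one bin: y = w^H x."""
    return w.conj() @ x

rng = np.random.default_rng(0)
frames = rng.standard_normal((200, 3)) + 1j * rng.standard_normal((200, 3))
prob = rng.uniform(0.0, 1.0, size=200)
a = np.exp(-2j * np.pi * np.array([0.0, 0.1, 0.2]))  # hypothetical steering vector
w = mvdr_weights(noise_covariance(frames, prob), a)
```

With these weights, a signal arriving exactly along the assumed steering vector passes with unit gain, which is the distortionless constraint that gives MVDR its name.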

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Otolaryngology (AREA)
  • Quality & Reliability (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

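The speech presence probability calculation method and system, speech enhancement method and system, and earphone provided in this specification correct the speech presence probability and the speech absence probability during iteration by comparing the entropy of the speech presence probability with the entropy of the speech absence probability. This yields faster convergence and a better convergence result, so that the speech presence probability and the noise covariance matrix are estimated more accurately, which in turn improves the speech enhancement effect of MVDR.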

Description

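Speech presence probability calculation method and system, speech enhancement method and system, and earphone
Technical Field
This specification relates to the technical field of speech signal processing, and in particular to a speech presence probability calculation method and system, a speech enhancement method and system, and an earphone.
Background
In speech enhancement based on beamforming, and especially in the adaptive beamforming algorithm known as Minimum Variance Distortionless Response (MVDR), it is crucial to solve for the noise covariance matrix, the parameter that describes the statistical relationship of the noise between different microphones. The main prior-art approach computes the noise covariance matrix from a speech presence probability, for example by estimating the speech presence probability with Voice Activity Detection (VAD) and then computing the noise covariance matrix. However, the speech presence probability estimated in the prior art is not accurate enough, so the noise covariance matrix is estimated with low precision and the MVDR algorithm enhances speech poorly. The effect degrades sharply when the number of microphones is small, for example fewer than 5. For this reason, prior-art MVDR algorithms are mostly used in devices whose microphone arrays have many, widely spaced microphones, such as mobile phones and smart speakers, and perform poorly on devices such as earphones, which have few, closely spaced microphones.
Therefore, a more accurate speech presence probability calculation method and system, speech enhancement method and system, and earphone are needed.
Summary
This specification provides a more accurate speech presence probability calculation method and system, speech enhancement method and system, and earphone.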
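In a first aspect, this specification provides a speech presence probability calculation method for M microphones distributed in a preset array shape, M being an integer greater than 1, the method including: acquiring the microphone signals output by the M microphones, the microphone signals satisfying a first model or a second model of Gaussian distribution, one of the first model and the second model being a speech presence model and the other a speech absence model; iteratively optimizing the first model and the second model, respectively, based on maximum likelihood estimation and the expectation-maximization algorithm until convergence, and, during the iterations, determining whether the speech presence model is the first model or the second model based on the entropy of a first probability that the microphone signals belong to the first model and the entropy of a second probability that the microphone signals belong to the second model, the first probability and the second probability being complementary; and, when the maximum likelihood estimation and expectation-maximization algorithm converges, taking the probability that the microphone signals belong to the speech presence model as the speech presence probability of the microphone signals and outputting it.
In some embodiments, a first variance of the Gaussian distribution corresponding to the first model includes the product of a first parameter and a first spatial covariance matrix; and a second variance of the Gaussian distribution corresponding to the second model includes the product of a second parameter and a second spatial covariance matrix.
In some embodiments, iteratively optimizing the first model and the second model, respectively, based on maximum likelihood estimation and the expectation-maximization algorithm includes: constructing an objective function based on maximum likelihood estimation and the expectation-maximization algorithm; determining optimization parameters, the optimization parameters including the first spatial covariance matrix and the second spatial covariance matrix; determining initial values of the optimization parameters; iterating the optimization parameters multiple times, based on the objective function and the initial values, until the objective function converges, including determining, during the multiple iterations, whether the speech presence model is the first model or the second model based on the entropy of the first probability and the entropy of the second probability; and outputting the converged values of the optimization parameters and the corresponding first probability and second probability.
In some embodiments, determining, during the multiple iterations, whether the speech presence model is the first model or the second model based on the entropy of the first probability and the entropy of the second probability includes: in any one of the multiple iterations, calculating the entropy of the first probability and the entropy of the second probability and determining whether the speech presence model is the first model or the second model, including: upon determining that the entropy of the first probability is greater than the entropy of the second probability, determining that the speech presence model is the second model; or, upon determining that the entropy of the first probability is less than the entropy of the second probability, determining that the speech presence model is the first model.
In some embodiments, determining, during the multiple iterations, whether the speech presence model is the first model or the second model based on the entropy of the first probability and the entropy of the second probability includes: in the first of the multiple iterations, calculating the entropy of the first probability and the entropy of the second probability and determining whether the speech presence model is the first model or the second model, including: upon determining that the entropy of the first probability is greater than the entropy of the second probability, determining that the speech presence model is the second model; or, upon determining that the entropy of the first probability is less than the entropy of the second probability, determining that the speech presence model is the first model.
In some embodiments, iterating the optimization parameters multiple times further includes, in each of the multiple iterations: correcting the first probability and the second probability based on the entropy of the first probability and the entropy of the second probability, including: upon determining that the first model is the speech presence model and that the entropy of the first probability is greater than the entropy of the second probability, swapping the value of the first probability with the value of the second probability; or, upon determining that the second model is the speech presence model and that the entropy of the second probability is greater than the entropy of the first probability, swapping the value of the first probability with the value of the second probability; and updating the optimization parameters based on the corrected first probability and second probability.
In some embodiments, iterating the optimization parameters multiple times further includes, in each of the multiple iterations: applying an invertibility correction to the optimization parameters, including: upon determining that an optimization parameter is not invertible, correcting it with a deviation matrix, the deviation matrix being one of an identity matrix and a random matrix following a normal or uniform distribution.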
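In a second aspect, this specification further provides a speech presence probability calculation system, including at least one storage medium and at least one processor. The at least one storage medium stores at least one instruction set for speech presence probability calculation. The at least one processor is communicatively connected to the at least one storage medium, and when the speech presence probability calculation system runs, the at least one processor reads the at least one instruction set and carries out the speech presence probability calculation method of the first aspect of this specification.
In a third aspect, this specification further provides a speech enhancement method for M microphones distributed in a preset array shape, M being an integer greater than 1, the method including: acquiring the microphone signals output by the M microphones; determining the speech presence probability of the microphone signals based on the speech presence probability calculation method of any one of claims 1-7; determining a noise covariance matrix of the microphone signals based on the speech presence probability; determining filter coefficients corresponding to the microphone signals based on the MVDR method and the noise covariance matrix; and combining the microphone signals based on the filter coefficients and outputting a target audio signal.
In a fourth aspect, this specification further provides a speech enhancement system, including at least one storage medium and at least one processor. The at least one storage medium stores at least one instruction set for speech enhancement. The at least one processor is communicatively connected to the at least one storage medium, and when the speech enhancement system runs, the at least one processor reads the at least one instruction set and executes the speech enhancement method of the third aspect of this specification.
In a fifth aspect, this specification further provides an earphone, including a microphone array and a computing device. The microphone array includes M microphones distributed in a preset array shape, M being an integer greater than 1. When running, the computing device is communicatively connected to the microphone array and executes the speech enhancement method of the third aspect of this specification.
In some embodiments, the M microphones are linearly distributed, M is not greater than 5, and the spacing between adjacent microphones among the M microphones is between 20 mm and 40 mm.
In some embodiments, the earphone further includes a first housing and a second housing. The microphone array is mounted on the first housing, and the first housing includes a first interface provided with a first magnetic device. The computing device is mounted on the second housing, and the second housing includes a second interface provided with a second magnetic device. The attraction between the first magnetic device and the second magnetic device detachably connects the first housing to the second housing.
In some embodiments, the first housing further includes contacts arranged at the first interface and communicatively connected to the microphone array, and the second housing further includes a guide rail arranged at the second interface and communicatively connected to the computing device. When the first housing is connected to the second housing, the contacts touch the guide rail, so that the microphone array is communicatively connected to the computing device.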
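As can be seen from the above technical solutions, the speech presence probability calculation method and system, speech enhancement method and system, and earphone provided in this specification are used with a microphone array composed of a plurality of microphones. Each microphone in the array can capture the audio of multiple sound sources in the space and output a corresponding microphone signal. The audio signal of each sound source follows a Gaussian distribution, and the microphone signals output by the array follow a joint Gaussian distribution. To obtain the speech presence probability in the microphone signals, a speech presence model (for when speech is present in the microphone signals) and a speech absence model (for when it is not) are obtained and optimized over multiple iterations based on maximum likelihood estimation and the expectation-maximization algorithm. During the iterations, the speech presence probability and the speech absence probability are corrected according to their entropies, in order to determine the model parameters of the speech presence model and of the speech absence model. When the maximum likelihood estimation and expectation-maximization algorithm converges, the speech presence probability corresponding to the speech presence model is obtained. Correcting the two probabilities during iteration by comparing their entropies yields faster convergence and a better convergence result, so that the speech presence probability and the noise covariance matrix are estimated more accurately, which in turn improves the speech enhancement effect of MVDR.
Other functions of the speech presence probability calculation method and system, speech enhancement method and system, and earphone provided in this specification are set out in part in the following description. From that description, the content presented in the following figures and examples will be apparent to those of ordinary skill in the art. The inventive aspects of the method, system, and earphone can be fully explained by practicing or using the methods, devices, and combinations described in the detailed examples below.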
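Brief Description of the Drawings
To explain the technical solutions in the embodiments of this specification more clearly, the drawings needed to describe the embodiments are briefly introduced below. The drawings described below are only some embodiments of this specification; those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 shows a hardware schematic diagram of a speech presence probability calculation system according to an embodiment of this specification;
FIG. 2A shows an exploded structural schematic diagram of an electronic device according to an embodiment of this specification;
FIG. 2B shows a front view of a first housing according to an embodiment of this specification;
FIG. 2C shows a top view of a first housing according to an embodiment of this specification;
FIG. 2D shows a front view of a second housing according to an embodiment of this specification;
FIG. 2E shows a bottom view of a second housing according to an embodiment of this specification;
FIG. 3 shows a flowchart of a speech presence probability calculation method according to an embodiment of this specification;
FIG. 4 shows a flowchart of an iterative optimization according to an embodiment of this specification;
FIG. 5 shows a flowchart of multiple iterations according to an embodiment of this specification;
FIG. 6 shows a flowchart of another form of multiple iterations according to an embodiment of this specification; and
FIG. 7 shows a flowchart of a speech enhancement method according to an embodiment of this specification.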
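Detailed Description
The following description provides specific application scenarios and requirements of this specification so that those skilled in the art can make and use its content. Various local modifications to the disclosed embodiments will be apparent to those skilled in the art, and the general principles defined here may be applied to other embodiments and applications without departing from the spirit and scope of this specification. This specification is therefore not limited to the embodiments shown, but is to be accorded the widest scope consistent with the claims.
The terminology used here is for describing particular example embodiments only and is not limiting. For example, unless the context clearly indicates otherwise, the singular forms "a", "an", and "the" as used here may also include the plural. When used in this specification, the terms "include", "comprise", and/or "contain" mean that the associated integers, steps, operations, elements, and/or components are present, but do not exclude the presence of one or more other features, integers, steps, operations, elements, components, and/or groups, or the addition of such features, integers, steps, operations, elements, components, and/or groups to the system or method.
These and other features of this specification, the operation and function of the related structural elements, and the combination of parts and the economy of manufacture will become more apparent in view of the following description. Reference is made to the accompanying drawings, all of which form part of this specification. It should be clearly understood, however, that the drawings are for illustration and description only and are not intended to limit the scope of this specification. It should also be understood that the drawings are not drawn to scale.
The flowcharts used in this specification show operations implemented by systems according to some embodiments of this specification. It should be clearly understood that the operations of a flowchart need not be carried out in the order shown; they may be carried out in reverse order or simultaneously. In addition, one or more other operations may be added to a flowchart, and one or more operations may be removed from it.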
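For ease of description, the terms that appear in this specification are first explained as follows:
Minimum Variance Distortionless Response (MVDR): an adaptive beamforming algorithm based on the maximum signal-to-interference-plus-noise ratio (SINR) criterion. The MVDR algorithm adaptively minimizes the power of the array output while keeping the response in the desired direction undistorted, which maximizes the SINR. Its goal is to minimize the variance of the recorded signal. If the noise signal and the desired signal are uncorrelated, the variance of the recorded signal is the sum of the variances of the desired signal and of the noise signal, so the MVDR solution seeks to minimize this sum and thereby mitigate the effect of the noise signal. The principle is to choose filter coefficients that minimize the average power of the array output under the constraint that the desired signal is not distorted.
Speech presence probability: the probability that the target speech signal is present in the current audio signal.
Gaussian distribution: the normal distribution, also called the Gaussian distribution. The normal curve is bell-shaped, low at both ends, high in the middle, and symmetric, so it is often called the bell curve. A random variable X that follows a normal distribution with expectation μ and variance σ^2 is written X ~ N(μ, σ^2). Its probability density function is f(x) = (1 / (σ√(2π))) · exp(−(x − μ)^2 / (2σ^2)). The expectation μ determines the location of the distribution, and the standard deviation σ determines its spread. The normal distribution with μ = 0 and σ = 1 is the standard normal distribution.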
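FIG. 1 shows a hardware schematic diagram of a speech presence probability calculation system according to an embodiment of this specification. The speech presence probability calculation system can be applied to an electronic device 200.
In some embodiments, the electronic device 200 may be a device with audio processing capability, such as a wireless earphone, a wired earphone, or a smart wearable device, for example smart glasses, a smart helmet, or a smart watch. The electronic device 200 may also be a mobile device, a tablet computer, a laptop computer, a built-in device of a motor vehicle, or the like, or any combination thereof. In some embodiments, the mobile device may include a smart home device, a smart mobile device, or the like, or any combination thereof. For example, the smart mobile device may include a mobile phone, a personal digital assistant, a gaming device, a navigation device, an Ultra-mobile Personal Computer (UMPC), etc., or any combination thereof. In some embodiments, the smart home device may include a smart TV, a desktop computer, etc., or any combination thereof. In some embodiments, the built-in device of a motor vehicle may include an on-board computer, an on-board TV, etc.
In this specification, the electronic device 200 is described using an earphone as an example. The earphone may be wireless or wired. As shown in FIG. 1, the electronic device 200 may include a microphone array 220 and a computing device 240.
The microphone array 220 may be the audio capture device of the electronic device 200. It may be configured to capture local audio and output microphone signals, that is, electrical signals carrying audio information. The microphone array 220 may include M microphones 222 distributed in a preset array shape, where M is an integer greater than 1. The M microphones 222 may be distributed uniformly or non-uniformly. The M microphones 222 output M microphone signals, one per microphone 222, which are collectively referred to as the microphone signals. In some embodiments, the M microphones 222 may be linearly distributed. In some embodiments, they may be distributed in arrays of other shapes, such as a circular array or a rectangular array. For ease of description, the following description takes the linear distribution as an example. In some embodiments, M may be any integer greater than 1, such as 2, 3, 4, 5, or more. In some embodiments, because of space constraints, M may be an integer greater than 1 and not greater than 5, for example in products such as earphones. When the electronic device 200 is an earphone, the spacing between adjacent microphones 222 may be between 20 mm and 40 mm. In some embodiments, the spacing between adjacent microphones 222 may be smaller, for example between 10 mm and 20 mm.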
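In some embodiments, the microphone 222 may be a bone conduction microphone that directly captures vibration signals of the human body. A bone conduction microphone may include a vibration sensor, such as an optical vibration sensor or an acceleration sensor. The vibration sensor can capture a mechanical vibration signal (for example, a signal produced by the vibration of the skin or bones when the user speaks) and convert it into an electrical signal. The mechanical vibration signal here mainly refers to vibration propagated through solids. The bone conduction microphone contacts the user's skin or bones through the vibration sensor, or through a vibration component connected to the vibration sensor, so as to capture the vibration signal produced by the bones or skin when the user speaks and convert it into an electrical signal. In some embodiments, the vibration sensor may be a device that is sensitive to mechanical vibration and insensitive to air vibration (that is, its response to mechanical vibration exceeds its response to air vibration). Because a bone conduction microphone picks up the vibration signal directly at the sounding part of the body, it reduces the influence of ambient noise.
In some embodiments, the microphone 222 may also be an air conduction microphone that directly captures air vibration signals. An air conduction microphone captures the air vibration caused when the user speaks and converts it into an electrical signal.
In some embodiments, the M microphones 222 may be M bone conduction microphones. In some embodiments, the M microphones 222 may be M air conduction microphones. In some embodiments, the M microphones 222 may include both bone conduction microphones and air conduction microphones. The microphone 222 may of course also be another type of microphone, such as an optical microphone or a microphone that receives electromyographic signals.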
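The computing device 240 may be communicatively connected to the microphone array 220. A communication connection means any form of connection capable of receiving information directly or indirectly. In some embodiments, the computing device 240 and the microphone array 220 may exchange data over a wireless communication connection. In some embodiments, they may exchange data over a direct wired connection. In some embodiments, the computing device 240 may be wired directly to other circuits to establish an indirect connection to the microphone array 220 and exchange data that way. This specification takes a direct wired connection between the computing device 240 and the microphone array 220 as an example.
The computing device 240 may be a hardware device with data and information processing capability. In some embodiments, the speech presence probability calculation system may include the computing device 240. In some embodiments, the speech presence probability calculation system may be applied to the computing device 240, that is, it may run on the computing device 240. The speech presence probability calculation system may include a hardware device with data and information processing capability and the programs needed to drive that hardware device. It may also be only a hardware device with data processing capability, or only a program running on a hardware device.
The speech presence probability calculation system may store data or instructions for executing the speech presence probability calculation method described in this specification, and may execute those data and/or instructions. When the system runs on the computing device 240, it may acquire the microphone signals from the microphone array 220 over the communication connection and execute the data or instructions of the speech presence probability calculation method to calculate the speech presence probability in the microphone signals. The speech presence probability calculation method is described elsewhere in this specification, for example in the description of FIG. 3 to FIG. 6.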
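As shown in FIG. 1, the computing device 240 may include at least one storage medium 243 and at least one processor 242. In some embodiments, the electronic device 200 may further include a communication port 245 and an internal communication bus 241.
The internal communication bus 241 may connect the different system components, including the storage medium 243, the processor 242, and the communication port 245.
The communication port 245 may be used for data communication between the computing device 240 and the outside world. For example, the computing device 240 may acquire the microphone signals from the microphone array 220 through the communication port 245.
The at least one storage medium 243 may include a data storage device, which may be a non-transitory storage medium or a transitory storage medium. For example, the data storage device may include one or more of a magnetic disk, a read-only memory (ROM), or a random access memory (RAM). When the speech presence probability calculation system runs on the computing device 240, the storage medium 243 may further include at least one instruction set, stored in the data storage device, for performing speech presence probability calculation on the microphone signals. The instructions are computer program code, which may include programs, routines, objects, components, data structures, procedures, modules, and so on that execute the speech presence probability calculation method provided in this specification.
The at least one processor 242 may be communicatively connected to the at least one storage medium 243 through the internal communication bus 241. A communication connection means any form of connection capable of receiving information directly or indirectly. The at least one processor 242 executes the at least one instruction set. When the speech presence probability calculation system runs on the computing device 240, the at least one processor 242 reads the at least one instruction set and, as it directs, executes the speech presence probability calculation method provided in this specification. The processor 242 may execute all the steps of the method. The processor 242 may take the form of one or more processors. In some embodiments, it may include one or more hardware processors, such as a microcontroller, a microprocessor, a reduced instruction set computer (RISC), an application-specific integrated circuit (ASIC), an application-specific instruction set processor (ASIP), a central processing unit (CPU), a graphics processing unit (GPU), a physics processing unit (PPU), a microcontroller unit, a digital signal processor (DSP), a field-programmable gate array (FPGA), an advanced RISC machine (ARM), a programmable logic device (PLD), any circuit or processor capable of performing one or more functions, or any combination thereof. For illustration only, this specification describes a single processor 242 in the computing device 240. It should be noted, however, that the computing device 240 may also include multiple processors 242, so the operations and/or method steps disclosed in this specification may be executed by one processor, as described, or jointly by multiple processors. For example, if the processor 242 of the computing device 240 is described as executing step A and step B, it should be understood that step A and step B may also be executed jointly or separately by two different processors 242 (for example, a first processor executes step A and a second processor executes step B, or the first and second processors execute steps A and B together).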
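FIG. 2A shows an exploded structural schematic diagram of an electronic device 200 according to an embodiment of this specification. As shown in FIG. 2A, the electronic device 200 may include the microphone array 220, the computing device 240, a first housing 260, and a second housing 280.
The first housing 260 may be the mounting base of the microphone array 220, which may be mounted inside it. The shape of the first housing 260 may be adapted to the distribution shape of the microphone array 220 and is not further limited in this specification. The second housing 280 may be the mounting base of the computing device 240, which may be mounted inside it. The shape of the second housing 280 may be adapted to the shape of the computing device 240 and is not further limited in this specification. When the electronic device 200 is an earphone, the second housing 280 may be connected to the wearing part. The second housing 280 may be connected to the first housing 260. As described above, the microphone array 220 may be electrically connected to the computing device 240. Specifically, this electrical connection may be made through the connection between the first housing 260 and the second housing 280.
In some embodiments, the first housing 260 may be fixedly connected to the second housing 280, for example by integral molding, welding, riveting, or bonding. In some embodiments, the first housing 260 may be detachably connected to the second housing 280, so that the computing device 240 can be communicatively connected to different microphone arrays 220. The microphone arrays 220 may differ in the number of microphones 222, the array shape, the spacing of the microphones 222, the mounting angle of the array in the first housing 260, the mounting position of the array in the first housing 260, and so on. The user can swap in the microphone array 220 that suits the application scenario, which makes the electronic device 200 suitable for a wider range of scenarios. For example, when the user is close to the electronic device 200 in the application scenario, the user can switch to a microphone array 220 with smaller spacing. As another example, when the user is far from the electronic device 200, the user can switch to a microphone array 220 with larger spacing and more microphones.
The detachable connection may be any form of physical connection, such as a threaded connection, a snap-fit connection, or a magnetic connection. In some embodiments, the first housing 260 and the second housing 280 may be magnetically connected, that is, detachably connected through the attraction of magnetic devices.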
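FIG. 2B shows a front view of a first housing 260 according to an embodiment of this specification, and FIG. 2C shows a top view of the first housing 260. As shown in FIG. 2B and FIG. 2C, the first housing 260 may include a first interface 262. In some embodiments, the first housing 260 may further include contacts 266. In some embodiments, the first housing 260 may further include an angle sensor (not shown in FIG. 2B and FIG. 2C).
The first interface 262 may be the mounting interface between the first housing 260 and the second housing 280. In some embodiments, the first interface 262 may be circular and may be rotatably connected to the second housing 280. When the first housing 260 is mounted on the second housing 280, it can rotate relative to the second housing 280 to adjust its angle, and thereby the angle of the microphone array 220.
A first magnetic device 263 may be provided on the first interface 262, at a position close to the second housing 280. The first magnetic device 263 can generate a magnetic attraction that provides the detachable connection to the second housing 280. When the first housing 260 approaches the second housing 280, the attraction quickly connects the two. In some embodiments, after the first housing 260 is connected to the second housing 280, it can still rotate relative to the second housing 280 to adjust the angle of the microphone array 220. The attraction keeps the two housings connected while the first housing 260 rotates.
In some embodiments, a first positioning device (not shown in FIG. 2B and FIG. 2C) may also be provided on the first interface 262. It may be an outwardly protruding positioning step or an inwardly extending positioning hole. The first positioning device can cooperate with the second housing 280 so that the first housing 260 and the second housing 280 can be assembled quickly.
As shown in FIG. 2B and FIG. 2C, in some embodiments the first housing 260 may further include contacts 266. The contacts 266 may be mounted at the first interface 262, may protrude outward from it, and may be elastically connected to it. The contacts 266 may be communicatively connected to the M microphones 222 of the microphone array 220 and may be made of an elastic metal to carry data. When the first housing 260 is connected to the second housing 280, the microphone array 220 can be communicatively connected to the computing device 240 through the contacts 266. In some embodiments, the contacts 266 may be arranged in a circle. After the two housings are connected, when the first housing 260 rotates relative to the second housing 280, the contacts 266 rotate with it and remain communicatively connected to the computing device 240.
In some embodiments, an angle sensor (not shown in FIG. 2B and FIG. 2C) may also be provided on the first housing 260. The angle sensor may be communicatively connected to the contacts 266 and thereby to the computing device 240. It can collect angle data of the first housing 260 to determine the angle of the microphone array 220, providing reference data for the subsequent speech presence probability calculation.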
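FIG. 2D shows a front view of a second housing 280 according to an embodiment of this specification, and FIG. 2E shows a bottom view of the second housing 280. As shown in FIG. 2D and FIG. 2E, the second housing 280 may include a second interface 282. In some embodiments, the second housing 280 may further include a guide rail 286.
The second interface 282 may be the mounting interface between the second housing 280 and the first housing 260. In some embodiments, the second interface 282 may be circular and may be rotatably connected to the first interface 262 of the first housing 260. When the first housing 260 is mounted on the second housing 280, it can rotate relative to the second housing 280 to adjust its angle, and thereby the angle of the microphone array 220.
A second magnetic device 283 may be provided on the second interface 282, at a position close to the first housing 260. The second magnetic device 283 can generate a magnetic attraction that provides the detachable connection to the first interface 262, and may be used together with the first magnetic device 263. When the first housing 260 approaches the second housing 280, the attraction between the second magnetic device 283 and the first magnetic device 263 quickly mounts the first housing 260 on the second housing 280. When the first housing 260 is mounted on the second housing 280, the second magnetic device 283 sits opposite the first magnetic device 263. In some embodiments, after the two housings are connected, the first housing 260 can still rotate relative to the second housing 280 to adjust the angle of the microphone array 220. The attraction keeps the two housings connected while the first housing 260 rotates.
In some embodiments, a second positioning device (not shown in FIG. 2D and FIG. 2E) may also be provided on the second interface 282. It may be an outwardly protruding positioning step or an inwardly extending positioning hole. The second positioning device can cooperate with the first positioning device of the first housing 260 so that the two housings can be assembled quickly. When the first positioning device is the positioning step, the second positioning device may be the positioning hole; when the first positioning device is the positioning hole, the second positioning device may be the positioning step.
As shown in FIG. 2D and FIG. 2E, in some embodiments the second housing 280 may further include a guide rail 286. The guide rail 286 may be mounted at the second interface 282, may be communicatively connected to the computing device 240, and may be made of metal to carry data. When the first housing 260 is connected to the second housing 280, the contacts 266 touch the guide rail 286 and form a communication connection, which connects the microphone array 220 to the computing device 240 for data transmission. As described above, the contacts 266 may be elastically connected to the first interface 262. After the first housing 260 is connected to the second housing 280, the elastic force therefore presses the contacts 266 fully against the guide rail 286, giving a reliable communication connection. In some embodiments, the guide rail 286 may be arranged in a circle. After the two housings are connected, when the first housing 260 rotates relative to the second housing 280, the contacts 266 rotate relative to the guide rail 286 and remain communicatively connected to it.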
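FIG. 3 shows a flowchart of a speech presence probability calculation method P100 according to an embodiment of this specification. The method P100 can calculate the speech presence probability of the microphone signals. Specifically, the processor 242 may execute the method P100. As shown in FIG. 3, the method P100 may include:
S120: Acquire the microphone signals output by the M microphones 222.
As described above, each microphone 222 outputs a corresponding microphone signal, so the M microphones 222 correspond to M microphone signals. When calculating the speech presence probability, the method P100 may use all of the M microphone signals or only some of them. The microphone signals may therefore include all M microphone signals or a subset of them. The following description assumes that the microphone signals include all M microphone signals of the M microphones 222.
As described above, the microphones 222 can capture noise from the surrounding environment as well as the target speech of the target user. Suppose there are N signal sources around the microphones 222, namely s_1(t), ..., s_N(t). For ease of description, we denote the N signal sources as s_v(t), the source vector composed of s_1(t), ..., s_N(t), where v = n or v = s+n. When v = n, the N signal sources s_v(t) are all noise signals. When v = s+n, the N signal sources s_v(t) consist of noise signals and the target speech signal. The sound field of the N signal sources s_v(t) is far-field, so they can be regarded as plane waves. For ease of description, we denote the microphone signal at time t as x(t), which may be a signal vector composed of the M microphone signals. The microphone signal x(t) can then be expressed as the following formula: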
Figure PCTCN2021123111-appb-000001
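Here, a_v(θ) is the steering vector of the N signal sources s_v(t), and θ_1, ..., θ_N are the incident angles between the N signal sources s_1(t), ..., s_N(t) and the microphones 222, respectively. a_v(θ) may be a function of θ_1, ..., θ_N and of the distances d_1, ..., d_{M-1} between adjacent microphones 222. The relative positions of the M microphones 222, such as relative distances or relative coordinates, are stored in advance in the computing device 240; that is, d_1, ..., d_{M-1} are stored in advance in the computing device 240.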
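The text says only that the steering vector depends on the incident angles and the inter-microphone spacings. As a minimal sketch, assuming a far-field plane wave on a linear array, an angle measured from broadside, and a sound speed of 343 m/s (all assumptions, as is the function name), a single-source steering vector at one frequency could be computed like this:

```python
import numpy as np

def steering_vector(theta, spacings, freq_hz, c=343.0):
    """Far-field steering vector for a linear array (assumed phase model).

    theta    : incident angle in radians, measured from broadside (assumption)
    spacings : distances d_1..d_{M-1} between adjacent microphones, in metres
    freq_hz  : frequency of the bin in Hz
    c        : speed of sound in m/s (assumed value)
    """
    # Position of each microphone along the array axis, first microphone at 0.
    positions = np.concatenate(([0.0], np.cumsum(spacings)))
    # Relative arrival delay of the plane wave at each microphone.
    delays = positions * np.sin(theta) / c
    return np.exp(-2j * np.pi * freq_hz * delays)

# Three microphones spaced 30 mm apart, source at 30 degrees, 1 kHz bin.
a = steering_vector(np.deg2rad(30.0), [0.03, 0.03], 1000.0)
a_broadside = steering_vector(0.0, [0.03, 0.03], 1000.0)
```

For a source at broadside every microphone receives the wavefront at the same time, so the vector reduces to all ones.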
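The microphone signal x(t) is a time-domain signal. In some embodiments, in step S120, the computing device 240 may also perform spectral analysis on x(t). Specifically, it may apply a Fourier transform to the time-domain signal x(t) to obtain the frequency-domain signal x_{f,t} of the microphone signal. The following description uses the frequency-domain microphone signal x_{f,t}, which can be expressed as the following formula:
Figure PCTCN2021123111-appb-000002
where
Figure PCTCN2021123111-appb-000003
is the steering vector in the frequency domain, and
Figure PCTCN2021123111-appb-000004
is the complex amplitude of the N signal sources in the frequency domain. In some embodiments, the N signal sources
Figure PCTCN2021123111-appb-000005
may follow a Gaussian distribution.
Figure PCTCN2021123111-appb-000006
can be expressed as the following formula:
Figure PCTCN2021123111-appb-000007
In some embodiments, the Gaussian distribution
Figure PCTCN2021123111-appb-000008
may be a complex Gaussian distribution, where
Figure PCTCN2021123111-appb-000009
is the variance of
Figure PCTCN2021123111-appb-000010
When v = n,
Figure PCTCN2021123111-appb-000011
is a speech absence model following a Gaussian distribution. When v = s+n,
Figure PCTCN2021123111-appb-000012
is a speech presence model following a Gaussian distribution. The variance of the speech absence model for v = n,
Figure PCTCN2021123111-appb-000013
differs from the variance of the speech presence model for v = s+n,
Figure PCTCN2021123111-appb-000014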
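From formula (2) and formula (3), the microphone signal x_{f,t} also follows a Gaussian distribution. Specifically, x_{f,t} may follow either the speech presence model or the speech absence model of Gaussian distribution, and can be expressed as the following formula:
Figure PCTCN2021123111-appb-000015
where
Figure PCTCN2021123111-appb-000016
is the variance of x_{f,t}.
Figure PCTCN2021123111-appb-000017
For ease of description, we define
Figure PCTCN2021123111-appb-000018
as the spatial covariance matrix. When v = n, x_{f,t} follows the Gaussian speech absence model. When v = s+n, x_{f,t} follows the Gaussian speech presence model.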
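The speech presence probability of the microphone signal x_{f,t} may be the probability that x_{f,t} belongs to the speech presence model. For ease of description, we define the speech presence probability of the microphone signal x_{f,t} as
Figure PCTCN2021123111-appb-000019
and the speech absence probability of the microphone signal x_{f,t} as
Figure PCTCN2021123111-appb-000020
We define the distribution probability of the microphone signal x_{f,t} under the speech presence model as
Figure PCTCN2021123111-appb-000021
and the distribution probability of the microphone signal x_{f,t} under the speech absence model as
Figure PCTCN2021123111-appb-000022
The speech presence probability of the microphone signal x_{f,t},
Figure PCTCN2021123111-appb-000023
can be expressed as the following formula:
Figure PCTCN2021123111-appb-000024
To calculate
Figure PCTCN2021123111-appb-000025
the computing device 240 needs to determine the speech presence variance of the speech presence model,
Figure PCTCN2021123111-appb-000026
and the speech absence variance of the speech absence model,
Figure PCTCN2021123111-appb-000027
Assume that the microphone signal x_{f,t} follows either a first model or a second model of Gaussian distribution. One of the first model and the second model is the speech presence model, and the other is the speech absence model.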
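For ease of description, we define the first model by the following formula:
Figure PCTCN2021123111-appb-000028
where
Figure PCTCN2021123111-appb-000029
is the first variance of the Gaussian distribution corresponding to the first model. The first variance
Figure PCTCN2021123111-appb-000030
is the product of the first parameter
Figure PCTCN2021123111-appb-000031
and the first spatial covariance matrix
Figure PCTCN2021123111-appb-000032
We define the second model by the following formula:
Figure PCTCN2021123111-appb-000033
where
Figure PCTCN2021123111-appb-000034
is the second variance of the Gaussian distribution corresponding to the second model. The second variance
Figure PCTCN2021123111-appb-000035
is the product of the second parameter
Figure PCTCN2021123111-appb-000036
and the second spatial covariance matrix
Figure PCTCN2021123111-appb-000037
To calculate
Figure PCTCN2021123111-appb-000038
the computing device 240 needs to determine which of the first model and the second model is the speech presence model and which is the speech absence model.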
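S140: Iteratively optimize the first model and the second model, respectively, based on maximum likelihood estimation and the expectation-maximization algorithm, until convergence.
The computing device 240 may iteratively optimize the first model and the second model, respectively, to obtain the first variance of the first model
Figure PCTCN2021123111-appb-000039
and the second variance of the second model
Figure PCTCN2021123111-appb-000040
During the iterations, the computing device 240 may use the entropy of the first probability that the microphone signal x_{f,t} belongs to the first model
Figure PCTCN2021123111-appb-000041
denoted
Figure PCTCN2021123111-appb-000042
and the entropy of the second probability that the microphone signal x_{f,t} belongs to the second model
Figure PCTCN2021123111-appb-000043
denoted
Figure PCTCN2021123111-appb-000044
to determine whether the speech presence model is the first model or the second model.
The first probability
Figure PCTCN2021123111-appb-000045
may be the probability that, between the first model and the second model, the microphone signal x_{f,t} belongs to the first model. The second probability
Figure PCTCN2021123111-appb-000046
may be the probability that, between the first model and the second model, the microphone signal x_{f,t} belongs to the second model. The first probability
Figure PCTCN2021123111-appb-000047
and the second probability
Figure PCTCN2021123111-appb-000048
are complementary, that is,
Figure PCTCN2021123111-appb-000049
We define the first distribution probability of the microphone signal x_{f,t} under the first model as
Figure PCTCN2021123111-appb-000050
and the second distribution probability of the microphone signal x_{f,t} under the second model as
Figure PCTCN2021123111-appb-000051
The first probability of the microphone signal x_{f,t},
Figure PCTCN2021123111-appb-000052
can be expressed as the following formula:
Figure PCTCN2021123111-appb-000053
The second probability of the microphone signal x_{f,t},
Figure PCTCN2021123111-appb-000054
can be expressed as the following formula:
Figure PCTCN2021123111-appb-000055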
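Formulas (8) and (9) are not legible in this copy. As a minimal sketch, assuming each model is a zero-mean complex Gaussian whose covariance is the product of its parameter and its spatial covariance matrix, and assuming equal prior weight for the two models (both assumptions, as are the function names), the two complementary probabilities for one time-frequency bin could be computed as follows:

```python
import numpy as np

def log_complex_gaussian(x, phi, R):
    """Log density of a zero-mean complex Gaussian with covariance phi * R."""
    M = x.shape[0]
    cov = phi * R
    # slogdet avoids overflow/underflow of the determinant itself.
    _, logdet = np.linalg.slogdet(cov)
    quad = np.real(np.conj(x) @ np.linalg.solve(cov, x))
    return -M * np.log(np.pi) - logdet - quad

def posteriors(x, phi1, R1, phi2, R2):
    """Return (first probability, second probability); they sum to one."""
    l1 = log_complex_gaussian(x, phi1, R1)
    l2 = log_complex_gaussian(x, phi2, R2)
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(l1, l2)
    p1, p2 = np.exp(l1 - m), np.exp(l2 - m)
    return p1 / (p1 + p2), p2 / (p1 + p2)

x = np.array([0.1 + 0.0j, 0.1 + 0.0j])          # one quiet bin, M = 2
lam1, lam2 = posteriors(x, 0.01, np.eye(2), 10.0, np.eye(2))
same1, same2 = posteriors(x, 1.0, np.eye(2), 1.0, np.eye(2))
```

In this toy case the quiet observation is better explained by the low-variance model, so the first probability dominates; with identical models the two probabilities are equal.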
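FIG. 4 shows a flowchart of an iterative optimization according to an embodiment of this specification, corresponding to step S140. As shown in FIG. 4, step S140 may include:
S142: Construct an objective function based on maximum likelihood estimation and the expectation-maximization algorithm.
As described above, the unknown parameters include the first variance of the first model
Figure PCTCN2021123111-appb-000056
and the second variance of the second model
Figure PCTCN2021123111-appb-000057
The hidden variables are the first probability that the microphone signal x_{f,t} belongs to the first model
Figure PCTCN2021123111-appb-000058
and the second probability that the microphone signal x_{f,t} belongs to the second model
Figure PCTCN2021123111-appb-000059
Maximum likelihood estimation and the expectation-maximization algorithm are therefore used to iteratively optimize the first variance
Figure PCTCN2021123111-appb-000060
and the second variance of the second model
Figure PCTCN2021123111-appb-000061
The objective function is the maximum likelihood estimation function, which can be expressed as the following formula:
Figure PCTCN2021123111-appb-000062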
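S144: Determine the optimization parameters.
The relationship between the first parameter
Figure PCTCN2021123111-appb-000063
and the first spatial covariance matrix
Figure PCTCN2021123111-appb-000064
can be expressed as the following formula:
Figure PCTCN2021123111-appb-000065
The relationship between the second parameter
Figure PCTCN2021123111-appb-000066
and the second spatial covariance matrix
Figure PCTCN2021123111-appb-000067
can be expressed as the following formula:
Figure PCTCN2021123111-appb-000068
The optimization parameters may therefore include the first spatial covariance matrix
Figure PCTCN2021123111-appb-000069
and the second spatial covariance matrix
Figure PCTCN2021123111-appb-000070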
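Formulas (11) and (12), which tie each parameter to its spatial covariance matrix, are not legible here. One form commonly used for complex Gaussian mixture models is φ = (1/M)·xᴴR⁻¹x. The sketch below assumes that form; it is an illustration that happens to be consistent with the later requirement that the spatial covariance matrices be invertible, not a confirmed reading of the patent's formulas, and the function name is hypothetical.

```python
import numpy as np

def variance_parameter(x, R):
    """Assumed relation phi = x^H R^{-1} x / M for one time-frequency bin.

    x : complex microphone vector of length M
    R : M x M spatial covariance matrix (must be invertible)
    """
    M = x.shape[0]
    # Solve R y = x instead of forming the inverse explicitly.
    return float(np.real(np.conj(x) @ np.linalg.solve(R, x))) / M

x = np.array([1.0 + 0.0j, 0.0 + 1.0j])
phi_unit = variance_parameter(x, np.eye(2))
phi_scaled = variance_parameter(x, 2.0 * np.eye(2))
```

Under this assumed relation, only the spatial covariance matrices need to be iterated, since each parameter follows from the current matrix and the observed signal.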
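S145: Determine the initial values of the optimization parameters.
For ease of description, we define the initial value of the first spatial covariance matrix
Figure PCTCN2021123111-appb-000071
as
Figure PCTCN2021123111-appb-000072
and the initial value of the second spatial covariance matrix
Figure PCTCN2021123111-appb-000073
as
Figure PCTCN2021123111-appb-000074
The initial value of the first spatial covariance matrix
Figure PCTCN2021123111-appb-000075
denoted
Figure PCTCN2021123111-appb-000076
and the initial value of the second spatial covariance matrix
Figure PCTCN2021123111-appb-000077
denoted
Figure PCTCN2021123111-appb-000078
may be the same or different. In some embodiments, the initial value of the first spatial covariance matrix
Figure PCTCN2021123111-appb-000079
denoted
Figure PCTCN2021123111-appb-000080
and/or the initial value of the second spatial covariance matrix
Figure PCTCN2021123111-appb-000081
denoted
Figure PCTCN2021123111-appb-000082
may be the identity matrix I_N. In some embodiments, the initial value of the first spatial covariance matrix
Figure PCTCN2021123111-appb-000083
denoted
Figure PCTCN2021123111-appb-000084
and/or the initial value of the second spatial covariance matrix
Figure PCTCN2021123111-appb-000085
denoted
Figure PCTCN2021123111-appb-000086
may be computed directly from several adjacent frames of the microphone signals. In that case,
Figure PCTCN2021123111-appb-000087
and/or
Figure PCTCN2021123111-appb-000088
can be expressed as the following formula:
Figure PCTCN2021123111-appb-000089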
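Formula (13) is not legible in this copy. A minimal sketch of the two initialization options described above, assuming the frame-based option is the plain average of the outer products x·xᴴ over the adjacent frames (an assumption, as is the function name):

```python
import numpy as np

def initial_covariance(frames=None, num_mics=None):
    """Initial spatial covariance matrix for one frequency bin.

    frames   : optional complex array, shape (T, M), adjacent frames of the bin;
               if given, the sample covariance (1/T) * sum_t x_t x_t^H is used
    num_mics : used only when frames is None, to build an identity matrix
    """
    if frames is None:
        return np.eye(num_mics, dtype=complex)
    T = frames.shape[0]
    # (frames.T @ frames.conj())[i, j] = sum_t x_t[i] * conj(x_t[j])
    return frames.T @ frames.conj() / T

R0_identity = initial_covariance(num_mics=2)
R0_sample = initial_covariance(np.array([[1.0 + 0.0j, 0.0 + 1.0j]]))
```

A covariance built from fewer frames than microphones is rank deficient, which is one reason an invertibility check is useful before the matrix is inverted.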
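S146: Based on the objective function and the initial values of the optimization parameters, iterate the optimization parameters multiple times until the objective function converges.
As described above, during the multiple iterations the computing device 240 may use the entropy of the first probability
Figure PCTCN2021123111-appb-000090
denoted
Figure PCTCN2021123111-appb-000091
and the entropy of the second probability
Figure PCTCN2021123111-appb-000092
denoted
Figure PCTCN2021123111-appb-000093
to determine whether the speech presence model is the first model or the second model.
In some embodiments, the computing device 240 may, in any one of the multiple iterations, use the entropy of the first probability
Figure PCTCN2021123111-appb-000094
denoted
Figure PCTCN2021123111-appb-000095
and the entropy of the second probability
Figure PCTCN2021123111-appb-000096
denoted
Figure PCTCN2021123111-appb-000097
to determine whether the speech presence model is the first model or the second model, as shown in FIG. 5. FIG. 5 shows a flowchart of multiple iterations according to an embodiment of this specification, corresponding to step S146. As shown in FIG. 5, step S146 may include, in each iteration: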
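S146-2: Apply an invertibility correction to the optimization parameters.
Specifically, in step S146-2, when an optimization parameter is determined not to be invertible, it is corrected with a deviation matrix. The deviation matrix may be one of an identity matrix and a random matrix following a normal or uniform distribution. As described above, the optimization parameters include the first spatial covariance matrix
Figure PCTCN2021123111-appb-000098
and the second spatial covariance matrix
Figure PCTCN2021123111-appb-000099
From formula (11) and formula (12), to obtain the first parameter
Figure PCTCN2021123111-appb-000100
and the second parameter
Figure PCTCN2021123111-appb-000101
the first spatial covariance matrix
Figure PCTCN2021123111-appb-000102
and the second spatial covariance matrix
Figure PCTCN2021123111-appb-000103
must be invertible. The larger the condition number of a matrix, the closer the matrix is to a singular (non-invertible) matrix. When the first spatial covariance matrix
Figure PCTCN2021123111-appb-000104
or the second spatial covariance matrix
Figure PCTCN2021123111-appb-000105
is not invertible (that is, its condition number exceeds a threshold η), a slight perturbation is added to the first spatial covariance matrix
Figure PCTCN2021123111-appb-000106
or the second spatial covariance matrix
Figure PCTCN2021123111-appb-000107
to correct it and ensure that it is invertible.
Specifically, the computing device 240 may test the invertibility of the first spatial covariance matrix
Figure PCTCN2021123111-appb-000108
and the second spatial covariance matrix
Figure PCTCN2021123111-appb-000109
If
Figure PCTCN2021123111-appb-000110
or
Figure PCTCN2021123111-appb-000111
then the first spatial covariance matrix
Figure PCTCN2021123111-appb-000112
or the second spatial covariance matrix
Figure PCTCN2021123111-appb-000113
is not invertible and needs the invertibility correction. Here η is the condition number threshold. In some embodiments, η = 10000. In some embodiments, η may be larger or smaller.
When the first spatial covariance matrix
Figure PCTCN2021123111-appb-000114
or the second spatial covariance matrix
Figure PCTCN2021123111-appb-000115
is not invertible, a deviation matrix Q can be used to correct the first spatial covariance matrix
Figure PCTCN2021123111-appb-000116
or the second spatial covariance matrix
Figure PCTCN2021123111-appb-000117
The first spatial covariance matrix
Figure PCTCN2021123111-appb-000118
or the second spatial covariance matrix
Figure PCTCN2021123111-appb-000119
can then be expressed as the following formulas:
Figure PCTCN2021123111-appb-000120
Figure PCTCN2021123111-appb-000121
where Q is the deviation matrix and μ is the deviation coefficient. In some embodiments, μ = 0.001.
When the first spatial covariance matrix
Figure PCTCN2021123111-appb-000122
and the second spatial covariance matrix
Figure PCTCN2021123111-appb-000123
are both invertible, no correction is needed.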
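The correction formulas themselves are not legible here. As a minimal sketch of step S146-2, assuming the correction is additive (R + μQ) and using the threshold η = 10000 and coefficient μ = 0.001 quoted in the text, with an identity matrix as the default deviation matrix (the additive form and the function name are assumptions):

```python
import numpy as np

def make_invertible(R, eta=1e4, mu=1e-3, Q=None):
    """Perturb R slightly when its condition number exceeds eta.

    R   : square spatial covariance matrix
    eta : condition number threshold (10000 in some embodiments)
    mu  : deviation coefficient (0.001 in some embodiments)
    Q   : deviation matrix; identity by default, a random matrix is also allowed
    """
    if Q is None:
        Q = np.eye(R.shape[0])
    if np.linalg.cond(R) > eta:
        # Assumed additive form of the correction.
        return R + mu * Q
    return R

R_singular = np.array([[1.0, 1.0], [1.0, 1.0]])   # rank one, not invertible
R_fixed = make_invertible(R_singular)
R_ok = make_invertible(np.eye(2))                 # already well conditioned
```

For the rank-one example, the corrected matrix has eigenvalues 2.001 and 0.001, so its condition number drops to about 2001, below the threshold.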
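S146-3: Determine the first parameter
Figure PCTCN2021123111-appb-000124
and the second parameter
Figure PCTCN2021123111-appb-000125
based on formula (11) and formula (12).
S146-4: Determine the first probability
Figure PCTCN2021123111-appb-000126
and the second probability
Figure PCTCN2021123111-appb-000127
based on formula (8) and formula (9).
S146-5: Based on the first probability
Figure PCTCN2021123111-appb-000128
and the second probability
Figure PCTCN2021123111-appb-000129
update the optimization parameters, namely the first spatial covariance matrix
Figure PCTCN2021123111-appb-000130
and the second spatial covariance matrix
Figure PCTCN2021123111-appb-000131
The first spatial covariance matrix
Figure PCTCN2021123111-appb-000132
and the second spatial covariance matrix
Figure PCTCN2021123111-appb-000133
can be expressed as the following formulas:
Figure PCTCN2021123111-appb-000134
Figure PCTCN2021123111-appb-000135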
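Formulas (16) and (17) are not legible in this copy. A minimal sketch of the update in step S146-5, assuming the form commonly used for complex Gaussian mixture models, in which each frame's outer product x·xᴴ is weighted by its model probability and divided by its variance parameter (the form and the function name are assumptions):

```python
import numpy as np

def update_covariance(frames, lam, phi):
    """Assumed update R = sum_t lam_t * x_t x_t^H / phi_t / sum_t lam_t.

    frames : complex array, shape (T, M), one frequency bin over T frames
    lam    : real array, shape (T,), probability of this model per frame
    phi    : real array, shape (T,), variance parameter of this model per frame
    """
    weights = lam / phi
    # Weighted sum of outer products, written as a single matrix product.
    weighted = (frames * weights[:, None]).T @ frames.conj()
    return weighted / np.sum(lam)

frames = np.array([[1.0 + 0.0j, 0.0 + 0.0j],
                   [0.0 + 0.0j, 1.0 + 0.0j]])
R_new = update_covariance(frames, np.array([1.0, 1.0]), np.array([1.0, 1.0]))
```

The same function would be called once per model, with the first probability and first parameter for the first spatial covariance matrix and the second probability and second parameter for the second.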
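S146-6: Based on the objective function, determine whether the iteration stops.
Step S146-6 may include:
S146-7: Determine that the iteration stops, and output the converged values of the optimization parameters; or
S146-8: Determine that the iteration has not stopped, and continue with the next iteration.
As shown in FIG. 5, step S146 may further include:
S146-9: In any one of the multiple iterations, use the entropy of the first probability
Figure PCTCN2021123111-appb-000136
denoted
Figure PCTCN2021123111-appb-000137
and the entropy of the second probability
Figure PCTCN2021123111-appb-000138
denoted
Figure PCTCN2021123111-appb-000139
to determine whether the speech presence model is the first model or the second model.
Step S146-9 may be performed during the iterations, or after the iterations end, taking the first probability
Figure PCTCN2021123111-appb-000140
and the second probability
Figure PCTCN2021123111-appb-000141
from any one of the multiple iterations as inputs and computing the entropy of the first probability
Figure PCTCN2021123111-appb-000142
denoted
Figure PCTCN2021123111-appb-000143
and the entropy of the second probability
Figure PCTCN2021123111-appb-000144
denoted
Figure PCTCN2021123111-appb-000145
to determine whether the speech presence model is the first model or the second model. Entropy measures the degree of disorder of a system: the more disordered the system, the larger the entropy, and the more ordered the system, the smaller the entropy. N signal sources that are all noise are more disordered than N signal sources that include a speech signal. The entropy of the speech absence model is therefore larger than the entropy of the speech presence model.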
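Specifically, in step S146-9, the computing device 240 may obtain the first probability
Figure PCTCN2021123111-appb-000146
and the second probability
Figure PCTCN2021123111-appb-000147
from any one iteration and compute the entropy of the first probability
Figure PCTCN2021123111-appb-000148
denoted
Figure PCTCN2021123111-appb-000149
and the entropy of the second probability
Figure PCTCN2021123111-appb-000150
denoted
Figure PCTCN2021123111-appb-000151
When the entropy of the first probability
Figure PCTCN2021123111-appb-000152
denoted
Figure PCTCN2021123111-appb-000153
is greater than the entropy of the second probability
Figure PCTCN2021123111-appb-000154
denoted
Figure PCTCN2021123111-appb-000155
the computing device 240 may determine that the speech presence model is the second model and that the first model is the speech absence model. When the entropy of the first probability
Figure PCTCN2021123111-appb-000156
denoted
Figure PCTCN2021123111-appb-000157
is less than the entropy of the second probability
Figure PCTCN2021123111-appb-000158
denoted
Figure PCTCN2021123111-appb-000159
the computing device 240 may determine that the speech presence model is the first model and that the second model is the speech absence model.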
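The entropy formulas are not legible in this copy, so the definition below is an assumption: each model's probabilities over the time frames of one frequency bin are normalized to sum to one, and the Shannon entropy of the result is taken. Under that assumption, a model that is active in most frames (noise-like) has higher entropy than one active in a few frames (speech-like), which matches the reasoning in the text. The function names are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy of a probability sequence after normalization (assumed)."""
    total = sum(probs)
    if total <= 0.0:
        return 0.0
    h = 0.0
    for p in probs:
        q = p / total
        if q > 0.0:
            h -= q * math.log(q)
    return h

def speech_presence_model(lam1, lam2):
    """Return 1 or 2: the model whose probability has the smaller entropy.

    The text leaves the equal-entropy case open; this sketch returns 1 then.
    """
    return 2 if entropy(lam1) > entropy(lam2) else 1

# The first model is active in most frames, the second in only one frame.
lam1 = [0.95, 0.10, 0.95, 0.95]
lam2 = [0.05, 0.90, 0.05, 0.05]
chosen = speech_presence_model(lam1, lam2)
```

Here the first probability is spread almost evenly over the frames, so its entropy is larger and the second model is taken as the speech presence model.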
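In some embodiments, the computing device 240 may, in the first of the multiple iterations, use the entropy of the first probability
Figure PCTCN2021123111-appb-000160
denoted
Figure PCTCN2021123111-appb-000161
and the entropy of the second probability
Figure PCTCN2021123111-appb-000162
denoted
Figure PCTCN2021123111-appb-000163
to determine whether the speech presence model is the first model or the second model, and then, in each of the subsequent iterations, correct the first probability
Figure PCTCN2021123111-appb-000164
and the second probability
Figure PCTCN2021123111-appb-000165
to fix misjudgments of the speech presence probability, as shown in FIG. 6. FIG. 6 shows a flowchart of another form of multiple iterations according to an embodiment of this specification, corresponding to step S146. As shown in FIG. 6, step S146 may include:
S146-10: In the first of the multiple iterations, compute the entropy of the first probability
Figure PCTCN2021123111-appb-000166
denoted
Figure PCTCN2021123111-appb-000167
and the entropy of the second probability
Figure PCTCN2021123111-appb-000168
denoted
Figure PCTCN2021123111-appb-000169
and determine whether the speech presence model is the first model or the second model.
Specifically, in step S146-10, the computing device 240 may, in the first iteration, determine the first parameter
Figure PCTCN2021123111-appb-000170
and the second parameter
Figure PCTCN2021123111-appb-000171
based on formula (11) and formula (12), then determine the first probability
Figure PCTCN2021123111-appb-000172
and the second probability
Figure PCTCN2021123111-appb-000173
based on formula (8) and formula (9), and then compute the entropy of the first probability
Figure PCTCN2021123111-appb-000174
denoted
Figure PCTCN2021123111-appb-000175
and the entropy of the second probability
Figure PCTCN2021123111-appb-000176
denoted
Figure PCTCN2021123111-appb-000177
and compare them. When the entropy of the first probability
Figure PCTCN2021123111-appb-000178
denoted
Figure PCTCN2021123111-appb-000179
is greater than the entropy of the second probability
Figure PCTCN2021123111-appb-000180
denoted
Figure PCTCN2021123111-appb-000181
the computing device 240 may determine that the speech presence model is the second model and that the first model is the speech absence model. When the entropy of the first probability
Figure PCTCN2021123111-appb-000182
denoted
Figure PCTCN2021123111-appb-000183
is less than the entropy of the second probability
Figure PCTCN2021123111-appb-000184
denoted
Figure PCTCN2021123111-appb-000185
the computing device 240 may determine that the speech presence model is the first model and that the second model is the speech absence model.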
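As shown in FIG. 6, step S146 may further include, in each iteration after the first:
S146-11: Apply an invertibility correction to the optimization parameters. This is the same as step S146-2 described above and is not repeated here.
S146-12: Determine the first parameter
Figure PCTCN2021123111-appb-000186
and the second parameter
Figure PCTCN2021123111-appb-000187
based on formula (11) and formula (12).
S146-13: Determine the first probability
Figure PCTCN2021123111-appb-000188
and the second probability
Figure PCTCN2021123111-appb-000189
based on formula (8) and formula (9).
S146-14: Based on the entropy of the first probability
Figure PCTCN2021123111-appb-000190
denoted
Figure PCTCN2021123111-appb-000191
and the entropy of the second probability
Figure PCTCN2021123111-appb-000192
denoted
Figure PCTCN2021123111-appb-000193
correct the first probability
Figure PCTCN2021123111-appb-000194
and the second probability
Figure PCTCN2021123111-appb-000195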
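Specifically, in step S146-14, the computing device 240 may compute the entropy of the first probability
Figure PCTCN2021123111-appb-000196
denoted
Figure PCTCN2021123111-appb-000197
and the entropy of the second probability
Figure PCTCN2021123111-appb-000198
denoted
Figure PCTCN2021123111-appb-000199
and compare them. When the speech presence model is the first model, if the entropy of the first probability
Figure PCTCN2021123111-appb-000200
denoted
Figure PCTCN2021123111-appb-000201
is greater than the entropy of the second probability
Figure PCTCN2021123111-appb-000202
denoted
Figure PCTCN2021123111-appb-000203
then the value of the first probability
Figure PCTCN2021123111-appb-000204
is swapped with the value of the second probability
Figure PCTCN2021123111-appb-000205
That is, the value of the first probability
Figure PCTCN2021123111-appb-000206
is updated to the value of the second probability
Figure PCTCN2021123111-appb-000207
and the value of the second probability
Figure PCTCN2021123111-appb-000208
is updated to the value of the first probability
Figure PCTCN2021123111-appb-000209
When the speech presence model is the first model, if the entropy of the first probability
Figure PCTCN2021123111-appb-000210
denoted
Figure PCTCN2021123111-appb-000211
is less than the entropy of the second probability
Figure PCTCN2021123111-appb-000212
denoted
Figure PCTCN2021123111-appb-000213
then the first probability
Figure PCTCN2021123111-appb-000214
and the second probability
Figure PCTCN2021123111-appb-000215
are not corrected. When the speech presence model is the second model, if the entropy of the first probability
Figure PCTCN2021123111-appb-000216
denoted
Figure PCTCN2021123111-appb-000217
is less than the entropy of the second probability
Figure PCTCN2021123111-appb-000218
denoted
Figure PCTCN2021123111-appb-000219
then the value of the first probability
Figure PCTCN2021123111-appb-000220
is swapped with the value of the second probability
Figure PCTCN2021123111-appb-000221
That is, the value of the first probability
Figure PCTCN2021123111-appb-000222
is updated to the value of the second probability
Figure PCTCN2021123111-appb-000223
and the value of the second probability
Figure PCTCN2021123111-appb-000224
is updated to the value of the first probability
Figure PCTCN2021123111-appb-000225
When the speech presence model is the second model, if the entropy of the first probability
Figure PCTCN2021123111-appb-000226
denoted
Figure PCTCN2021123111-appb-000227
is greater than the entropy of the second probability
Figure PCTCN2021123111-appb-000228
denoted
Figure PCTCN2021123111-appb-000229
then the first probability
Figure PCTCN2021123111-appb-000230
and the second probability
Figure PCTCN2021123111-appb-000231
are not corrected.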
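The four cases of step S146-14 reduce to one rule: swap the two probabilities whenever the model designated as the speech presence model has the larger entropy. A minimal sketch follows, reusing the same assumed entropy definition (Shannon entropy of the probabilities normalized over time frames); the function names and the sequence-based representation are assumptions.

```python
import math

def entropy(probs):
    """Shannon entropy of a probability sequence after normalization (assumed)."""
    total = sum(probs)
    if total <= 0.0:
        return 0.0
    return -sum((p / total) * math.log(p / total) for p in probs if p > 0.0)

def correct_probabilities(lam1, lam2, presence_model):
    """Swap lam1 and lam2 when the speech presence model has the larger entropy.

    presence_model : 1 or 2, as decided in the first iteration (step S146-10)
    """
    h1, h2 = entropy(lam1), entropy(lam2)
    if presence_model == 1 and h1 > h2:
        return lam2, lam1
    if presence_model == 2 and h2 > h1:
        return lam2, lam1
    return lam1, lam2

noise_like = [0.95, 0.10, 0.95, 0.95]    # active in most frames, high entropy
speech_like = [0.05, 0.90, 0.05, 0.05]   # active in one frame, low entropy
swapped = correct_probabilities(noise_like, speech_like, presence_model=1)
kept = correct_probabilities(noise_like, speech_like, presence_model=2)
```

With the first model designated as the speech presence model, the noise-like first probability is swapped away; with the second model designated, nothing changes.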
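S146-15: Based on the corrected first probability
Figure PCTCN2021123111-appb-000232
and second probability
Figure PCTCN2021123111-appb-000233
update the optimization parameters, namely the first spatial covariance matrix
Figure PCTCN2021123111-appb-000234
and the second spatial covariance matrix
Figure PCTCN2021123111-appb-000235
Steps S146-14 and S146-15 ensure that, in every iteration, the entropy of the speech presence model is smaller than the entropy of the speech absence model, so that each iteration converges toward the target, which speeds up convergence.
S146-16: Based on the objective function, determine whether the iteration stops.
Step S146-16 may include:
S146-17: Determine that the iteration stops, and output the converged values of the optimization parameters; or
S146-18: Determine that the iteration has not stopped, and continue with the next iteration.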
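The text does not state the stopping criterion used in steps S146-6 and S146-16. One common choice, shown here purely as an assumption, is to stop when the relative change of the objective function between two consecutive iterations falls below a tolerance. The function name and the tolerance value are hypothetical.

```python
def has_converged(prev_objective, curr_objective, tol=1e-4):
    """Assumed criterion: relative change of the objective is at most tol."""
    scale = max(1.0, abs(prev_objective))
    return abs(curr_objective - prev_objective) <= tol * scale

still_moving = has_converged(-100.0, -90.0)
settled = has_converged(-100.0, -99.99999)
```

In practice such a test is usually paired with a cap on the number of iterations, so that the loop ends even if the objective keeps oscillating.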
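As shown in FIG. 4, step S140 may further include:
S148: Output the converged values of the optimization parameters and the corresponding first probability
Figure PCTCN2021123111-appb-000236
and second probability
Figure PCTCN2021123111-appb-000237
As described above, when the objective function converges, the computing device 240 may output the values of the optimization parameters at convergence as their converged values. At the same time, it may output the first probability
Figure PCTCN2021123111-appb-000238
and the second probability
Figure PCTCN2021123111-appb-000239
that correspond to those converged values. As formula (16) and formula (17) show, the optimization parameters, namely the first spatial covariance matrix
Figure PCTCN2021123111-appb-000240
and the second spatial covariance matrix
Figure PCTCN2021123111-appb-000241
are computed from the first probability
Figure PCTCN2021123111-appb-000242
and the second probability
Figure PCTCN2021123111-appb-000243
When the objective function converges, the computing device 240 may output, for the first spatial covariance matrix
Figure PCTCN2021123111-appb-000244
and the second spatial covariance matrix
Figure PCTCN2021123111-appb-000245
the corresponding first probability
Figure PCTCN2021123111-appb-000246
and second probability
Figure PCTCN2021123111-appb-000247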
如图3所示,所述方法P100还可以包括:
S160:在极大似然估计以及期望最大化算法收敛时,将麦克风信号x f,t为语音存在模型的概率作为麦克风信号x f,t的语音存在概率
Figure PCTCN2021123111-appb-000248
并输出。
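As described above, in step S140 the computing device 240 may determine whether the speech presence model is the first model or the second model based on the entropy of the first probability and the entropy of the second probability. When the speech presence model is the first model, the probability that the microphone signal $x_{f,t}$ belongs to the speech presence model is the first probability, that is, the probability that $x_{f,t}$ belongs to the first model. In this case, the speech presence probability of $x_{f,t}$ may be the first probability corresponding to the converged value of the first spatial covariance matrix when the objective function converges. When the speech presence model is the second model, the probability that the microphone signal $x_{f,t}$ belongs to the speech presence model is the second probability, that is, the probability that $x_{f,t}$ belongs to the second model. In this case, the speech presence probability of $x_{f,t}$ may be the second probability corresponding to the converged value of the second spatial covariance matrix when the objective function converges. The computing device 240 may output the speech presence probability to other computing modules, such as a speech enhancement module.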
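Step S160 amounts to picking one of the two converged probabilities according to the entropy comparison. A minimal sketch, assuming the two entropies have already been computed; the function name, the string labels, and the handling of equal entropies are illustrative assumptions that the text does not specify.

```python
def select_speech_presence(p1, p2, h1, h2):
    """Choose the speech presence probability after convergence.

    p1, p2: converged first and second probabilities (per frame).
    h1, h2: their entropies, computed beforehand.
    Returns (label, spp), where label names the speech presence model.
    """
    if h1 > h2:
        # Larger entropy of the first probability: the second model is
        # taken as the speech presence model.
        return "second", list(p2)
    # h1 < h2 selects the first model. The tie h1 == h2 is not covered by
    # the text; falling back to the first model is an arbitrary assumption.
    return "first", list(p1)
```

For example, `select_speech_presence([0.9, 0.1], [0.1, 0.9], 0.3, 0.6)` selects the first model and returns its probability sequence as the speech presence probability.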
In summary, in the speech presence probability calculation system and method P100 provided in this specification, the computing device 240 may determine which of the first model and the second model is the speech presence model and which is the speech absence model, based on the entropy of the first probability corresponding to the first model and the entropy of the second probability corresponding to the second model. It thereby obtains the speech presence probability of the microphone signal $x_{f,t}$ while correcting misjudgments of the speech probability that arise during iteration, which improves the accuracy of the calculated speech presence probability. In addition, during iteration the computing device 240 may correct the first probability and the second probability according to their entropies, so that the optimization parameters iterate toward the target direction. This accelerates convergence and further improves the accuracy of the speech presence probability.
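This specification also provides a speech enhancement system. The speech enhancement system may also be applied to the electronic device 200. In some embodiments, the speech enhancement system may include the computing device 240. In some embodiments, the speech enhancement system may be applied to the computing device 240, that is, it may run on the computing device 240. The speech enhancement system may include a hardware device with data and information processing functions and the programs required to drive that hardware device. Alternatively, the speech enhancement system may be only a hardware device with data processing capability, or only a program running on a hardware device.
The speech enhancement system may store data or instructions for executing the speech enhancement method described in this specification, and may execute the data and/or instructions. When the speech enhancement system runs on the computing device 240, it may obtain the microphone signals from the microphone array 220 over the communication connection and execute the data or instructions of the speech enhancement method described in this specification. The speech enhancement method is introduced elsewhere in this specification, for example in the description of FIG. 7.
When the speech enhancement system runs on the computing device 240, the speech enhancement system is communicatively connected to the microphone array 220. The storage medium 243 may further include at least one instruction set, stored in the data storage device, for performing MVDR-based speech enhancement on the microphone signals. The instructions are computer program code, which may include programs, routines, objects, components, data structures, procedures, modules, and the like that execute the speech enhancement method provided in this specification. The processor 242 may read the at least one instruction set and execute the speech enhancement method provided in this specification as directed by the at least one instruction set. The processor 242 may execute all steps of the speech enhancement method.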
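FIG. 7 shows a flowchart of a speech enhancement method P200 according to embodiments of this specification. The method P200 may perform speech enhancement on the microphone signals based on the minimum variance distortionless response (MVDR) method. Specifically, the processor 242 may execute the method P200. As shown in FIG. 7, the method P200 may include:
S220: Obtain the microphone signals $x_{f,t}$ output by the M microphones.
This step is the same as step S120 and is not repeated here.
S240: Determine the speech presence probability of the microphone signal $x_{f,t}$ based on the speech presence probability calculation method P100.
S260: Determine the noise covariance matrix of the microphone signal $x_{f,t}$ based on the speech presence probability.
The noise covariance matrix can be expressed by the following formula: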
Figure PCTCN2021123111-appb-000279
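One common way to derive a noise covariance matrix from a speech presence probability is to weight each snapshot's outer product by the speech absence probability. The sketch below assumes that form, $\hat{R} = \sum_t (1-\lambda_t)\, x_t x_t^H / \sum_t (1-\lambda_t)$, for a single frequency bin; it may differ from the formula of this specification, and the names `noise_covariance`, `frames`, `spp` and `eps` are hypothetical.

```python
def noise_covariance(frames, spp, eps=1e-12):
    """SPP-weighted noise covariance estimate for one frequency bin.

    frames: list of T snapshots, each a list of M complex samples.
    spp: list of T speech presence probabilities in [0, 1].
    Returns an M x M matrix as nested lists of complex numbers.
    """
    m = len(frames[0])
    acc = [[0j] * m for _ in range(m)]
    weight_sum = 0.0
    for x, lam in zip(frames, spp):
        w = 1.0 - lam  # speech absence probability weights the frame
        weight_sum += w
        for i in range(m):
            for j in range(m):
                acc[i][j] += w * x[i] * x[j].conjugate()
    norm = max(weight_sum, eps)  # guard against division by zero
    return [[v / norm for v in row] for row in acc]
```

Frames that are almost certainly speech (probability near 1) contribute little, so the estimate is dominated by noise-only frames.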
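S280: Determine the filter coefficients $\omega_{f,t}$ corresponding to the microphone signal $x_{f,t}$ based on the MVDR method and the noise covariance matrix.
The filter coefficients $\omega_{f,t}$ can be expressed by the following formula: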
Figure PCTCN2021123111-appb-000281
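Here, the vector in the formula is the steering vector corresponding to the target direction in which the target speech is located, and $\theta_s$ is the signal incidence angle of the target direction. In some embodiments, $\theta_s$ is known, in which case the steering vector is also known. In some embodiments, $\theta_s$ is unknown, and the computing device 240 may compute the steering vector by performing a subspace decomposition based on the noise covariance matrix.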
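For orientation, the textbook MVDR solution is $\omega = R^{-1} a / (a^H R^{-1} a)$, where $R$ is the noise covariance matrix and $a$ the steering vector. The sketch below implements that textbook form with a small pure-Python solver; it is an assumption that the specification's formula matches it, and the names `solve` and `mvdr_weights` are hypothetical.

```python
def solve(matrix, rhs):
    """Solve matrix @ z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [[complex(v) for v in row] + [complex(b)]
           for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("noise covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0j] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - s) / aug[r][r]
    return z


def mvdr_weights(noise_cov, steering):
    """Textbook MVDR weights: R^{-1} a / (a^H R^{-1} a)."""
    z = solve(noise_cov, steering)  # z = R^{-1} a
    denom = sum(complex(a).conjugate() * zi for a, zi in zip(steering, z))
    return [zi / denom for zi in z]
```

By construction the weights satisfy the distortionless constraint $\omega^H a = 1$, so a signal arriving from the target direction passes with unit gain while the noise power is minimised.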
In some embodiments, the filter coefficients $\omega_{f,t}$ may also be expressed by the following formula:
Figure PCTCN2021123111-appb-000286
Here, the matrix in the formula is the converged value corresponding to the speech absence model. When the first model is the speech absence model, it is the converged value of the first spatial covariance matrix. When the second model is the speech absence model, it is the converged value of the second spatial covariance matrix.
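S290: Combine the microphone signals $x_{f,t}$ based on the filter coefficients $\omega_{f,t}$, and output the target audio signal $y_{f,t}$.
The target audio signal $y_{f,t}$ can be expressed by the following formula:
$y_{f,t} = \omega_{f,t}^{H} x_{f,t}$    Formula (21)
The computing device 240 may output the target audio signal $y_{f,t}$ to other electronic devices, such as a remote call device.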
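Formula (21) is an inner product between the conjugated filter coefficients and the microphone snapshot. A minimal sketch for one time-frequency bin, with the hypothetical name `apply_beamformer`:

```python
def apply_beamformer(weights, snapshot):
    """Compute y = w^H x for one time-frequency bin.

    weights: M complex filter coefficients.
    snapshot: M complex microphone samples for the same bin.
    """
    return sum(w.conjugate() * x for w, x in zip(weights, snapshot))
```

Applying this per frequency bin and per frame, followed by an inverse transform, would yield the enhanced time-domain signal; that surrounding pipeline is assumed here and is not shown.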
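In summary, the speech presence probability calculation system and method P100, the speech enhancement system and method P200, and the electronic device 200 provided in this specification are used for a microphone array 220 composed of multiple microphones 222. They may obtain a speech presence model for the case in which speech is present in the multiple microphone signals and a speech absence model for the case in which it is not, and optimize both over multiple iterations based on maximum likelihood estimation and the expectation-maximization algorithm. During iteration, the speech presence probability and the speech absence probability are corrected according to their entropies, so as to determine the model parameters of the speech presence model and of the speech absence model. When the maximum likelihood estimation and the expectation-maximization algorithm converge, the speech presence probability corresponding to the speech presence model is obtained. By comparing the entropy of the speech presence probability with the entropy of the speech absence probability and correcting both probabilities during iteration, the systems and methods converge faster and to a better result. This makes the estimates of the speech presence probability and of the noise covariance matrix more accurate, which in turn improves the speech enhancement performance of MVDR.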
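In another aspect, this specification provides a non-transitory storage medium storing at least one set of executable instructions for speech presence probability calculation. When executed by a processor, the executable instructions direct the processor to implement the steps of the speech presence probability calculation method P100 described in this specification. In some possible implementations, aspects of this specification may also be implemented in the form of a program product that includes program code. When the program product runs on a computing device (such as the computing device 240), the program code causes the computing device to execute the speech presence probability calculation steps described in this specification. A program product for implementing the above method may use a portable compact disc read-only memory (CD-ROM) that includes the program code, and may run on a computing device. However, the program product of this specification is not limited thereto. In this specification, a readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system (for example, the processor 242). The program product may use any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium.
A readable storage medium may be, for example but without limitation, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. A readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on a readable medium may be transmitted over any suitable medium, including but not limited to wireless, wired, optical cable, RF, or any suitable combination of the above.
Program code for performing the operations of this specification may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the computing device, partly on the computing device, as a stand-alone software package, partly on the computing device and partly on a remote computing device, or entirely on a remote computing device.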
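Specific embodiments of this specification have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
In summary, after reading this detailed disclosure, those skilled in the art will understand that the foregoing detailed disclosure is presented by way of example only and is not limiting. Although not explicitly stated here, those skilled in the art will understand that this specification is intended to cover various reasonable changes, improvements, and modifications to the embodiments. These changes, improvements, and modifications are intended to be suggested by this specification and are within the spirit and scope of its exemplary embodiments.
In addition, certain terms in this specification have been used to describe its embodiments. For example, "one embodiment", "an embodiment", and/or "some embodiments" mean that a particular feature, structure, or characteristic described in connection with that embodiment may be included in at least one embodiment of this specification. Therefore, it is emphasized and should be understood that two or more references to "an embodiment", "one embodiment", or "an alternative embodiment" in various parts of this specification do not necessarily all refer to the same embodiment. Furthermore, particular features, structures, or characteristics may be combined as appropriate in one or more embodiments of this specification.
It should be understood that, in the foregoing description of the embodiments, various features are combined in a single embodiment, drawing, or description thereof to aid understanding of a feature and to simplify this specification. This does not mean that the combination of these features is required. A person skilled in the art reading this specification may well extract some of these features and understand them as a separate embodiment. In other words, the embodiments in this specification may also be understood as an integration of multiple secondary embodiments, and each secondary embodiment remains valid when its content comprises fewer than all the features of a single foregoing disclosed embodiment.
Each patent, patent application, publication of a patent application, and other material cited herein, such as articles, books, specifications, publications, documents, and the like, may be incorporated herein by reference in its entirety for all purposes, except for any prosecution file history associated therewith, any of the same that is inconsistent with or in conflict with this document, and any of the same that may have a limiting effect on the broadest scope of the claims now or later associated with this document. For example, if there is any inconsistency or conflict between the description, definition, and/or use of a term associated with any incorporated material and that associated with this document, the description, definition, and/or use of the term in this document shall prevail.
Finally, it should be understood that the embodiments of the application disclosed herein illustrate the principles of the embodiments of this specification. Other modified embodiments are also within the scope of this specification. Therefore, the embodiments disclosed in this specification are examples only and are not limiting. Those skilled in the art may adopt alternative configurations according to the embodiments in this specification to implement the application in this specification. Accordingly, the embodiments of this specification are not limited to those precisely described in the application.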

Claims (14)

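  1. A speech presence probability calculation method, characterized in that it is used for M microphones distributed in a preset array shape, M being an integer greater than 1, the method comprising:
    obtaining microphone signals output by the M microphones, the microphone signals satisfying a first model or a second model of a Gaussian distribution, one of the first model and the second model being a speech presence model and the other being a speech absence model;
    iteratively optimizing the first model and the second model, respectively, based on maximum likelihood estimation and an expectation-maximization algorithm until convergence, wherein, during the iteration, whether the speech presence model is the first model or the second model is determined based on an entropy of a first probability that the microphone signal belongs to the first model and an entropy of a second probability that the microphone signal belongs to the second model, the first probability and the second probability being complementary; and
    when the maximum likelihood estimation and the expectation-maximization algorithm converge, taking the probability that the microphone signal belongs to the speech presence model as a speech presence probability of the microphone signal and outputting it.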
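  2. The speech presence probability calculation method according to claim 1, characterized in that a first variance of the Gaussian distribution corresponding to the first model comprises a product of a first parameter and a first spatial covariance matrix; and
    a second variance of the Gaussian distribution corresponding to the second model comprises a product of a second parameter and a second spatial covariance matrix.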
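  3. The speech presence probability calculation method according to claim 2, characterized in that the iteratively optimizing the first model and the second model, respectively, based on maximum likelihood estimation and an expectation-maximization algorithm comprises:
    constructing an objective function based on the maximum likelihood estimation and the expectation-maximization algorithm;
    determining optimization parameters, the optimization parameters comprising the first spatial covariance matrix and the second spatial covariance matrix;
    determining initial values of the optimization parameters;
    performing multiple iterations on the optimization parameters, based on the objective function and the initial values of the optimization parameters, until the objective function converges, comprising:
    determining, in the multiple iterations, whether the speech presence model is the first model or the second model based on the entropy of the first probability and the entropy of the second probability; and
    outputting converged values of the optimization parameters and the corresponding first probability and second probability.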
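  4. The speech presence probability calculation method according to claim 3, characterized in that the determining, in the multiple iterations, whether the speech presence model is the first model or the second model based on the entropy of the first probability and the entropy of the second probability comprises:
    in any one of the multiple iterations, calculating the entropy of the first probability and the entropy of the second probability, and determining whether the speech presence model is the first model or the second model, comprising:
    determining that the entropy of the first probability is greater than the entropy of the second probability, and determining that the speech presence model is the second model; or
    determining that the entropy of the first probability is less than the entropy of the second probability, and determining that the speech presence model is the first model.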
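  5. The speech presence probability calculation method according to claim 3, characterized in that the determining, in the multiple iterations, whether the speech presence model is the first model or the second model based on the entropy of the first probability and the entropy of the second probability comprises:
    in the first of the multiple iterations, calculating the entropy of the first probability and the entropy of the second probability, and determining whether the speech presence model is the first model or the second model, comprising:
    determining that the entropy of the first probability is greater than the entropy of the second probability, and determining that the speech presence model is the second model; or
    determining that the entropy of the first probability is less than the entropy of the second probability, and determining that the speech presence model is the first model.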
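  6. The speech presence probability calculation method according to claim 5, characterized in that the performing multiple iterations on the optimization parameters further comprises, in each of the multiple iterations:
    correcting the first probability and the second probability based on the entropy of the first probability and the entropy of the second probability, comprising:
    determining that the first model is the speech presence model and that the entropy of the first probability is greater than the entropy of the second probability, and swapping the value of the first probability with the value of the second probability; or
    determining that the second model is the speech presence model and that the entropy of the second probability is greater than the entropy of the first probability, and swapping the value of the first probability with the value of the second probability; and
    updating the optimization parameters based on the corrected first probability and second probability.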
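  7. The speech presence probability calculation method according to claim 3, characterized in that the performing multiple iterations on the optimization parameters further comprises, in each of the multiple iterations:
    performing an invertibility correction on the optimization parameters, comprising:
    determining that the optimization parameters are not invertible, and correcting the optimization parameters with a deviation matrix, the deviation matrix comprising one of an identity matrix and a random matrix following a normal distribution or a uniform distribution.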
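  8. A speech presence probability calculation system, characterized by comprising:
    at least one storage medium storing at least one instruction set for speech presence probability calculation; and
    at least one processor communicatively connected to the at least one storage medium,
    wherein, when the speech presence probability calculation system is running, the at least one processor reads the at least one instruction set and implements the speech presence probability calculation method according to any one of claims 1-7.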
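  9. A speech enhancement method, characterized in that it is used for M microphones distributed in a preset array shape, M being an integer greater than 1, the method comprising:
    obtaining microphone signals output by the M microphones;
    determining the speech presence probability of the microphone signals based on the speech presence probability calculation method according to any one of claims 1-7;
    determining a noise covariance matrix of the microphone signals based on the speech presence probability;
    determining filter coefficients corresponding to the microphone signals based on an MVDR method and the noise covariance matrix; and
    combining the microphone signals based on the filter coefficients, and outputting a target audio signal.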
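  10. A speech enhancement system, characterized by comprising:
    at least one storage medium storing at least one instruction set for performing speech enhancement; and
    at least one processor communicatively connected to the at least one storage medium,
    wherein, when the speech enhancement system is running, the at least one processor reads the at least one instruction set and implements the speech enhancement method according to claim 9.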
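  11. A headphone, characterized by comprising:
    a microphone array comprising M microphones distributed in a preset array shape, M being an integer greater than 1; and
    a computing device that, when running, is communicatively connected to the microphone array and executes the speech enhancement method according to claim 9.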
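  12. The headphone according to claim 11, characterized in that the M microphones are distributed linearly, M is not greater than 5, and the spacing between adjacent microphones among the M microphones is between 20 mm and 40 mm.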
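  13. The headphone according to claim 11, characterized by further comprising:
    a first housing on which the microphone array is mounted, comprising:
    a first interface provided with a first magnetic device; and
    a second housing on which the computing device is mounted, comprising:
    a second interface provided with a second magnetic device,
    wherein the attraction force between the first magnetic device and the second magnetic device detachably connects the first housing to the second housing.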
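  14. The headphone according to claim 13, characterized in that the first housing further comprises:
    contacts provided at the first interface and communicatively connected to the microphone array; and
    the second housing further comprises:
    a guide rail provided at the second interface and communicatively connected to the computing device,
    wherein, when the first housing is connected to the second housing, the contacts touch the guide rail so that the microphone array is communicatively connected to the computing device.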
PCT/CN2021/123111 2021-10-11 2021-10-11 语音存在概率计算方法、系统、语音增强方法、系统以及耳机 WO2023060400A1 (zh)

Priority Applications (6)

Application Number Priority Date Filing Date Title
KR1020237020638A KR20230109716A (ko) 2021-10-11 2021-10-11 음성존재확률 판정방법과 시스템, 음성강화방법과 시스템, 헤드폰
JP2023542599A JP2024506237A (ja) 2021-10-11 2021-10-11 音声存在確率の計算方法、音声存在確率の計算システム、音声強調方法、音声強調システム及びイヤホン
CN202180077272.XA CN116508328A (zh) 2021-10-11 2021-10-11 语音存在概率计算方法、系统、语音增强方法、系统以及耳机
PCT/CN2021/123111 WO2023060400A1 (zh) 2021-10-11 2021-10-11 语音存在概率计算方法、系统、语音增强方法、系统以及耳机
EP21960151.5A EP4227941A4 (en) 2021-10-11 2021-10-11 METHOD AND SYSTEM FOR CALCULATING THE PROBABILITY OF SPEECH PRESENCE, METHOD AND SYSTEM FOR SPEECH IMPROVEMENT AND HEADPHONES
US18/305,398 US20230260529A1 (en) 2021-10-11 2023-04-24 Methods and systems for determining speech presence probability, speech enhancement methods and systems, and headphones

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/123111 WO2023060400A1 (zh) 2021-10-11 2021-10-11 语音存在概率计算方法、系统、语音增强方法、系统以及耳机

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/305,398 Continuation US20230260529A1 (en) 2021-10-11 2023-04-24 Methods and systems for determining speech presence probability, speech enhancement methods and systems, and headphones

Publications (1)

Publication Number Publication Date
WO2023060400A1 true WO2023060400A1 (zh) 2023-04-20

Family

ID=85987161

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/123111 WO2023060400A1 (zh) 2021-10-11 2021-10-11 语音存在概率计算方法、系统、语音增强方法、系统以及耳机

Country Status (6)

Country Link
US (1) US20230260529A1 (zh)
EP (1) EP4227941A4 (zh)
JP (1) JP2024506237A (zh)
KR (1) KR20230109716A (zh)
CN (1) CN116508328A (zh)
WO (1) WO2023060400A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117935835A (zh) * 2024-03-22 2024-04-26 浙江华创视讯科技有限公司 音频降噪方法、电子设备以及存储介质

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11916988B2 (en) * 2020-09-28 2024-02-27 Bose Corporation Methods and systems for managing simultaneous data streams from multiple sources
CN117275528B (zh) * 2023-11-17 2024-03-01 浙江华创视讯科技有限公司 语音存在概率的估计方法及装置

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111883181A (zh) * 2020-06-30 2020-11-03 海尔优家智能科技(北京)有限公司 音频检测方法、装置、存储介质及电子装置
CN113270106A (zh) * 2021-05-07 2021-08-17 深圳市友杰智新科技有限公司 双麦克风的风噪声抑制方法、装置、设备及存储介质

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
See also references of EP4227941A4 *
ZHU XUNYU, PAN XIANG: "Research on Speech Enhancement Algorithm Based on Microphone Linear Array", JOURNAL OF HANGZHOU DIANZI UNIVERSITY, CN, 15 September 2020 (2020-09-15), CN , XP093058636, ISSN: 1001-9146, DOI: 10.13954/j.cnki.hdu.2020.05.006 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117935835A (zh) * 2024-03-22 2024-04-26 浙江华创视讯科技有限公司 音频降噪方法、电子设备以及存储介质
CN117935835B (zh) * 2024-03-22 2024-06-07 浙江华创视讯科技有限公司 音频降噪方法、电子设备以及存储介质

Also Published As

Publication number Publication date
US20230260529A1 (en) 2023-08-17
KR20230109716A (ko) 2023-07-20
CN116508328A (zh) 2023-07-28
EP4227941A4 (en) 2023-12-27
JP2024506237A (ja) 2024-02-13
EP4227941A1 (en) 2023-08-16

Similar Documents

Publication Publication Date Title
WO2023060400A1 (zh) 语音存在概率计算方法、系统、语音增强方法、系统以及耳机
US10631102B2 (en) Microphone system and a hearing device comprising a microphone system
US11134330B2 (en) Earbud speech estimation
KR101337695B1 (ko) 강력한 노이즈 저감을 위한 마이크로폰 어레이 서브세트 선택
US20130294611A1 (en) Source separation by independent component analysis in conjuction with optimization of acoustic echo cancellation
US10186277B2 (en) Microphone array speech enhancement
US10431240B2 (en) Speech enhancement method and system
US20180012617A1 (en) Microphone array noise suppression using noise field isotropy estimation
CN110554357A (zh) 声源定位方法和装置
US20230215453A1 (en) Electronic device for controlling beamforming and operating method thereof
WO2023115269A1 (zh) 语音活动检测方法、系统、语音增强方法以及系统
CN113223552B (zh) 语音增强方法、装置、设备、存储介质及程序
CN116110421A (zh) 语音活动检测方法、系统、语音增强方法以及系统
CN115966215A (zh) 语音存在概率计算方法、系统、语音增强方法以及耳机
WO2023082134A1 (zh) 语音活动检测方法、系统、语音增强方法以及系统
CN116364100A (zh) 语音活动检测方法、系统、语音增强方法以及系统
Ayrapetian et al. Asynchronous acoustic echo cancellation over wireless channels
US20240203437A1 (en) Electronic device for outputting sound and operating method thereof
CN113808606B (zh) 语音信号处理方法和装置
JP5224950B2 (ja) 信号処理装置
KR20230067427A (ko) 빔포밍을 제어하는 전자 장치 및 이의 동작 방법
CN114333876A (zh) 信号处理的方法和装置

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 202180077272.X

Country of ref document: CN

ENP Entry into the national phase

Ref document number: 2021960151

Country of ref document: EP

Effective date: 20230511

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21960151

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 20237020638

Country of ref document: KR

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 2023542599

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE