CN115862656A - Method, device, equipment and storage medium for enhancing bone-conduction microphone voice

Info

Publication number: CN115862656A
Application number: CN202310054459.9A
Authority: CN
Inventors: 梁山; 陶建华; 聂帅; 李冠君; 易江燕
Original assignee: Institute of Automation of Chinese Academy of Science
Current assignee: Institute of Automation of Chinese Academy of Science
Priority date: 2023-02-03
Filing date: 2023-02-03
Publication date: 2023-03-28
Anticipated expiration: 2043-02-03
Also published as: CN115862656B

Abstract

The present disclosure relates to a method and an apparatus for enhancing bone-conduction microphone speech, a device, and a storage medium. The method comprises: acquiring two frequency domain signals, and intercepting each of them according to a preset cut-off frequency to obtain two intercepted frequency domain signals; performing half-wave rectification on each of the two intercepted frequency domain signals to obtain two half-wave rectified time domain signals; determining a fusion factor from the two half-wave rectified time domain signals within a preset intermediate frequency band; and fusing the two intercepted frequency domain signals according to the fusion factor to obtain the enhanced bone-conduction microphone speech. The preset cut-off frequency acts as a filter bank that splits the two signals before they are fused, and half-wave rectification corrects the signal as a whole, so that noise is effectively suppressed, the clean low-frequency signal of the bone conduction microphone is retained, and the missing mid- and high-frequency information is supplemented, yielding an audio signal of higher perceptual quality.

Description

Method, device, equipment and storage medium for enhancing bone-conduction microphone voice

Technical Field

The present disclosure relates to the field of speech processing technologies, and in particular, to a method and an apparatus for enhancing speech of a bone-conduction microphone, a device and a storage medium.

Background

Currently, many smart headsets and head-worn VR/AR devices integrate a microphone for voice interaction. Picking up speech through the microphone and applying processing such as enhancement, wake-up and recognition enables voice communication, human-machine interaction and similar functions, and is one of the key technologies for improving the efficiency of human-machine interaction and the quality of voice communication. The purity of the speech picked up by the microphone, or the degree to which noise interferes with it, is the key factor affecting the actual interactive experience. Because bone conduction microphones pick up sound transmitted through the human skull, they are not disturbed by environmental noise and are receiving wide attention from the industry. However, a bone conduction microphone can only pick up signals below 2 kHz and cannot effectively pick up mid- and high-frequency signals, so the perceived speech differs greatly from real speech.

Combining a bone conduction microphone with a traditional microphone is a common way to pick up and enhance speech. The most common approach is to perform voice activity detection on the bone conduction microphone signal and use the detection result to enhance the traditional microphone speech signal. Because the bone conduction microphone is not disturbed by environmental noise, voice activity detection is more accurate and the enhancement of the traditional microphone signal can be improved. With the wide application of deep learning, more and more schemes use deep learning to fuse the bone conduction microphone signal with the traditional microphone signal. However, deep learning requires large amounts of data to be effective, and bone conduction speech is difficult to collect at scale, which limits the effect in practical applications.

Bone conduction converts sound into mechanical vibrations of different frequencies and transmits sound waves through the human skull, the bony labyrinth and the auditory center. Compared with the classical mode of transmitting sound through sound waves generated by a vibrating diaphragm, bone conduction omits several sound-wave transmission steps and can reproduce sound clearly in a noisy environment.

Methods that perform voice activity detection on the bone conduction microphone signal and use the result as guide information for enhancing the traditional microphone signal in effect apply noise masking to the traditional microphone signal. Because noise interferes strongly at low frequencies in practical applications, masking information obtained directly on the traditional microphone signal is heavily distorted, which degrades the quality of voice interaction. Schemes based on deep learning generally suffer from insufficient generalization in practice because of insufficient training data.

Disclosure of Invention

In order to solve, or at least partially solve, the above technical problem, embodiments of the present disclosure provide a method and an apparatus for enhancing bone-conduction microphone speech, a device, and a storage medium.

In a first aspect, an embodiment of the present disclosure provides a method for enhancing a bone-conduction microphone voice, the method including:

acquiring two frequency domain signals, and intercepting each of the two frequency domain signals according to a preset cut-off frequency to obtain two intercepted frequency domain signals, wherein the two frequency domain signals correspond to the same signal picked up by a bone conduction microphone element and by a traditional microphone array, respectively, using the same preset clock and sampling rate;

performing half-wave rectification on each of the two intercepted frequency domain signals to obtain two half-wave rectified time domain signals;

determining a fusion factor from the two half-wave rectified time domain signals within a preset intermediate frequency band, wherein the minimum value of the intermediate frequency band is the cut-off frequency;

and fusing the two intercepted frequency domain signals according to the fusion factor to obtain the enhanced bone-conduction microphone speech.

In one possible implementation, the acquiring two frequency domain signals includes:

acquiring two time domain signals, wherein the two time domain signals are obtained by respectively picking up the same signal by the bone conduction microphone element and the traditional microphone array by adopting the same preset clock and sampling rate;

and carrying out Fourier transform on the two time domain signals to obtain two frequency domain signals.

In a possible implementation manner, the respectively intercepting the two frequency domain signals according to a preset cut-off frequency to obtain two intercepted frequency domain signals includes:

for the frequency domain signals correspondingly obtained by the bone conduction microphone element pickup, intercepting signals smaller than a preset cut-off frequency as first intercepted frequency domain signals;

for the frequency domain signals correspondingly obtained by the traditional microphone array pickup, intercepting signals larger than or equal to a preset cut-off frequency as second intercepted frequency domain signals;

and taking the first intercepted frequency domain signal and the second intercepted frequency domain signal as two intercepted frequency domain signals.

In one possible implementation, for the frequency domain signal obtained from the bone conduction microphone element, the components below the preset cut-off frequency are intercepted as the first intercepted frequency domain signal by the following expression:

$$\tilde{X}_b(l,k) = \begin{cases} X_b(l,k), & k f_s / N < f_c \\ 0, & \text{otherwise} \end{cases}$$

wherein $\tilde{X}_b(l,k)$ is the first intercepted frequency domain signal, $X_b(l,k)$ is the spectrum of the kth frequency band of the lth frame of the frequency domain signal obtained from the bone conduction microphone element, $f_s$ is the sampling rate, $f_c$ is the preset cut-off frequency, and $N$ is the frame length, so that $k f_s / N$ is the frequency of the kth frequency band;

for the frequency domain signal obtained from the traditional microphone array, the components greater than or equal to the preset cut-off frequency are intercepted as the second intercepted frequency domain signal by the following expression:

$$\tilde{X}_a(l,k) = \begin{cases} X_a(l,k), & k f_s / N \ge f_c \\ 0, & \text{otherwise} \end{cases}$$

wherein $\tilde{X}_a(l,k)$ is the second intercepted frequency domain signal, $X_a(l,k)$ is the spectrum of the kth frequency band of the lth frame of the frequency domain signal obtained from the traditional microphone array, $f_s$ is the sampling rate, $f_c$ is the preset cut-off frequency, and $N$ is the frame length.

In a possible implementation manner, the half-wave rectifying the two intercepted frequency domain signals to obtain two half-wave rectified time domain signals respectively includes:

respectively carrying out inverse Fourier transform on the two intercepted frequency domain signals to obtain two intercepted time domain signals;

and respectively carrying out half-wave rectification on the two intercepted time domain signals to obtain two half-wave rectified time domain signals.

In a possible embodiment, the two intercepted time domain signals are respectively half-wave rectified by the following expressions to obtain two half-wave rectified time domain signals:

$$\hat{x}_b(n) = \begin{cases} \tilde{x}_b(n), & \tilde{x}_b(n) > 0 \\ 0, & \text{otherwise} \end{cases}$$

wherein $\hat{x}_b(n)$ is the half-wave rectified time domain signal corresponding to the bone conduction microphone element, and $\tilde{x}_b(n)$ is the intercepted time domain signal corresponding to the bone conduction microphone element;

$$\hat{x}_a(n) = \begin{cases} \tilde{x}_a(n), & \tilde{x}_a(n) > 0 \\ 0, & \text{otherwise} \end{cases}$$

wherein $\hat{x}_a(n)$ is the half-wave rectified time domain signal corresponding to the traditional microphone array, and $\tilde{x}_a(n)$ is the intercepted time domain signal corresponding to the traditional microphone array.

In a possible implementation manner, the determining a fusion factor according to the two half-wave rectified time domain signals according to the preset intermediate frequency band includes:

carrying out Fourier transform on the two half-wave rectified time domain signals to obtain two half-wave rectified frequency domain signals;

calculating two accumulated energies corresponding to the two half-wave rectified frequency domain signals in a preset intermediate frequency band;

and calculating a fusion factor according to the two accumulated energies.

In a possible implementation, the preset intermediate frequency band is [1500 Hz, 2000 Hz]. For each of the two half-wave rectified frequency domain signals, namely the one corresponding to the bone conduction microphone element and the one corresponding to the traditional microphone array, the accumulated energy in [1500 Hz, 2000 Hz] is calculated frame by frame and smoothed between adjacent time instants by a smoothing factor.

In one possible embodiment, the fusion factor is calculated from the two cumulative energies.

In a possible embodiment, fusing the two intercepted frequency domain signals according to the fusion factor to obtain the enhanced bone-conduction microphone speech includes:

fusing the two intercepted frequency domain signals according to the fusion factor to obtain a fused frequency domain signal;

and performing inverse Fourier transform on the fused frequency domain signal to obtain a fused time domain signal, which is used as the enhanced bone-conduction microphone speech.

In a possible embodiment, the first intercepted frequency domain signal and the second intercepted frequency domain signal are fused according to the fusion factor to obtain the fused frequency domain signal.

In a second aspect, an embodiment of the present disclosure provides a bone-microphone speech enhancement apparatus, including:

the intercepting module is used for acquiring two frequency domain signals, respectively intercepting the two frequency domain signals according to a preset cut-off frequency to obtain two intercepted frequency domain signals, wherein the two frequency domain signals are obtained by respectively picking up the same signal by using the same preset clock and sampling rate through the bone conduction microphone array and the traditional microphone array;

the rectification module is used for respectively carrying out half-wave rectification on the two intercepted frequency domain signals to obtain two half-wave rectified time domain signals;

the determining module is used for determining a fusion factor according to the two half-wave rectified time domain signals according to a preset intermediate frequency band, wherein the minimum value of the intermediate frequency band is the cut-off frequency;

and the fusion module is used for fusing the two intercepted frequency domain signals according to the fusion factor to obtain the bone microphone voice enhanced voice.

In a third aspect, an embodiment of the present disclosure provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;

a memory for storing a computer program;

and the processor is used for realizing the bone microphone voice enhancement method when executing the program stored in the memory.

In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the bone microphone speech enhancement method described above.

Compared with the prior art, the technical scheme provided by the embodiment of the disclosure at least has part or all of the following advantages:

the method for enhancing bone-conduction microphone speech according to the embodiments of the present disclosure acquires two frequency domain signals and intercepts each of them according to a preset cut-off frequency to obtain two intercepted frequency domain signals, wherein the two frequency domain signals correspond to the same signal picked up by a bone conduction microphone element and by a traditional microphone array, respectively, using the same preset clock and sampling rate; performs half-wave rectification on each of the two intercepted frequency domain signals to obtain two half-wave rectified time domain signals; determines a fusion factor from the two half-wave rectified time domain signals within a preset intermediate frequency band, wherein the minimum value of the intermediate frequency band is the cut-off frequency; and fuses the two intercepted frequency domain signals according to the fusion factor to obtain the enhanced bone-conduction microphone speech. The preset cut-off frequency acts as a filter bank that splits the two signals before they are fused, and half-wave rectification corrects the signal as a whole. Noise is thereby effectively suppressed, the clean low-frequency signal of the bone conduction microphone is retained, and the missing mid- and high-frequency information is supplemented, yielding an audio signal of higher perceptual quality.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.

To illustrate the embodiments of the present disclosure or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the related art are briefly introduced below. A person skilled in the art can obtain other drawings from these drawings without inventive effort.

FIG. 1 schematically illustrates a flowchart of a method for enhancing bone-conduction microphone speech according to an embodiment of the present disclosure;

FIG. 2 schematically illustrates a block diagram of a bone-conduction microphone speech enhancement device according to an embodiment of the present disclosure; and

FIG. 3 schematically illustrates a block diagram of an electronic device according to an embodiment of the present disclosure.

Detailed Description

To make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some embodiments of the present disclosure, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.

Referring to fig. 1, an embodiment of the present disclosure provides a method for enhancing a bone-microphone voice, the method including:

S1, acquiring two frequency domain signals, and intercepting each of the two frequency domain signals according to a preset cut-off frequency to obtain two intercepted frequency domain signals, wherein the two frequency domain signals correspond to the same signal picked up by a bone conduction microphone element and by a traditional microphone array, respectively, using the same preset clock and sampling rate;

In practical applications, the bone conduction microphone collects sound signals from the slight vibration of the bones of the head and neck caused by a person speaking, whereas the traditional microphone collects sound signals through air conduction.

S2, performing half-wave rectification on each of the two intercepted frequency domain signals to obtain two half-wave rectified time domain signals;

S3, determining a fusion factor from the two half-wave rectified time domain signals within a preset intermediate frequency band, wherein the minimum value of the intermediate frequency band is the cut-off frequency;

S4, fusing the two intercepted frequency domain signals according to the fusion factor to obtain the enhanced bone-conduction microphone speech.

In this embodiment, in step S1, the acquiring two frequency domain signals includes:

acquiring two time domain signals, wherein the two time domain signals are obtained by respectively picking up the same signal by the bone conduction microphone array and the traditional microphone array by adopting the same preset clock and sampling rate;

In this embodiment, the two time domain signals are Fourier transformed by the following expressions to obtain two frequency domain signals:

$$X_b(l,k) = \sum_{n=0}^{N-1} w(n)\, x_b(l,n)\, e^{-j 2\pi k n / N}$$

$$X_a(l,k) = \sum_{n=0}^{N-1} w(n)\, x_a(l,n)\, e^{-j 2\pi k n / N}$$

wherein $X_b(l,k)$ is the spectrum of the kth frequency band of the lth frame of the frequency domain signal obtained from the bone conduction microphone element, $X_a(l,k)$ is the spectrum of the kth frequency band of the lth frame of the frequency domain signal obtained from the traditional microphone array, $N$ is the frame length 512, $w(n)$ is a Hamming window of length 512, l is the time frame index, k is the frequency index, $x_b(l,n)$ is the nth sample of the lth frame of the time domain signal picked up by the bone conduction microphone element, and $x_a(l,n)$ is the nth sample of the lth frame of the time domain signal picked up by the traditional microphone array.
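The framing and windowed transform described above can be illustrated with a minimal NumPy sketch. The frame length of 512 and the Hamming window come from the text; the frame shift `hop`, the function name `stft_frames` and the 16 kHz sampling rate in the usage lines are assumptions made only for illustration.

```python
import numpy as np

FRAME_LEN = 512  # frame length stated in the text


def stft_frames(x, hop=256):
    """Windowed DFT of each frame; returns an array X[l, k].

    `hop` (the frame shift) is an assumed value, the text does not give one.
    """
    w = np.hamming(FRAME_LEN)  # Hamming window of length 512
    n_frames = 1 + (len(x) - FRAME_LEN) // hop
    frames = np.stack([x[l * hop: l * hop + FRAME_LEN] for l in range(n_frames)])
    return np.fft.fft(frames * w, axis=1)


# Usage with an assumed 16 kHz sampling rate: a 1 kHz tone falls in
# frequency band k = 1000 * 512 / 16000 = 32.
fs = 16000
x = np.sin(2 * np.pi * 1000 * np.arange(fs) / fs)
X = stft_frames(x)
```

The same function would be applied to both the bone conduction signal and the traditional microphone signal, since the two are sampled with the same clock and sampling rate.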

In this embodiment, in step S1, the intercepting the two frequency domain signals according to a preset cut-off frequency respectively to obtain two intercepted frequency domain signals, including:

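In this embodiment, for the frequency domain signal obtained from the bone conduction microphone element, the components below the preset cut-off frequency are intercepted as the first intercepted frequency domain signal by the following expression:

$$\tilde{X}_b(l,k) = \begin{cases} X_b(l,k), & k f_s / N < f_c \\ 0, & \text{otherwise} \end{cases}$$

wherein $\tilde{X}_b(l,k)$ is the first intercepted frequency domain signal, $X_b(l,k)$ is the spectrum of the kth frequency band of the lth frame of the frequency domain signal obtained from the bone conduction microphone element, $f_s$ is the sampling rate, $f_c$ is the preset cut-off frequency, and $N$ is the frame length.

For the frequency domain signal obtained from the traditional microphone array, the components greater than or equal to the preset cut-off frequency are intercepted as the second intercepted frequency domain signal by the following expression:

$$\tilde{X}_a(l,k) = \begin{cases} X_a(l,k), & k f_s / N \ge f_c \\ 0, & \text{otherwise} \end{cases}$$

wherein $\tilde{X}_a(l,k)$ is the second intercepted frequency domain signal, $X_a(l,k)$ is the spectrum of the kth frequency band of the lth frame of the frequency domain signal obtained from the traditional microphone array, $f_s$ is the sampling rate, and $f_c$ is the preset cut-off frequency.

Intercepting the two frequency domain signals according to the preset cut-off frequency keeps only the band below $f_c$ for the bone conduction signal. For the traditional microphone signal, which is easily disturbed by low-frequency noise, only the band at or above $f_c$ is kept, which ensures the purity of the signal.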
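The band split described above can be sketched as follows. The function name `split_bands`, the 16 kHz sampling rate and the 1500 Hz cut-off used in the usage lines are assumptions; the cut-off value follows the statement that the minimum of the intermediate band is the cut-off frequency. Applying the mask to the mirrored negative-frequency bins as well is a design choice of this sketch, made so that an inverse transform would stay real-valued.

```python
import numpy as np


def split_bands(X_bone, X_air, fs, fc):
    """Keep the bone conduction spectrum below fc and the air-conduction
    spectrum at or above fc. Inputs are full FFT spectra of shape (frames, N)."""
    n_fft = X_bone.shape[-1]
    # Absolute bin frequencies, so mirrored negative bins are treated alike.
    freqs = np.abs(np.fft.fftfreq(n_fft, d=1.0 / fs))
    low = freqs < fc
    X_bone_low = np.where(low, X_bone, 0)
    X_air_high = np.where(low, 0, X_air)
    return X_bone_low, X_air_high


# Usage with assumed values fs = 16 kHz and fc = 1500 Hz.
ones = np.ones((1, 512), dtype=complex)
lo, hi = split_bands(ones, ones, fs=16000, fc=1500.0)
```

Because the two masks are complementary, every frequency band is taken from exactly one of the two microphones.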

In this embodiment, in step S2, the half-wave rectification is performed on the two intercepted frequency domain signals respectively to obtain two half-wave rectified time domain signals, including:

In this embodiment, the two intercepted frequency domain signals are each converted back to the time domain by an inverse Fourier transform to obtain two intercepted time domain signals: the intercepted time domain signal corresponding to the bone conduction microphone element, obtained from the first intercepted frequency domain signal, and the intercepted time domain signal corresponding to the traditional microphone array, obtained from the second intercepted frequency domain signal. The transform uses a frame length of 512 and a Hamming window, with l the time frame index and k the frequency index.

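In this embodiment, the two intercepted time domain signals are respectively half-wave rectified by the following expressions to obtain two half-wave rectified time domain signals:

$$\hat{x}_b(n) = \begin{cases} \tilde{x}_b(n), & \tilde{x}_b(n) > 0 \\ 0, & \text{otherwise} \end{cases}$$

wherein $\hat{x}_b(n)$ is the half-wave rectified time domain signal corresponding to the bone conduction microphone element, and $\tilde{x}_b(n)$ is the intercepted time domain signal corresponding to the bone conduction microphone element;

$$\hat{x}_a(n) = \begin{cases} \tilde{x}_a(n), & \tilde{x}_a(n) > 0 \\ 0, & \text{otherwise} \end{cases}$$

wherein $\hat{x}_a(n)$ is the half-wave rectified time domain signal corresponding to the traditional microphone array, and $\tilde{x}_a(n)$ is the intercepted time domain signal corresponding to the traditional microphone array.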

Since most of the energy of the speech signal has a clear harmonic structure, the present disclosure enhances adjacent harmonics by performing half-wave rectification.
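A short sketch may help show why half-wave rectification is useful here: rectifying a band-limited tone creates energy at its even harmonics, so a low-passed bone conduction signal acquires content above the cut-off. The 500 Hz test tone and the 16 kHz sampling rate are assumed values chosen for illustration.

```python
import numpy as np


def half_wave_rectify(x):
    """Keep positive samples and set negative samples to zero."""
    return np.maximum(x, 0.0)


# Assumed example: a 500 Hz tone sampled at 16 kHz, one 512-sample frame.
fs, n = 16000, 512
tone = np.sin(2 * np.pi * 500 * np.arange(n) / fs)   # band k = 16
spec_before = np.abs(np.fft.fft(tone))
spec_after = np.abs(np.fft.fft(half_wave_rectify(tone)))

# Band k = 32 corresponds to 1000 Hz, the second harmonic of the tone.
print(spec_before[32], spec_after[32])
```

Before rectification the 1000 Hz band is essentially empty, while after rectification it carries clear energy.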

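In this embodiment, in step S3, determining a fusion factor from the two half-wave rectified time domain signals within a preset intermediate frequency band includes:

carrying out Fourier transform on the two half-wave rectified time domain signals to obtain two half-wave rectified frequency domain signals;

calculating two accumulated energies corresponding to the two half-wave rectified frequency domain signals in the preset intermediate frequency band;

and calculating a fusion factor according to the two accumulated energies.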

In this embodiment, the two half-wave rectified time domain signals are each subjected to a Fourier transform to obtain two half-wave rectified frequency domain signals. The transform uses a frame length of 512 and a Hamming window of length 512, with l the time frame index and k the frequency index.

In this embodiment, the preset intermediate frequency band is [1500 Hz, 2000 Hz]. For each of the two half-wave rectified frequency domain signals, the accumulated energy in this band is calculated frame by frame and smoothed between adjacent time instants by a smoothing factor. The smoothing factor is preferably 0.96, which ensures sufficient prediction accuracy while avoiding large variations of the fusion factor over time.

The present disclosure uses the preset intermediate frequency band as a reference band for matching the bone conduction microphone signal with the traditional microphone signal. That is, a fusion factor is sought such that the energies of the two signals match as closely as possible within this band, which ensures the integrity of the fusion.
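The accumulated band energy can be sketched as below. The band limits and the smoothing factor 0.96 come from the text; the first-order recursion `E(l) = alpha * E(l-1) + (1 - alpha) * sum(|Y(l,k)|^2)` is an assumed form of the smoothing, and the function name and the 16 kHz sampling rate are likewise assumptions.

```python
import numpy as np


def smoothed_band_energy(Y, fs, band=(1500.0, 2000.0), alpha=0.96):
    """Per-frame energy of Y[l, k] inside `band`, smoothed over frames.

    The recursive smoothing used here is an assumed form, not taken
    from the text.
    """
    n_fft = Y.shape[1]
    freqs = np.arange(n_fft) * fs / n_fft          # frequency of band k
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    inst = np.sum(np.abs(Y[:, in_band]) ** 2, axis=1)
    energy = np.zeros(len(inst))
    prev = 0.0
    for l, e in enumerate(inst):
        prev = alpha * prev + (1.0 - alpha) * e    # smooth adjacent frames
        energy[l] = prev
    return energy


# Usage with an assumed 16 kHz sampling rate and a flat test spectrum:
# bands k = 48..64 (17 bands) lie inside [1500 Hz, 2000 Hz].
E = smoothed_band_energy(np.ones((3, 512), dtype=complex), fs=16000)
```

A smoothing factor close to 1 makes the energy estimate change slowly, which in turn keeps a factor derived from it from jumping between frames.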

In the present embodiment, the fusion factor is calculated from the two accumulated energies.
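The exact expression for the fusion factor is not reproduced in this text. As a purely illustrative assumption, the sketch below uses the square root of the energy ratio, which has the stated property that a larger traditional microphone energy gives a smaller factor and which scales amplitudes so that the two band energies match. The function name, the `eps` guard and the test values are hypothetical.

```python
import numpy as np


def fusion_factor(E_bone, E_air, eps=1e-12):
    """Assumed form: amplitude gain that matches the air-conduction band
    energy to the bone conduction band energy. Not the source's formula."""
    E_bone = np.asarray(E_bone, dtype=float)
    E_air = np.asarray(E_air, dtype=float)
    return np.sqrt(E_bone / (E_air + eps))  # eps avoids division by zero


# Louder air-conduction signal -> smaller factor, and vice versa.
g_small = fusion_factor(1.0, 4.0)
g_large = fusion_factor(4.0, 1.0)
```

Any monotonically decreasing function of the traditional microphone energy would show the same qualitative behaviour; the square root is chosen here only because energies are squared amplitudes.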

In this embodiment, in step S4, the fusing the two types of intercepted frequency domain signals according to the fusion factor to obtain the bone microphone speech enhancement speech includes:

In this embodiment, the first intercepted frequency domain signal and the second intercepted frequency domain signal are fused according to the fusion factor to obtain the fused frequency domain signal Z(l, k).

In this embodiment, the fused frequency domain signal is converted to the time domain by an inverse Fourier transform to obtain the fused time domain signal. The transform uses a frame length of 512 and a Hamming window, with l the time frame index and k the frequency index.

In the embodiments of the present disclosure, the way the fusion factor is calculated means that the larger the energy of the traditional microphone signal, the smaller the fusion factor, and vice versa. This dynamic fusion factor ensures that the final output spectrum Z(l, k) is formed by fusing two signals of approximately the same amplitude. As can be seen from Z(l, k), the bone conduction microphone signal is retained at low frequencies, where it is hardly disturbed by background noise, while in the mid- and high-frequency bands the matched mid- and high-frequency energy is retained according to the fusion factor. The speech is thus guaranteed to contain a complete harmonic structure, and the perceptual quality of the output speech signal is improved.
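The fusion step can be sketched under the assumption that the fused spectrum is the low-band bone conduction spectrum plus the high-band traditional microphone spectrum scaled by the per-frame fusion factor. This additive form is inferred from the description above and is not confirmed by the text; the function name and the toy values are hypothetical.

```python
import numpy as np


def fuse(X_bone_low, X_air_high, gamma):
    """Assumed fusion: Z(l, k) = X_bone_low(l, k) + gamma(l) * X_air_high(l, k).

    `gamma` holds one fusion factor per frame.
    """
    gamma = np.asarray(gamma, dtype=float).reshape(-1, 1)  # broadcast over k
    return X_bone_low + gamma * X_air_high


# Toy example: 2 frames, 4 bands; band 0 is "low", band 2 is "high".
Xb = np.zeros((2, 4), dtype=complex)
Xa = np.zeros((2, 4), dtype=complex)
Xb[:, 0] = 1.0
Xa[:, 2] = 2.0
Z = fuse(Xb, Xa, gamma=[0.5, 2.0])
```

When the fused spectrum is conjugate-symmetric, each frame could be brought back to the time domain with `np.fft.ifft(Z, axis=1).real`.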

The bone-conduction microphone speech enhancement method of the present disclosure addresses the fact that the low-frequency part of the bone conduction microphone signal is not disturbed by environmental noise but lacks effective mid- and high-frequency content. By fusing it with the traditional microphone signal, the method effectively integrates the low-frequency and the mid- and high-frequency speech signals, which ensures the integrity and clarity of the speech harmonic structure and gives higher perceptual quality.

The method calculates a dynamic fusion factor from the half-wave rectification results of the two signals and thereby achieves effective fusion of the bone conduction microphone signal and the traditional microphone signal.

The method uses the traditional microphone signal to repair the missing frequency bands and thus preserves the auditory perception of the picked-up speech. On the one hand, because the bone conduction microphone picks up a clean speech signal even in strongly noisy scenes, the mid- and high-frequency compensation of the present disclosure allows the method to adapt to very complex acoustic environments with strong interference, giving it a wider range of application. On the other hand, because fusion is performed through the fusion factor, the two signals are well matched in amplitude and the mismatch caused by direct addition is avoided, so the perceptual quality is higher.

The core of the method lies in the calculation of the fusion factor, and the main computational cost lies in the Fourier transforms. Since many means of accelerating the Fourier transform are available, the method is suitable for many head-worn VR and AR products with strict power consumption requirements, and its range of application is wide.

Referring to fig. 2, an embodiment of the present disclosure provides a bone-conduction microphone speech enhancement device, including:

the intercepting module 21 is configured to acquire two frequency domain signals, and respectively intercept the two frequency domain signals according to a preset cut-off frequency to obtain two intercepted frequency domain signals, where the two frequency domain signals are obtained by respectively picking up and corresponding the same signal by using the same preset clock and sampling rate through the bone conduction microphone array and the conventional microphone array;

the rectification module 22 is configured to perform half-wave rectification on the two intercepted frequency domain signals respectively to obtain two half-wave rectified time domain signals;

the determining module 23 is configured to determine a fusion factor according to the two half-wave rectified time domain signals according to a preset intermediate frequency band, where a minimum value of the intermediate frequency band is the cutoff frequency;

and the fusion module 24 is configured to fuse the two intercepted frequency domain signals according to the fusion factor to obtain a bone microphone speech enhancement speech.

The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.

For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the invention. One of ordinary skill in the art can understand and implement it without inventive effort.

In the second embodiment, any number of the intercepting module 21, the rectification module 22, the determining module 23 and the fusion module 24 may be combined and implemented in one module, or any one of the modules may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. At least one of the intercepting module 21, the rectification module 22, the determining module 23 and the fusion module 24 may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, or an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or in any one of the three implementations of software, hardware and firmware, or in any suitable combination of them. Alternatively, at least one of the intercepting module 21, the rectification module 22, the determining module 23 and the fusion module 24 may be at least partially implemented as a computer program module which, when executed, performs the corresponding function.

Referring to fig. 3, an electronic device provided by an embodiment of the present disclosure includes a processor 1110, a communication interface 1120, a memory 1130, and a communication bus 1140, where the processor 1110, the communication interface 1120, and the memory 1130 complete communication with each other through the communication bus 1140;

a memory 1130 for storing computer programs;

the processor 1110, when executing the program stored in the memory 1130, implements the bone-conduction microphone speech enhancement method described in the method embodiments above.

The communication bus 1140 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 1140 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown, but this is not intended to represent only one bus or type of bus.

The communication interface 1120 is used for communication between the electronic device and other devices.

The memory 1130 may include a Random Access Memory (RAM) or a non-volatile memory, such as at least one disk memory. Optionally, the memory 1130 may also be at least one memory device located remotely from the processor 1110.

The processor 1110 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

Embodiments of the present disclosure also provide a computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the bone-conduction microphone speech enhancement method described above.

The computer-readable storage medium may be contained in the apparatus/device described in the above embodiments; or may be present alone without being assembled into the device/apparatus. The computer readable storage medium carries one or more programs which, when executed, implement a method for bone conduction microphone speech enhancement according to an embodiment of the present disclosure.

According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but without limitation: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

It is noted that, in this document, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in a process, method, article, or apparatus that comprises the element.

The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method for bone-conduction microphone speech enhancement, the method comprising:

2. The method of claim 1, wherein the obtaining two frequency domain signals comprises:

3. The method according to claim 1, wherein the respectively truncating the two kinds of frequency domain signals according to a preset cut-off frequency to obtain two kinds of truncated frequency domain signals comprises:

4. The method according to claim 3, wherein, for the frequency-domain signal picked up by the bone conduction microphone element, components below the preset cut-off frequency are intercepted as the first intercepted frequency-domain signal by the following expressions:

wherein, for the first expression, the symbols (not reproduced in this text) denote, in order: the first intercepted frequency-domain signal, the sampling rate, and the preset cut-off frequency;

and wherein, for the second expression, the symbols (not reproduced in this text) denote, in order: the second intercepted frequency-domain signal, the sampling rate, and the preset cut-off frequency.
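Because the expressions of claim 4 are not reproduced in this text, the following is only a minimal sketch of one way such an interception could work, not the claimed formula. It assumes a one-sided spectrum (bins 0 to N/2 of an even-length transform), assumes that the second intercepted signal keeps the bins at or above the cut-off frequency, and uses hypothetical names (`intercept`, `bone_spectrum`, `other_spectrum`).

```python
def intercept(bone_spectrum, other_spectrum, sample_rate, cutoff_hz):
    """Split two one-sided spectra at cutoff_hz (illustrative assumption).

    Returns (first, second):
      first  keeps bone_spectrum bins strictly below the cut-off,
      second keeps other_spectrum bins at or above the cut-off.
    Removed bins are set to 0.0.
    """
    if len(bone_spectrum) != len(other_spectrum):
        raise ValueError("both spectra must have the same number of bins")
    n_bins = len(bone_spectrum)
    if n_bins < 2:
        raise ValueError("need at least two frequency bins")
    # With n_bins = N/2 + 1 one-sided bins, bin k is centred at k * fs / N.
    bin_hz = (sample_rate / 2) / (n_bins - 1)
    first = [x if k * bin_hz < cutoff_hz else 0.0
             for k, x in enumerate(bone_spectrum)]
    second = [x if k * bin_hz >= cutoff_hz else 0.0
              for k, x in enumerate(other_spectrum)]
    return first, second


# Example: 9 bins at 16 kHz gives 1 kHz per bin; cut at 2 kHz.
first, second = intercept([1.0] * 9, [2.0] * 9, 16000, 2000.0)
```

In this sketch the two outputs never overlap in frequency, so they can later be combined bin by bin.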

5. The method of claim 1, wherein the half-wave rectifying the two truncated frequency domain signals to obtain two half-wave rectified time domain signals respectively comprises:

6. The method of claim 5, wherein the two truncated time domain signals are half-wave rectified respectively by the following expression to obtain two half-wave rectified time domain signals:

[The two expressions and their symbol definitions are not reproduced in this text.]
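The expressions of claim 6 are not available in this text. As a hedged illustration only, half-wave rectification in its conventional sense keeps positive sample values and replaces the rest with zero; the sketch below applies that conventional definition to a list of time-domain samples. The function name `half_wave_rectify` is hypothetical, and any inverse transform from the frequency domain that may precede this step is outside the sketch.

```python
def half_wave_rectify(samples):
    """Conventional half-wave rectification: max(x, 0) for each sample."""
    return [x if x > 0 else 0.0 for x in samples]


rectified = half_wave_rectify([-1.0, 0.5, 0.0, -0.2, 2.0])
```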

7. The method of claim 1, wherein determining a fusion factor from the two half-wave rectified time domain signals according to a preset intermediate frequency band comprises:

and calculating a fusion factor according to the two accumulated energies.

8. The method according to claim 7, wherein the preset intermediate frequency band is [1500 Hz, 2000 Hz], and the two accumulated energies corresponding to the two half-wave rectified frequency-domain signals in the preset intermediate frequency band are calculated by the following expression:

wherein the symbols (not reproduced in this text) denote, in order: the accumulated energy in [1500 Hz, 2000 Hz] of the half-wave rectified frequency-domain signal corresponding to the bone conduction microphone element; the accumulated energy in [1500 Hz, 2000 Hz] of the other half-wave rectified frequency-domain signal; and the smoothing factor between adjacent time instants.
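The expression of claim 8 is not reproduced in this text, so the sketch below is an assumption rather than the claimed formula. It illustrates one common way to accumulate energy in a band with smoothing between adjacent frames: sum the squared magnitudes of the bins inside [1500 Hz, 2000 Hz], then blend the result with the previous frame's value using a first-order recursive average. The names `band_energy`, `smooth` and `alpha` are hypothetical, and the spectrum is assumed to be one-sided.

```python
def band_energy(spectrum, sample_rate, band=(1500.0, 2000.0)):
    """Sum of |X(k)|^2 over bins whose centre frequency lies in `band`."""
    n_bins = len(spectrum)
    if n_bins < 2:
        raise ValueError("need at least two frequency bins")
    # One-sided spectrum: bin k is centred at k * (fs / 2) / (n_bins - 1).
    bin_hz = (sample_rate / 2) / (n_bins - 1)
    low_hz, high_hz = band
    return sum(abs(x) ** 2
               for k, x in enumerate(spectrum)
               if low_hz <= k * bin_hz <= high_hz)


def smooth(previous_energy, current_energy, alpha):
    """First-order recursive smoothing between adjacent frames (assumed form)."""
    return alpha * previous_energy + (1.0 - alpha) * current_energy


# Example: 33 bins at 16 kHz gives 250 Hz per bin; bins 6, 7 and 8 fall in the band.
frame = [1.0] * 33
energy = smooth(0.0, band_energy(frame, 16000), alpha=0.5)
```

A larger `alpha` makes the accumulated energy react more slowly to frame-to-frame changes.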

9. The method of claim 7, wherein a fusion factor is calculated from the two cumulative energies by the following expression:

wherein the symbol (not reproduced in this text) denotes the fusion factor.

10. The method according to claim 1, wherein the fusing of the two intercepted frequency-domain signals according to the fusion factor to obtain the enhanced bone-conduction microphone speech comprises:

11. The method according to claim 10, wherein the two truncated frequency domain signals are fused according to a fusion factor by the following expression to obtain a fused frequency domain signal:

wherein the symbols (not reproduced in this text) denote, in order: the fused frequency-domain signal, the first intercepted frequency-domain signal, the second intercepted frequency-domain signal, and the fusion factor.
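The fusion expression of claim 11 is not reproduced in this text. Purely as an illustration, and as an assumption rather than the claimed formula, the sketch below combines two intercepted spectra bin by bin, scaling the second one by the fusion factor before adding it to the first. The function name `fuse` and the weighted-sum form are hypothetical.

```python
def fuse(first, second, fusion_factor):
    """Bin-wise weighted sum: first[k] + fusion_factor * second[k] (assumed form)."""
    if len(first) != len(second):
        raise ValueError("both spectra must have the same number of bins")
    return [a + fusion_factor * b for a, b in zip(first, second)]


# Example: a low band from one signal and a high band from the other.
fused = fuse([1.0, 1.0, 0.0], [0.0, 0.0, 2.0], 0.5)
```

Under this assumed form, a fusion factor of 1.0 simply adds the two bands, and smaller values attenuate the contribution of the second signal.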

12. A bone conduction microphone speech enhancement device, comprising:

13. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another via the communication bus;

a memory for storing a computer program;

a processor for implementing the bone-conduction microphone speech enhancement method of any one of claims 1-11 when executing the program stored in the memory.

14. A computer-readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, is adapted to carry out the bone conduction microphone speech enhancement method of any one of claims 1-11.