CN115862656A - Method, device, equipment and storage medium for enhancing bone-conduction microphone voice - Google Patents
- Publication number
- CN115862656A (application number CN202310054459.9A)
- Authority
- CN
- China
- Prior art keywords
- frequency domain
- domain signals
- frequency
- intercepted
- signals
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Landscapes
- Circuit For Audible Band Transducer (AREA)
Abstract
The present disclosure relates to a method and an apparatus for enhancing bone-conduction microphone voice, a device and a storage medium. The method comprises: acquiring two frequency domain signals, and respectively intercepting the two frequency domain signals according to a preset cut-off frequency to obtain two intercepted frequency domain signals; respectively performing half-wave rectification on the two intercepted frequency domain signals to obtain two half-wave rectified time domain signals; determining, according to a preset intermediate frequency band, a fusion factor from the two half-wave rectified time domain signals; and fusing the two intercepted frequency domain signals according to the fusion factor to obtain the enhanced bone-conduction microphone voice. The two frequency domain signals are first intercepted using the preset cut-off frequency as a filter bank, the signals are corrected through half-wave rectification, and the intercepted signals are then fused. Noise is thereby effectively suppressed, the clear low-frequency signal of the bone conduction microphone is retained, and the missing medium- and high-frequency information is supplemented, yielding an audio signal with higher perceptual quality.
Description
Technical Field
The present disclosure relates to the field of speech processing technologies, and in particular, to a method and an apparatus for enhancing speech of a bone-conduction microphone, a device and a storage medium.
Background
Currently, many smart headsets and head-worn VR/AR devices integrate a microphone for voice interaction. By picking up voice signals through the microphone and applying processing such as enhancement, wake-up and recognition, functions such as voice communication and man-machine interaction can be realized; this is one of the key technologies for improving man-machine interaction efficiency and voice communication quality. The purity of the voice picked up by the microphone, that is, the degree to which it is disturbed by noise, is the key factor affecting the actual interactive experience. Because bone conduction microphones pick up sound waves transmitted through the human skull, they are not disturbed by environmental noise and are currently receiving wide attention from the industry. However, a bone conduction microphone can only pick up signals below 2 kHz and cannot effectively pick up medium- and high-frequency signals, so the perceived voice differs greatly from real voice.
Using a bone conduction microphone together with a traditional microphone is a common way to realize voice pick-up and enhancement. The most common approach is to perform voice activity detection based on the bone conduction microphone and to use the detection result to enhance the traditional microphone voice signal. Because the bone conduction microphone is not disturbed by environmental noise, its voice activity detection is more accurate, and the voice enhancement effect of the traditional microphone can be improved. With the wide application of deep learning, more and more schemes use deep learning to fuse the bone conduction microphone signal with the traditional microphone signal. However, deep learning requires large amounts of data to ensure its effect, and bone conduction speech is difficult to collect at scale, which limits the effect in practical applications.
Bone conduction microphones transmit sound waves through the human skull, bone labyrinth, auditory center by converting sound into mechanical vibrations of different frequencies. Compared with a classical sound transmission mode of generating sound waves through a vibrating diaphragm, the bone conduction omits a plurality of sound wave transmission steps, and clear sound restoration can be realized in a noisy environment.
Methods that perform voice activity detection on the bone conduction microphone signal and use the result as guide information for enhancing the traditional microphone signal work by applying noise masking to the traditional microphone signal. Because noise interferes strongly at low frequencies in practical applications, the masking information obtained directly from the traditional microphone signal is heavily distorted, which affects the quality of voice interaction. Schemes based on deep learning, in turn, generally generalize poorly in practical applications because of insufficient training data.
Disclosure of Invention
In order to solve the above technical problem, or at least partially solve it, embodiments of the present disclosure provide a method and an apparatus for enhancing bone conduction microphone voice, a device, and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a method for enhancing a bone-conduction microphone voice, the method including:
acquiring two frequency domain signals, and respectively intercepting the two frequency domain signals according to a preset cut-off frequency to obtain two intercepted frequency domain signals, wherein the two frequency domain signals correspond to the same signal picked up by a bone conduction microphone element and by a traditional microphone array, respectively, using the same preset clock and sampling rate;
respectively performing half-wave rectification on the two intercepted frequency domain signals to obtain two half-wave rectified time domain signals;
determining, according to a preset intermediate frequency band, a fusion factor from the two half-wave rectified time domain signals, wherein the minimum value of the intermediate frequency band is the cut-off frequency;
and fusing the two intercepted frequency domain signals according to the fusion factor to obtain the enhanced bone-conduction microphone voice.
In one possible implementation, the acquiring two frequency domain signals includes:
acquiring two time domain signals, wherein the two time domain signals are obtained by respectively picking up the same signal by the bone conduction microphone element and the traditional microphone array by adopting the same preset clock and sampling rate;
and carrying out Fourier transform on the two time domain signals to obtain two frequency domain signals.
In a possible implementation manner, the respectively intercepting the two frequency domain signals according to a preset cut-off frequency to obtain two intercepted frequency domain signals includes:
for the frequency domain signals correspondingly obtained by the bone conduction microphone element pickup, intercepting signals smaller than a preset cut-off frequency as first intercepted frequency domain signals;
for the frequency domain signals correspondingly obtained by the traditional microphone array pickup, intercepting signals larger than or equal to a preset cut-off frequency as second intercepted frequency domain signals;
and taking the first intercepted frequency domain signal and the second intercepted frequency domain signal as two intercepted frequency domain signals.
In one possible implementation, for the frequency domain signal obtained from the bone conduction microphone element pickup, signals smaller than the preset cut-off frequency are intercepted as the first intercepted frequency domain signal by the following expression:
[Expression not reproduced in this text.] The terms of the expression are: the first intercepted frequency domain signal; the spectrum of the k-th frequency band of the l-th frame of the frequency domain signal obtained from the bone conduction microphone element pickup; the sampling rate; and the preset cut-off frequency;
for the frequency domain signal obtained from the traditional microphone array pickup, signals greater than or equal to the preset cut-off frequency are intercepted as the second intercepted frequency domain signal by the following expression:
[Expression not reproduced in this text.] The terms of the expression are: the second intercepted frequency domain signal; the spectrum of the k-th frequency band of the l-th frame of the frequency domain signal obtained from the traditional microphone array pickup; the sampling rate; and the preset cut-off frequency.
In a possible implementation manner, the half-wave rectifying the two intercepted frequency domain signals to obtain two half-wave rectified time domain signals respectively includes:
respectively carrying out inverse Fourier transform on the two intercepted frequency domain signals to obtain two intercepted time domain signals;
and respectively carrying out half-wave rectification on the two intercepted time domain signals to obtain two half-wave rectified time domain signals.
In a possible embodiment, the two truncated time domain signals are respectively half-wave rectified by the following expression to obtain two half-wave rectified time domain signals:
[Expression not reproduced in this text.] The terms of the expression are: the half-wave rectified time domain signal corresponding to the bone conduction microphone element; and the intercepted time domain signal corresponding to the bone conduction microphone element;
[Expression not reproduced in this text.] The terms of the expression are: the half-wave rectified time domain signal corresponding to the traditional microphone array; and the intercepted time domain signal corresponding to the traditional microphone array.
In a possible implementation manner, the determining a fusion factor according to the two half-wave rectified time domain signals according to the preset intermediate frequency band includes:
carrying out Fourier transform on the two half-wave rectified time domain signals to obtain two half-wave rectified frequency domain signals;
calculating two accumulated energies corresponding to the two half-wave rectified frequency domain signals in a preset intermediate frequency band;
and calculating a fusion factor according to the two accumulated energies.
In a possible implementation, the preset intermediate frequency band is [1500 Hz, 2000 Hz], and the two accumulated energies corresponding to the two half-wave rectified frequency domain signals in the preset intermediate frequency band are calculated by the following expression:
[Expression not reproduced in this text.] The terms of the expression are: the accumulated energy in [1500 Hz, 2000 Hz] of the half-wave rectified frequency domain signal corresponding to the bone conduction microphone element; the accumulated energy in [1500 Hz, 2000 Hz] of the half-wave rectified frequency domain signal corresponding to the traditional microphone array; the half-wave rectified frequency domain signal corresponding to the bone conduction microphone element; the half-wave rectified frequency domain signal corresponding to the traditional microphone array; and the smoothing factor between adjacent time instants.
In one possible embodiment, the fusion factor is calculated from the two cumulative energies by the following expression:
[Expression not reproduced in this text.] The terms of the expression are: the accumulated energy in [1500 Hz, 2000 Hz] of the half-wave rectified frequency domain signal corresponding to the bone conduction microphone element; the accumulated energy in [1500 Hz, 2000 Hz] of the half-wave rectified frequency domain signal corresponding to the traditional microphone array; and the fusion factor.
In a possible embodiment, the fusing the two intercepted frequency domain signals according to the fusion factor to obtain the enhanced bone conduction microphone voice includes:
fusing the two intercepted frequency domain signals according to the fusion factor to obtain a fused frequency domain signal;
and performing inverse Fourier transform on the fused frequency domain signal to obtain a fused time domain signal, which is used as the enhanced bone conduction microphone voice.
In a possible embodiment, the two intercepted frequency domain signals are fused according to the fusion factor by the following expression to obtain the fused frequency domain signal:
[Expression not reproduced in this text.] The terms of the expression are: the fused frequency domain signal; the spectrum of the k-th frequency band of the l-th frame of the frequency domain signal obtained from the bone conduction microphone element pickup; the first intercepted frequency domain signal; the second intercepted frequency domain signal; and the fusion factor.
In a second aspect, an embodiment of the present disclosure provides a bone conduction microphone voice enhancement apparatus, including:
an intercepting module, used for acquiring two frequency domain signals and respectively intercepting the two frequency domain signals according to a preset cut-off frequency to obtain two intercepted frequency domain signals, wherein the two frequency domain signals correspond to the same signal picked up by a bone conduction microphone element and by a traditional microphone array, respectively, using the same preset clock and sampling rate;
a rectification module, used for respectively performing half-wave rectification on the two intercepted frequency domain signals to obtain two half-wave rectified time domain signals;
a determining module, used for determining, according to a preset intermediate frequency band, a fusion factor from the two half-wave rectified time domain signals, wherein the minimum value of the intermediate frequency band is the cut-off frequency;
and a fusion module, used for fusing the two intercepted frequency domain signals according to the fusion factor to obtain the enhanced bone conduction microphone voice.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
and the processor is used for implementing the above bone conduction microphone voice enhancement method when executing the program stored in the memory.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the bone conduction microphone voice enhancement method described above.
Compared with the prior art, the technical scheme provided by the embodiment of the disclosure at least has part or all of the following advantages:
The bone conduction microphone voice enhancement method of the embodiments of the present disclosure acquires two frequency domain signals and respectively intercepts them according to a preset cut-off frequency to obtain two intercepted frequency domain signals, wherein the two frequency domain signals correspond to the same signal picked up by a bone conduction microphone element and by a traditional microphone array, respectively, using the same preset clock and sampling rate; respectively performs half-wave rectification on the two intercepted frequency domain signals to obtain two half-wave rectified time domain signals; determines, according to a preset intermediate frequency band, a fusion factor from the two half-wave rectified time domain signals, wherein the minimum value of the intermediate frequency band is the cut-off frequency; and fuses the two intercepted frequency domain signals according to the fusion factor to obtain the enhanced bone conduction microphone voice. The two frequency domain signals are first intercepted using the preset cut-off frequency as a filter bank, the signals are corrected through half-wave rectification, and the intercepted signals are then fused. Noise is thereby effectively suppressed, the clear low-frequency signal of the bone conduction microphone is retained, and the missing medium- and high-frequency information is supplemented, yielding an audio signal with higher perceptual quality.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings used in the description of the embodiments or the related art are briefly described below; those skilled in the art can obtain other drawings from these drawings without inventive effort.
Fig. 1 schematically illustrates a flowchart of a bone conduction microphone voice enhancement method according to an embodiment of the present disclosure;
Fig. 2 schematically illustrates a block diagram of a bone conduction microphone voice enhancement apparatus according to an embodiment of the present disclosure; and
Fig. 3 schematically illustrates a block diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some embodiments of the present disclosure, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
Referring to Fig. 1, an embodiment of the present disclosure provides a bone conduction microphone voice enhancement method, the method including:
S1, acquiring two frequency domain signals, and respectively intercepting the two frequency domain signals according to a preset cut-off frequency to obtain two intercepted frequency domain signals, wherein the two frequency domain signals correspond to the same signal picked up by a bone conduction microphone element and by a traditional microphone array, respectively, using the same preset clock and sampling rate;
In practical applications, the bone conduction microphone collects the sound signal from the slight vibration of the bones of the head and neck caused by a person speaking, while the traditional microphone collects the sound signal through air conduction.
S2, respectively performing half-wave rectification on the two intercepted frequency domain signals to obtain two half-wave rectified time domain signals;
S3, determining, according to a preset intermediate frequency band, a fusion factor from the two half-wave rectified time domain signals, wherein the minimum value of the intermediate frequency band is the cut-off frequency;
and S4, fusing the two intercepted frequency domain signals according to the fusion factor to obtain the enhanced bone conduction microphone voice.
In this embodiment, in step S1, the acquiring two frequency domain signals includes:
acquiring two time domain signals, wherein the two time domain signals are obtained by the bone conduction microphone element and the traditional microphone array respectively picking up the same signal using the same preset clock and sampling rate;
and carrying out Fourier transform on the two time domain signals to obtain two frequency domain signals.
In this embodiment, the two time domain signals are Fourier transformed by the following expression to obtain the two frequency domain signals:
[Expression not reproduced in this text.] The terms of the expression are: the spectrum of the k-th frequency band of the l-th frame of the frequency domain signal obtained from the bone conduction microphone element pickup; the spectrum of the k-th frequency band of the l-th frame of the frequency domain signal obtained from the traditional microphone array pickup; the frame length, 512; a Hamming window of length 512; the time frame number l; the frequency number k; the time domain signal picked up by the bone conduction microphone element; and the time domain signal picked up by the traditional microphone array.
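The framing and transform step can be sketched in Python as follows. Only the frame length of 512 and the Hamming window come from the text; the hop size of 256 samples, the use of NumPy, and the function name `stft_frames` are assumptions made for illustration.

```python
import numpy as np

FRAME_LEN = 512  # frame length stated in the text


def stft_frames(x, hop=256):
    """Return one complex spectrum per frame (rows = frame number l, columns = bin k).

    The hop size is an assumption; the text does not state the frame shift.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < FRAME_LEN:
        raise ValueError("signal is shorter than one frame")
    window = np.hamming(FRAME_LEN)  # Hamming window of length 512
    n_frames = 1 + (len(x) - FRAME_LEN) // hop
    frames = np.stack([x[l * hop: l * hop + FRAME_LEN] for l in range(n_frames)])
    # Real-input FFT keeps the 257 non-negative frequency bins of each frame.
    return np.fft.rfft(frames * window, axis=1)
```

The same function would be applied to both the bone conduction signal and the traditional microphone signal, since both are sampled with the same clock and sampling rate.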
In this embodiment, in step S1, the intercepting the two frequency domain signals according to a preset cut-off frequency respectively to obtain two intercepted frequency domain signals, including:
for the frequency domain signals correspondingly obtained by the bone conduction microphone element pickup, intercepting signals smaller than a preset cut-off frequency as first intercepted frequency domain signals;
for the frequency domain signals correspondingly obtained by the traditional microphone array pickup, intercepting signals larger than or equal to a preset cut-off frequency as second intercepted frequency domain signals;
and taking the first intercepted frequency domain signal and the second intercepted frequency domain signal as two intercepted frequency domain signals.
In this embodiment, for the frequency domain signal obtained from the bone conduction microphone element pickup, signals smaller than the preset cut-off frequency are intercepted as the first intercepted frequency domain signal by the following expression:
[Expression not reproduced in this text.] The terms of the expression are: the first intercepted frequency domain signal; the spectrum of the k-th frequency band of the l-th frame of the frequency domain signal obtained from the bone conduction microphone element pickup; the sampling rate; and the preset cut-off frequency;
for the frequency domain signal obtained from the traditional microphone array pickup, signals greater than or equal to the preset cut-off frequency are intercepted as the second intercepted frequency domain signal by the following expression:
[Expression not reproduced in this text.] The terms of the expression are: the second intercepted frequency domain signal; the spectrum of the k-th frequency band of the l-th frame of the frequency domain signal obtained from the traditional microphone array pickup; the sampling rate; and the preset cut-off frequency.
The two frequency domain signals are each intercepted at the preset cut-off frequency: for the bone conduction signal, only the band below the cut-off frequency is kept; for the traditional microphone signal, which is easily disturbed by low-frequency noise, only the band at or above the cut-off frequency is kept. This ensures the purity of the signals.
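A minimal sketch of this interception step, assuming spectra laid out as (frame, bin) arrays produced by a 512-point real FFT. The function name `truncate_spectra`, the argument names, and the choice to zero the discarded bins are illustrative assumptions rather than the expression of the disclosure.

```python
import numpy as np


def truncate_spectra(Y_bone, Y_air, fs, fc, n_fft=512):
    """Keep bone-conduction bins below fc and traditional-microphone bins at or above fc.

    Y_bone, Y_air: arrays of shape (n_frames, n_fft // 2 + 1).
    fs: sampling rate in Hz; fc: preset cut-off frequency in Hz.
    Discarded bins are set to zero (an assumption).
    """
    # Centre frequency of bin k is k * fs / n_fft.
    freqs = np.arange(n_fft // 2 + 1) * fs / n_fft
    low = freqs < fc
    bone_low = np.where(low, Y_bone, 0)   # first intercepted signal
    air_high = np.where(low, 0, Y_air)    # second intercepted signal
    return bone_low, air_high
```

With a sampling rate of 16 kHz and a cut-off of 1500 Hz (both example values), the bin spacing is 31.25 Hz, so bins 0 to 47 come from the bone conduction signal and bins 48 to 256 from the traditional microphone signal.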
In this embodiment, in step S2, the half-wave rectification is performed on the two intercepted frequency domain signals respectively to obtain two half-wave rectified time domain signals, including:
respectively carrying out inverse Fourier transform on the two intercepted frequency domain signals to obtain two intercepted time domain signals;
and respectively carrying out half-wave rectification on the two intercepted time domain signals to obtain two half-wave rectified time domain signals.
In this embodiment, the two intercepted frequency domain signals are respectively subjected to inverse Fourier transform by the following expression to obtain the two intercepted time domain signals:
[Expression not reproduced in this text.] The terms of the expression are: the intercepted time domain signal corresponding to the bone conduction microphone element; the intercepted time domain signal corresponding to the traditional microphone array; the first intercepted frequency domain signal; the second intercepted frequency domain signal; the frame length, 512; the Hamming window; the time frame number l; and the frequency number k.
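As a sketch of the return to the time domain, each frame's spectrum can be inverted independently. The function name `frames_from_spectra` is hypothetical, and the synthesis windowing and frame overlap handling, which the text does not spell out, are deliberately left out here.

```python
import numpy as np


def frames_from_spectra(Y, n_fft=512):
    """Invert each row of a (n_frames, n_fft // 2 + 1) spectrum array to a real frame.

    Returns an array of shape (n_frames, n_fft). No synthesis window or
    overlap-add is applied (an assumption made to keep the sketch minimal).
    """
    return np.fft.irfft(Y, n=n_fft, axis=1)
```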
In this embodiment, the two truncated time domain signals are respectively half-wave rectified by the following expression to obtain two half-wave rectified time domain signals:
[Expression not reproduced in this text.] The terms of the expression are: the half-wave rectified time domain signal corresponding to the bone conduction microphone element; and the intercepted time domain signal corresponding to the bone conduction microphone element;
[Expression not reproduced in this text.] The terms of the expression are: the half-wave rectified time domain signal corresponding to the traditional microphone array; and the intercepted time domain signal corresponding to the traditional microphone array.
Since most of the energy of the speech signal has a clear harmonic structure, the present disclosure enhances adjacent harmonics by performing half-wave rectification.
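Half-wave rectification in its usual sense keeps positive samples and sets negative samples to zero. The sketch below assumes that definition; the function name `half_wave_rectify` is illustrative.

```python
import numpy as np


def half_wave_rectify(x):
    """Standard half-wave rectification: negative samples become zero."""
    return np.maximum(np.asarray(x, dtype=float), 0.0)
```

Because rectification is nonlinear, a pure tone acquires energy at multiples of its frequency after this step, which is consistent with the harmonic enhancement described above: a band-limited low-frequency signal gains content in higher bands.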
In this embodiment, in step S3, determining a fusion factor according to the two half-wave rectified time domain signals according to a preset intermediate frequency band includes:
carrying out Fourier transform on the two half-wave rectified time domain signals to obtain two half-wave rectified frequency domain signals;
calculating two accumulated energies corresponding to the two half-wave rectified frequency domain signals in a preset intermediate frequency band;
and calculating a fusion factor according to the two accumulated energies.
In this embodiment, the two half-wave rectified time domain signals are subjected to Fourier transform by the following expression to obtain the two half-wave rectified frequency domain signals:
[Expression not reproduced in this text.] The terms of the expression are: the half-wave rectified frequency domain signal corresponding to the bone conduction microphone element; the half-wave rectified frequency domain signal corresponding to the traditional microphone array; the half-wave rectified time domain signal corresponding to the bone conduction microphone element; the half-wave rectified time domain signal corresponding to the traditional microphone array; the frame length, 512; a Hamming window of length 512; the time frame number l; and the frequency number k.
In this embodiment, the preset intermediate frequency band is [1500 Hz, 2000 Hz], and the two accumulated energies corresponding to the two half-wave rectified frequency domain signals in the preset intermediate frequency band are calculated by the following expression:
[Expression not reproduced in this text.] The terms of the expression are: the accumulated energy in [1500 Hz, 2000 Hz] of the half-wave rectified frequency domain signal corresponding to the bone conduction microphone element; the accumulated energy in [1500 Hz, 2000 Hz] of the half-wave rectified frequency domain signal corresponding to the traditional microphone array; the half-wave rectified frequency domain signal corresponding to the bone conduction microphone element; the half-wave rectified frequency domain signal corresponding to the traditional microphone array; and the smoothing factor between adjacent time instants. The smoothing factor is preferably 0.96, which ensures sufficient prediction accuracy while avoiding large variations of the fusion factor over the time series.
The present disclosure uses the preset intermediate frequency band as the reference band for matching the bone conduction microphone signal to the traditional microphone signal: a fusion factor is sought that makes the energies of the two signals match as closely as possible in the intermediate frequency band, which ensures the integrity of the fusion.
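One way to accumulate band energy with a smoothing factor between adjacent frames is first-order recursive averaging. The recursion below is an assumed form, not the expression of the disclosure; only the band [1500 Hz, 2000 Hz] and the preferred smoothing factor 0.96 come from the text, and the function name `smoothed_band_energy` is hypothetical.

```python
import numpy as np


def smoothed_band_energy(Y, fs, band=(1500.0, 2000.0), alpha=0.96, n_fft=512):
    """Recursively smoothed energy of each frame inside the given band.

    Assumed recursion: E(l) = alpha * E(l - 1) + (1 - alpha) * sum_k |Y(l, k)|^2,
    with E(-1) = 0 and k running over the bins inside the band (inclusive).
    Y has shape (n_frames, n_fft // 2 + 1).
    """
    freqs = np.arange(n_fft // 2 + 1) * fs / n_fft
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    inst = np.sum(np.abs(Y[:, in_band]) ** 2, axis=1)  # per-frame band energy
    energy = np.empty_like(inst)
    prev = 0.0
    for l, e in enumerate(inst):
        prev = alpha * prev + (1.0 - alpha) * e
        energy[l] = prev
    return energy
```

A smoothing factor close to 1 makes the energy track slowly, which is what keeps a quantity derived from it from changing abruptly between frames.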
In the present embodiment, the fusion factor is calculated from the two accumulated energies by the following expression:
[Expression not reproduced in this text.] The terms of the expression are: the accumulated energy in [1500 Hz, 2000 Hz] of the half-wave rectified frequency domain signal corresponding to the bone conduction microphone element; the accumulated energy in [1500 Hz, 2000 Hz] of the half-wave rectified frequency domain signal corresponding to the traditional microphone array; and the fusion factor.
In this embodiment, in step S4, fusing the two intercepted frequency domain signals according to the fusion factor to obtain the enhanced bone-conduction microphone speech includes:
fusing the two intercepted frequency domain signals according to the fusion factor to obtain a fused frequency domain signal;
and performing an inverse Fourier transform on the fused frequency domain signal to obtain a fused time domain signal, which is used as the enhanced bone-conduction microphone speech.
In this embodiment, the two intercepted frequency domain signals are fused according to the fusion factor by the following expression to obtain the fused frequency domain signal:
where the symbols denote, in order, the fused frequency domain signal, the spectrum of the k-th frequency band of the frequency domain signal obtained from the bone conduction microphone element pickup, the first intercepted frequency domain signal, the second intercepted frequency domain signal, and the fusion factor.
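A minimal sketch of the interception and fusion of one frame follows. The fusion expression is not recoverable from the text, so the rule used here (the first intercepted signal plus the fusion factor times the second intercepted signal) is an assumption consistent with the later remark that the low band is kept from the bone conduction microphone while the medium and high bands are scaled by the fusion factor. The function names are hypothetical, and the spectra are assumed to be one-sided (bins 0 to N/2) so that a single cut-off bin separates the two bands.

```python
def intercept_low(spectrum, cutoff_bin):
    """First intercepted signal: keep bone-conduction bins below the cut-off."""
    return [v if k < cutoff_bin else 0j for k, v in enumerate(spectrum)]


def intercept_high(spectrum, cutoff_bin):
    """Second intercepted signal: keep traditional-microphone bins at or above the cut-off."""
    return [v if k >= cutoff_bin else 0j for k, v in enumerate(spectrum)]


def fuse(first_intercepted, second_intercepted, gamma):
    """ASSUMED fusion rule: Z(l, k) = Y1(l, k) + gamma * Y2(l, k)."""
    return [a + gamma * b for a, b in zip(first_intercepted, second_intercepted)]
```

Because the two intercepted signals occupy disjoint bins, each output bin comes from exactly one microphone path.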
In this embodiment, the fused frequency domain signal is subjected to an inverse Fourier transform by the following expression to obtain the fused time domain signal:
where the symbols denote, in order, the fused time domain signal, the fused frequency domain signal, the frame length (512) and the Hamming window; l is the time frame index and k is the frequency index.
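The return to the time domain can be sketched as a per-frame inverse DFT followed by overlap-add. The text names the frame length and a Hamming window but the synthesis expression is lost, so the window handling is left out here. The function names, the naive O(N^2) inverse DFT and the plain overlap-add are illustrative assumptions.

```python
import cmath
import math


def inverse_dft(spectrum):
    """Real part of (1/N) * sum over k of Z[k] * exp(+j*2*pi*n*k/N).

    Expects a full-length spectrum with conjugate symmetry (real signal).
    """
    n_points = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * n * k / n_points)
                for k in range(n_points)).real / n_points
            for n in range(n_points)]


def overlap_add(frames, hop):
    """Sum equally long time-domain frames shifted by `hop` samples (frames must be non-empty)."""
    out = [0.0] * (hop * (len(frames) - 1) + len(frames[0]))
    for l, frame in enumerate(frames):
        for n, value in enumerate(frame):
            out[l * hop + n] += value
    return out
```

In a complete system the analysis and synthesis windows and the hop size would be chosen together so that overlapping frames sum back to the original signal.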
In the embodiments of the present disclosure, the way the fusion factor is calculated shows that the larger the energy of the traditional microphone signal, the smaller the fusion factor, and vice versa. This dynamic fusion factor ensures that the final output spectrum Z(l, k) is formed by fusing two signals of approximately the same amplitude. As Z(l, k) shows, the bone conduction microphone signal is retained at low frequencies, where it is hardly affected by background noise, while in the medium and high frequency bands the matched medium-high-frequency energy is retained according to the fusion factor. The speech therefore contains a complete harmonic structure, which improves the perceived quality of the output speech signal.
The bone conduction microphone speech enhancement method addresses the problem that the low-frequency part of a bone conduction microphone signal is free of environmental noise but lacks usable medium and high frequency content. By fusing it with the traditional microphone signal, the method effectively combines the low-frequency and medium-high-frequency speech signals, ensuring the integrity and clarity of the speech harmonic structure and giving higher perceived quality.
The method calculates a dynamic fusion factor from the half-wave rectification results of the two signal paths, achieving effective fusion of the bone conduction microphone signal and the traditional microphone signal.
The method uses the traditional microphone signal to restore the relevant frequency bands, which preserves the auditory quality of the picked-up speech. On the one hand, because the bone conduction microphone picks up clear speech even in strongly noisy scenes, the medium-high-frequency compensation disclosed here lets the method adapt to very complex acoustic environments with strong interference, giving it a wider range of application. On the other hand, fusing by means of the fusion factor keeps the two signals closely matched in amplitude and avoids the mismatch that direct addition would cause, so the perceived quality is higher.
The core of the method is the calculation of the fusion factor, and the main computational load lies in the Fourier transforms. Because many techniques for accelerating the Fourier transform are now available, the method is suitable for many head-mounted VR and AR products with strict power-consumption requirements, which further widens its range of application.
Referring to fig. 2, an embodiment of the present disclosure provides a bone-conduction microphone speech enhancement device, including:
the intercepting module 21 is configured to acquire two frequency domain signals and to intercept each of them according to a preset cut-off frequency to obtain two intercepted frequency domain signals, where the two frequency domain signals are obtained from the same signal picked up by the bone conduction microphone element and by the conventional microphone array, respectively, using the same preset clock and sampling rate;
the rectification module 22 is configured to perform half-wave rectification on each of the two intercepted frequency domain signals to obtain two half-wave rectified time domain signals;
the determining module 23 is configured to determine a fusion factor from the two half-wave rectified time domain signals over a preset intermediate frequency band, where the minimum value of the intermediate frequency band is the cut-off frequency;
and the fusion module 24 is configured to fuse the two intercepted frequency domain signals according to the fusion factor to obtain the enhanced bone-conduction microphone speech.
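The four modules listed above can be sketched as one per-frame pipeline. This is an illustration under stated assumptions, not the patented implementation: the class and method names are hypothetical, windowing and frame overlap are omitted, the DFT is a naive O(N^2) version, and both the fusion-factor form (square root of the band-energy ratio) and the fusion rule (low band plus factor times high band) are assumed, because the original expressions are not recoverable from the text.

```python
import cmath
import math


def dft(x):
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * n * k / n_pts) for n in range(n_pts))
            for k in range(n_pts)]


def idft(spec):
    n_pts = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * n * k / n_pts)
                for k in range(n_pts)).real / n_pts
            for n in range(n_pts)]


class BoneMicEnhancer:
    """Hypothetical per-frame pipeline mirroring modules 21 to 24."""

    def __init__(self, sample_rate, frame_len, band_hz=(1500.0, 2000.0), alpha=0.96):
        self.n = frame_len
        # The cut-off frequency equals the lower edge of the intermediate band.
        self.cut = math.ceil(band_hz[0] * frame_len / sample_rate)
        self.band = range(self.cut, math.floor(band_hz[1] * frame_len / sample_rate) + 1)
        self.alpha = alpha
        self.e_bone = 0.0
        self.e_mic = 0.0

    def intercept(self, spec_bone, spec_mic):
        # Module 21. min(k, N - k) folds mirrored bins onto their positive-frequency twin.
        low = [v if min(k, self.n - k) < self.cut else 0j for k, v in enumerate(spec_bone)]
        high = [v if min(k, self.n - k) >= self.cut else 0j for k, v in enumerate(spec_mic)]
        return low, high

    @staticmethod
    def rectify(spec):
        # Module 22: back to the time domain, then half-wave rectification.
        return [max(v, 0.0) for v in idft(spec)]

    def _band_energy(self, rectified):
        spec = dft(rectified)
        return sum(abs(spec[k]) ** 2 for k in self.band)

    def determine(self, rect_bone, rect_mic, eps=1e-12):
        # Module 23: smoothed band energies, then the ASSUMED amplitude-ratio factor.
        a = self.alpha
        self.e_bone = a * self.e_bone + (1 - a) * self._band_energy(rect_bone)
        self.e_mic = a * self.e_mic + (1 - a) * self._band_energy(rect_mic)
        return math.sqrt(self.e_bone / (self.e_mic + eps))

    def fuse(self, low, high, gamma):
        # Module 24: ASSUMED rule Z = Y1 + gamma * Y2, then inverse transform.
        return idft([a + gamma * b for a, b in zip(low, high)])

    def process(self, frame_bone, frame_mic):
        low, high = self.intercept(dft(frame_bone), dft(frame_mic))
        gamma = self.determine(self.rectify(low), self.rectify(high))
        return self.fuse(low, high, gamma), gamma
```

Half-wave rectification is what makes the low-passed bone-conduction path comparable in the reference band: it generates harmonics of the low-frequency content that fall into [1500 Hz, 2000 Hz], so both paths have measurable energy there.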
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the invention. One of ordinary skill in the art can understand and implement it without inventive effort.
In the second embodiment, any several of the intercepting module 21, the rectification module 22, the determining module 23 and the fusion module 24 may be combined and implemented in one module, or any one of them may be split into several modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. At least one of the intercepting module 21, the rectification module 22, the determining module 23 and the fusion module 24 may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package or an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or in any one of software, hardware and firmware, or in any suitable combination of them. Alternatively, at least one of the intercepting module 21, the rectification module 22, the determining module 23 and the fusion module 24 may be at least partially implemented as a computer program module which, when executed, performs the corresponding function.
Referring to fig. 3, an electronic device provided by an embodiment of the present disclosure includes a processor 1110, a communication interface 1120, a memory 1130, and a communication bus 1140, where the processor 1110, the communication interface 1120, and the memory 1130 complete communication with each other through the communication bus 1140;
a memory 1130 for storing computer programs;
the processor 1110, when executing the program stored in the memory 1130, implements the bone-conduction microphone speech enhancement method as follows:
acquiring two frequency domain signals, and intercepting each of them according to a preset cut-off frequency to obtain two intercepted frequency domain signals, wherein the two frequency domain signals are obtained from the same signal picked up by a bone conduction microphone element and by a traditional microphone array, respectively, using the same preset clock and sampling rate;
performing half-wave rectification on each of the two intercepted frequency domain signals to obtain two half-wave rectified time domain signals;
determining a fusion factor from the two half-wave rectified time domain signals over a preset intermediate frequency band, wherein the minimum value of the intermediate frequency band is the cut-off frequency;
and fusing the two intercepted frequency domain signals according to the fusion factor to obtain the enhanced bone-conduction microphone speech.
The communication bus 1140 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 1140 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown, but this is not intended to represent only one bus or type of bus.
The communication interface 1120 is used for communication between the electronic device and other devices.
The memory 1130 may include a random access memory (RAM) or a non-volatile memory, such as at least one disk memory. Optionally, the memory 1130 may also be at least one memory device located remotely from the processor 1110.
The processor 1110 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Embodiments of the present disclosure also provide a computer-readable storage medium. The above computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the bone microphone speech enhancement method as described above.
The computer-readable storage medium may be contained in the apparatus/device described in the above embodiments; or may be present alone without being assembled into the device/apparatus. The computer readable storage medium carries one or more programs which, when executed, implement a method for bone conduction microphone speech enhancement according to an embodiment of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
It is noted that, in this document, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in the process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (14)
1. A method for bone-conduction microphone speech enhancement, the method comprising:
acquiring two frequency domain signals, and intercepting each of them according to a preset cut-off frequency to obtain two intercepted frequency domain signals, wherein the two frequency domain signals are obtained from the same signal picked up by a bone conduction microphone element and by a traditional microphone array, respectively, using the same preset clock and sampling rate;
performing half-wave rectification on each of the two intercepted frequency domain signals to obtain two half-wave rectified time domain signals;
determining a fusion factor from the two half-wave rectified time domain signals over a preset intermediate frequency band, wherein the minimum value of the intermediate frequency band is the cut-off frequency;
and fusing the two intercepted frequency domain signals according to the fusion factor to obtain the enhanced bone-conduction microphone speech.
2. The method of claim 1, wherein the obtaining two frequency domain signals comprises:
acquiring two time domain signals, wherein the two time domain signals are obtained by respectively picking up the same signal by the bone conduction microphone element and the traditional microphone array by adopting the same preset clock and sampling rate;
and carrying out Fourier transform on the two time domain signals to obtain two frequency domain signals.
3. The method according to claim 1, wherein the respectively truncating the two kinds of frequency domain signals according to a preset cut-off frequency to obtain two kinds of truncated frequency domain signals comprises:
for the frequency domain signals correspondingly obtained by the bone conduction microphone element pickup, intercepting signals smaller than a preset cut-off frequency as first intercepted frequency domain signals;
for the frequency domain signals correspondingly obtained by the traditional microphone array pickup, intercepting signals larger than or equal to a preset cut-off frequency as second intercepted frequency domain signals;
and taking the first intercepted frequency domain signal and the second intercepted frequency domain signal as two intercepted frequency domain signals.
4. The method according to claim 3, characterized in that, for the frequency domain signal obtained from the bone conduction microphone element pickup, signals smaller than the preset cut-off frequency are intercepted as the first intercepted frequency domain signal by the following expression:
where the symbols denote, in order, the first intercepted frequency domain signal, the spectrum of the k-th frequency band of the l-th frame of the frequency domain signal obtained from the bone conduction microphone element pickup, the sampling rate, and the preset cut-off frequency;
for the frequency domain signal obtained from the conventional microphone array pickup, signals greater than or equal to the preset cut-off frequency are intercepted as the second intercepted frequency domain signal by the following expression:
where the symbols denote, in order, the second intercepted frequency domain signal, the spectrum of the k-th frequency band of the l-th frame of the frequency domain signal obtained from the conventional microphone array pickup, the sampling rate, and the preset cut-off frequency.
5. The method of claim 1, wherein the half-wave rectifying the two truncated frequency domain signals to obtain two half-wave rectified time domain signals respectively comprises:
performing an inverse Fourier transform on each of the two intercepted frequency domain signals to obtain two intercepted time domain signals;
and respectively carrying out half-wave rectification on the two intercepted time domain signals to obtain two half-wave rectified time domain signals.
6. The method of claim 5, wherein the two truncated time domain signals are half-wave rectified respectively by the following expression to obtain two half-wave rectified time domain signals:
where the symbols denote, in order, the half-wave rectified time domain signal corresponding to the bone conduction microphone element and the intercepted time domain signal corresponding to the bone conduction microphone element,
7. The method of claim 1, wherein determining a fusion factor from the two half-wave rectified time domain signals according to a preset intermediate frequency band comprises:
carrying out Fourier transform on the two half-wave rectified time domain signals to obtain two half-wave rectified frequency domain signals;
calculating two accumulated energies corresponding to the two half-wave rectified frequency domain signals in a preset intermediate frequency band;
and calculating a fusion factor according to the two accumulated energies.
8. The method according to claim 7, wherein the preset intermediate frequency band is [1500 Hz, 2000 Hz], and the two accumulated energies corresponding to the two half-wave rectified frequency domain signals in the preset intermediate frequency band are calculated by the following expression:
where the two results are the accumulated energies, within [1500 Hz, 2000 Hz], of the half-wave rectified frequency domain signals corresponding to the bone conduction microphone element and to the traditional microphone array, respectively; the two inputs are those half-wave rectified frequency domain signals; and the remaining symbol is the smoothing factor between adjacent time instants.
9. The method of claim 7, wherein a fusion factor is calculated from the two cumulative energies by the following expression:
where the two inputs are the accumulated energies, within [1500 Hz, 2000 Hz], of the half-wave rectified frequency domain signals corresponding to the bone conduction microphone element and to the traditional microphone array, respectively, and the result is the fusion factor.
10. The method according to claim 1, wherein fusing the two intercepted frequency domain signals according to the fusion factor to obtain the enhanced bone-conduction microphone speech comprises:
fusing the two intercepted frequency domain signals according to the fusion factor to obtain a fused frequency domain signal;
and performing an inverse Fourier transform on the fused frequency domain signal to obtain a fused time domain signal, which is used as the enhanced bone-conduction microphone speech.
11. The method according to claim 10, wherein the two intercepted frequency domain signals are fused according to the fusion factor by the following expression to obtain the fused frequency domain signal:
where the symbols denote, in order, the fused frequency domain signal, the spectrum of the k-th frequency band of the frequency domain signal obtained from the bone conduction microphone element pickup, the first intercepted frequency domain signal, the second intercepted frequency domain signal, and the fusion factor.
12. A bone conduction microphone speech enhancement device, comprising:
the intercepting module is used for acquiring two frequency domain signals and intercepting each of them according to a preset cut-off frequency to obtain two intercepted frequency domain signals, wherein the two frequency domain signals are obtained from the same signal picked up by the bone conduction microphone element and by the traditional microphone array, respectively, using the same preset clock and sampling rate;
the rectification module is used for performing half-wave rectification on each of the two intercepted frequency domain signals to obtain two half-wave rectified time domain signals;
the determining module is used for determining a fusion factor from the two half-wave rectified time domain signals over a preset intermediate frequency band, wherein the minimum value of the intermediate frequency band is the cut-off frequency;
and the fusion module is used for fusing the two intercepted frequency domain signals according to the fusion factor to obtain the enhanced bone-conduction microphone speech.
13. An electronic device is characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory are communicated with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the bone microphone speech enhancement method of any one of claims 1-11 when executing a program stored on a memory.
14. A computer-readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, is adapted to carry out the bone conduction microphone speech enhancement method of any one of claims 1-11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310054459.9A CN115862656B (en) | 2023-02-03 | 2023-02-03 | Bone-conduction microphone voice enhancement method, device, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115862656A (en) | 2023-03-28 |
CN115862656B (en) | 2023-06-02 |
Family
ID=85657487
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310054459.9A Active CN115862656B (en) | 2023-02-03 | 2023-02-03 | Bone-conduction microphone voice enhancement method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115862656B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2008257110A (en) * | 2007-04-09 | 2008-10-23 | Nippon Telegr & Teleph Corp <Ntt> | Object signal section estimation device, method, and program, and recording medium |
US20120130154A1 (en) * | 2010-11-23 | 2012-05-24 | Richie Sajan | Voice Volume Modulator |
CN110782912A (en) * | 2019-10-10 | 2020-02-11 | 安克创新科技股份有限公司 | Sound source control method and speaker device |
CN112767963A (en) * | 2021-01-28 | 2021-05-07 | 歌尔科技有限公司 | Voice enhancement method, device and system and computer readable storage medium |
CN114360560A (en) * | 2022-01-17 | 2022-04-15 | 随锐科技集团股份有限公司 | Speech enhancement post-processing method and device based on harmonic structure prediction |
Non-Patent Citations (1)
Title |
---|
Luo Yishan, Wang Yuanyuan, Wang Weiqi: "Research on bone-conduction ultrasonic hearing aid technology" * |
Also Published As
Publication number | Publication date |
---|---|
CN115862656B (en) | 2023-06-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||