CN115862656A - Method, device, equipment and storage medium for enhancing bone-conduction microphone voice - Google Patents

Method, device, equipment and storage medium for enhancing bone-conduction microphone voice

Info

Publication number
CN115862656A
Authority
CN
China
Prior art keywords
frequency domain
domain signals
frequency
intercepted
signals
Prior art date
Legal status
Granted
Application number
CN202310054459.9A
Other languages
Chinese (zh)
Other versions
CN115862656B (en)
Inventor
梁山
陶建华
聂帅
李冠君
易江燕
Current Assignee
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN202310054459.9A
Publication of CN115862656A
Application granted
Publication of CN115862656B
Current legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Circuit For Audible Band Transducer (AREA)

Abstract

The present disclosure relates to a method, apparatus, device and storage medium for enhancing bone-conduction microphone speech. The method comprises: acquiring two frequency-domain signals and truncating each of them at a preset cut-off frequency to obtain two truncated frequency-domain signals; performing half-wave rectification on the two truncated frequency-domain signals to obtain two half-wave rectified time-domain signals; determining a fusion factor from the two half-wave rectified time-domain signals within a preset intermediate frequency band; and fusing the two truncated frequency-domain signals according to the fusion factor to obtain the enhanced bone-conduction microphone speech. Because the two frequency-domain signals are first truncated at the preset cut-off frequency, which acts as a filter bank, and then fused, and because the whole signal is corrected through half-wave rectification, noise is effectively suppressed: the clear low-frequency signal of the bone-conduction microphone is retained, while the missing mid- and high-frequency information is supplemented, yielding an audio signal of higher perceptual quality.

Description

Method, device, equipment and storage medium for enhancing bone-conduction microphone voice
Technical Field
The present disclosure relates to the field of speech processing technologies, and in particular, to a method and an apparatus for enhancing speech of a bone-conduction microphone, a device and a storage medium.
Background
Many smart headsets and head-mounted VR/AR devices now integrate a microphone for voice interaction. Voice communication, human-machine interaction and similar functions are realized by picking up speech through the microphone and applying processing such as enhancement, wake-up and recognition; this is one of the key technologies for improving human-machine interaction efficiency and voice communication quality. How clean the picked-up speech is, or how strongly it is corrupted by noise, is the key factor affecting the actual interaction experience. Because bone-conduction microphones transmit sound waves through the human skull, they are not disturbed by environmental noise, and they are currently receiving wide attention from industry. However, a bone-conduction microphone can only pick up signals below 2 kHz and cannot effectively capture mid- and high-frequency components, so its output sounds markedly different from natural speech.
A common approach is to combine a bone-conduction microphone with a conventional microphone for speech pick-up and enhancement. The most common scheme performs voice activity detection on the bone-conduction microphone signal and uses the detection result to enhance the conventional microphone signal. Because the bone-conduction microphone is not disturbed by environmental noise, its voice activity detection is more accurate, which improves the enhancement of the conventional microphone signal. With the wide adoption of deep learning, more and more schemes use deep learning to fuse the bone-conduction microphone signal with the conventional microphone signal. However, deep learning needs large amounts of data to be effective, and bone-conduction speech is difficult to collect at scale, which limits its effectiveness in practice.
In bone conduction, sound is converted into mechanical vibrations of different frequencies that are transmitted through the human skull and bony labyrinth to the auditory center. Compared with the classical mode of sound transmission, in which a vibrating diaphragm generates sound waves, bone conduction omits several sound-wave transmission steps, so sound can be reproduced clearly even in noisy environments.
Methods that perform voice activity detection on the bone-conduction microphone signal and use the result as guide information for enhancing the conventional microphone signal essentially apply a noise mask to the conventional microphone signal. Because noise interferes strongly at low frequencies in practice, a mask obtained directly from the conventional microphone signal is heavily distorted, which degrades the quality of voice interaction. Schemes based on deep learning, in turn, generally generalize poorly in practice because of insufficient training data.
Disclosure of Invention
To solve, or at least partially solve, the above technical problem, embodiments of the present disclosure provide a method and an apparatus for enhancing bone-conduction microphone speech, a device and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a method for enhancing bone-conduction microphone speech, the method including:
acquiring two frequency-domain signals and truncating each of them at a preset cut-off frequency to obtain two truncated frequency-domain signals, wherein the two frequency-domain signals correspond to the same source signal picked up by a bone-conduction microphone element and a conventional microphone array, respectively, using the same preset clock and sampling rate;
performing half-wave rectification on each of the two truncated frequency-domain signals to obtain two half-wave rectified time-domain signals;
determining a fusion factor from the two half-wave rectified time-domain signals within a preset intermediate frequency band, wherein the lower limit of the intermediate frequency band is the cut-off frequency;
and fusing the two truncated frequency-domain signals according to the fusion factor to obtain the enhanced bone-conduction microphone speech.
In one possible implementation, acquiring the two frequency-domain signals includes:
acquiring two time-domain signals, namely the same source signal picked up by the bone-conduction microphone element and by the conventional microphone array, respectively, using the same preset clock and sampling rate;
and performing a Fourier transform on the two time-domain signals to obtain the two frequency-domain signals.
In one possible implementation, truncating the two frequency-domain signals at the preset cut-off frequency to obtain two truncated frequency-domain signals includes:
for the frequency-domain signal picked up by the bone-conduction microphone element, keeping the components below the preset cut-off frequency as the first truncated frequency-domain signal;
for the frequency-domain signal picked up by the conventional microphone array, keeping the components at or above the preset cut-off frequency as the second truncated frequency-domain signal;
and taking the first and second truncated frequency-domain signals as the two truncated frequency-domain signals.
In one possible implementation, for the frequency-domain signal picked up by the bone-conduction microphone element, the components below the preset cut-off frequency are kept as the first truncated frequency-domain signal by the following expression:
$$X_B(l,k)=\begin{cases}Y_B(l,k), & k f_s/N < f_c\\ 0, & \text{otherwise}\end{cases}$$
where $X_B(l,k)$ is the first truncated frequency-domain signal, $Y_B(l,k)$ is the spectrum of the k-th frequency bin of the l-th frame of the frequency-domain signal picked up by the bone-conduction microphone element, $f_s$ is the sampling rate, $f_c$ is the preset cut-off frequency, and $N$ is the frame length;
for the frequency-domain signal picked up by the conventional microphone array, the components at or above the preset cut-off frequency are kept as the second truncated frequency-domain signal by the following expression:
$$X_A(l,k)=\begin{cases}Y_A(l,k), & k f_s/N \ge f_c\\ 0, & \text{otherwise}\end{cases}$$
where $X_A(l,k)$ is the second truncated frequency-domain signal, $Y_A(l,k)$ is the spectrum of the k-th frequency bin of the l-th frame of the frequency-domain signal picked up by the conventional microphone array, $f_s$ is the sampling rate, $f_c$ is the preset cut-off frequency, and $N$ is the frame length.
In one possible implementation, performing half-wave rectification on the two truncated frequency-domain signals to obtain two half-wave rectified time-domain signals includes:
performing an inverse Fourier transform on each of the two truncated frequency-domain signals to obtain two truncated time-domain signals;
and performing half-wave rectification on each of the two truncated time-domain signals to obtain the two half-wave rectified time-domain signals.
In one possible implementation, the two truncated time-domain signals are half-wave rectified by the following expressions to obtain the two half-wave rectified time-domain signals:
$$r_B(n)=\max\{x_B(n),\,0\}$$
where $r_B(n)$ is the half-wave rectified time-domain signal corresponding to the bone-conduction microphone element and $x_B(n)$ is the truncated time-domain signal corresponding to the bone-conduction microphone element;
$$r_A(n)=\max\{x_A(n),\,0\}$$
where $r_A(n)$ is the half-wave rectified time-domain signal corresponding to the conventional microphone array and $x_A(n)$ is the truncated time-domain signal corresponding to the conventional microphone array.
In one possible implementation, determining the fusion factor from the two half-wave rectified time-domain signals within the preset intermediate frequency band includes:
carrying out Fourier transform on the two half-wave rectified time domain signals to obtain two half-wave rectified frequency domain signals;
calculating two accumulated energies corresponding to the two half-wave rectified frequency domain signals in a preset intermediate frequency band;
and calculating a fusion factor according to the two accumulated energies.
In one possible implementation, the preset intermediate frequency band is [1500 Hz, 2000 Hz], and the two accumulated energies of the two half-wave rectified frequency-domain signals within this band are calculated by the following expression:
Figure SMS_17
where $E_B(l)$ is the accumulated energy of the half-wave rectified frequency-domain signal corresponding to the bone-conduction microphone element in [1500 Hz, 2000 Hz], $E_A(l)$ is the accumulated energy of the half-wave rectified frequency-domain signal corresponding to the conventional microphone array in [1500 Hz, 2000 Hz], $R_B(l,k)$ is the half-wave rectified frequency-domain signal corresponding to the bone-conduction microphone element, $R_A(l,k)$ is the half-wave rectified frequency-domain signal corresponding to the conventional microphone array, and $\beta$ is the smoothing factor between adjacent time instants.
In one possible implementation, the fusion factor is calculated from the two accumulated energies by the following expression:
Figure SMS_23
where $E_B(l)$ is the accumulated energy of the half-wave rectified frequency-domain signal corresponding to the bone-conduction microphone element in [1500 Hz, 2000 Hz], $E_A(l)$ is the accumulated energy of the half-wave rectified frequency-domain signal corresponding to the conventional microphone array in [1500 Hz, 2000 Hz], and $\alpha(l)$ is the fusion factor.
In one possible implementation, fusing the two truncated frequency-domain signals according to the fusion factor to obtain the enhanced bone-conduction microphone speech includes:
fusing the two truncated frequency-domain signals according to the fusion factor to obtain a fused frequency-domain signal;
and performing an inverse Fourier transform on the fused frequency-domain signal to obtain a fused time-domain signal, which is used as the enhanced bone-conduction microphone speech.
In one possible implementation, the two truncated frequency-domain signals are fused according to the fusion factor by the following expression to obtain the fused frequency-domain signal:
Figure SMS_27
where $Z(l,k)$ is the fused frequency-domain signal, $Y_B(l,k)$ is the spectrum of the k-th frequency bin of the l-th frame of the frequency-domain signal picked up by the bone-conduction microphone element, $X_B(l,k)$ is the first truncated frequency-domain signal, $X_A(l,k)$ is the second truncated frequency-domain signal, and $\alpha(l)$ is the fusion factor.
In a second aspect, an embodiment of the present disclosure provides an apparatus for enhancing bone-conduction microphone speech, including:
a truncation module, configured to acquire two frequency-domain signals and truncate each of them at a preset cut-off frequency to obtain two truncated frequency-domain signals, wherein the two frequency-domain signals correspond to the same source signal picked up by a bone-conduction microphone element and a conventional microphone array, respectively, using the same preset clock and sampling rate;
a rectification module, configured to perform half-wave rectification on each of the two truncated frequency-domain signals to obtain two half-wave rectified time-domain signals;
a determining module, configured to determine a fusion factor from the two half-wave rectified time-domain signals within a preset intermediate frequency band, wherein the lower limit of the intermediate frequency band is the cut-off frequency;
and a fusion module, configured to fuse the two truncated frequency-domain signals according to the fusion factor to obtain the enhanced bone-conduction microphone speech.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including a processor, a communication interface, a memory and a communication bus, where the processor, the communication interface and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
and the processor is configured to implement the above method for enhancing bone-conduction microphone speech when executing the program stored in the memory.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above method for enhancing bone-conduction microphone speech.
Compared with the prior art, the technical solutions provided by the embodiments of the present disclosure have at least some or all of the following advantages:
in the method for enhancing bone-conduction microphone speech according to the embodiments of the present disclosure, two frequency-domain signals are acquired and each is truncated at a preset cut-off frequency to obtain two truncated frequency-domain signals, wherein the two frequency-domain signals correspond to the same source signal picked up by a bone-conduction microphone element and a conventional microphone array, respectively, using the same preset clock and sampling rate; half-wave rectification is performed on the two truncated frequency-domain signals to obtain two half-wave rectified time-domain signals; a fusion factor is determined from the two half-wave rectified time-domain signals within a preset intermediate frequency band whose lower limit is the cut-off frequency; and the two truncated frequency-domain signals are fused according to the fusion factor to obtain the enhanced bone-conduction microphone speech. Because the signals are first truncated at the preset cut-off frequency, which acts as a filter bank, and then fused, and because the whole signal is corrected through half-wave rectification, noise is effectively suppressed: the clear low-frequency signal of the bone-conduction microphone is retained, the missing mid- and high-frequency information is supplemented, and an audio signal of higher perceptual quality is obtained.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
To describe the technical solutions in the embodiments of the present disclosure or in the related art more clearly, the drawings used in describing the embodiments or the related art are briefly introduced below; a person of ordinary skill in the art can derive other drawings from these drawings without creative effort.
Fig. 1 schematically illustrates a flowchart of a method for enhancing bone-conduction microphone speech according to an embodiment of the present disclosure;
Fig. 2 schematically illustrates a block diagram of an apparatus for enhancing bone-conduction microphone speech according to an embodiment of the present disclosure; and
Fig. 3 schematically illustrates a block diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
To make the objectives, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, rather than all, of the embodiments of the present disclosure. All other embodiments obtained by a person skilled in the art from the disclosed embodiments without creative effort shall fall within the protection scope of the present disclosure.
Referring to Fig. 1, an embodiment of the present disclosure provides a method for enhancing bone-conduction microphone speech, the method including the following steps:
S1, acquiring two frequency-domain signals and truncating each of them at a preset cut-off frequency to obtain two truncated frequency-domain signals, wherein the two frequency-domain signals correspond to the same source signal picked up by a bone-conduction microphone element and a conventional microphone array, respectively, using the same preset clock and sampling rate;
in practical applications, the bone-conduction microphone captures sound through the slight vibrations of the bones of the head and neck caused by a person speaking, whereas the conventional microphone captures sound conducted through the air.
S2, performing half-wave rectification on each of the two truncated frequency-domain signals to obtain two half-wave rectified time-domain signals;
S3, determining a fusion factor from the two half-wave rectified time-domain signals within a preset intermediate frequency band, wherein the lower limit of the intermediate frequency band is the cut-off frequency;
and S4, fusing the two truncated frequency-domain signals according to the fusion factor to obtain the enhanced bone-conduction microphone speech.
In this embodiment, in step S1, acquiring the two frequency-domain signals includes:
acquiring two time-domain signals, namely the same source signal picked up by the bone-conduction microphone element and by the conventional microphone array, respectively, using the same preset clock and sampling rate;
and performing a Fourier transform on the two time-domain signals to obtain the two frequency-domain signals.
In this embodiment, the Fourier transforms of the two time-domain signals are computed by the following expressions to obtain the two frequency-domain signals:
$$Y_B(l,k)=\sum_{n=0}^{N-1} w(n)\,y_B(l,n)\,e^{-j2\pi nk/N}$$
$$Y_A(l,k)=\sum_{n=0}^{N-1} w(n)\,y_A(l,n)\,e^{-j2\pi nk/N}$$
where $Y_B(l,k)$ is the spectrum of the k-th frequency bin of the l-th frame of the frequency-domain signal picked up by the bone-conduction microphone element, $Y_A(l,k)$ is the spectrum of the k-th frequency bin of the l-th frame of the frequency-domain signal picked up by the conventional microphone array, $N = 512$ is the frame length, $w(n)$ is a Hamming window of length 512, $l$ is the time-frame index, $k$ is the frequency index, $y_B$ is the time-domain signal picked up by the bone-conduction microphone element, $y_A$ is the time-domain signal picked up by the conventional microphone array, and $y_B(l,n)$ and $y_A(l,n)$ denote the n-th samples of their l-th frames.
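The windowed Fourier transform above can be sketched in pure Python as follows. This is a minimal illustration rather than the patented implementation: the symmetric Hamming formula, the hop size of 256 samples and the helper names (`hamming`, `split_frames`, `frame_spectrum`, `stft`) are assumptions, since the text fixes only the frame length (512) and the window type. Because both microphones share the same clock and sampling rate, the same frame grid can be applied to the bone-conduction and conventional microphone signals.

```python
import cmath
import math


def hamming(N):
    """Symmetric Hamming window w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)) (assumed variant)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def split_frames(signal, N=512, hop=256):
    """Cut a signal into frames of N samples advanced by `hop` samples (hop=256 is an assumption)."""
    return [signal[start:start + N] for start in range(0, len(signal) - N + 1, hop)]


def frame_spectrum(frame, window):
    """Windowed DFT of one frame: Y(k) = sum_n w(n) * y(n) * exp(-j*2*pi*n*k/N), k = 0..N-1."""
    N = len(frame)
    if len(window) != N:
        raise ValueError("frame and window must have the same length")
    return [
        sum(window[n] * frame[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N))
        for k in range(N)
    ]


def stft(signal, N=512, hop=256):
    """Spectra Y(l, k) for every frame l of a real signal (slow O(N^2) DFT, for clarity only)."""
    w = hamming(N)
    return [frame_spectrum(frame, w) for frame in split_frames(signal, N, hop)]
```

A naive DFT is used so the sketch needs only the standard library; in practice an FFT routine would replace `frame_spectrum`.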
In this embodiment, in step S1, truncating the two frequency-domain signals at the preset cut-off frequency to obtain two truncated frequency-domain signals includes:
for the frequency-domain signal picked up by the bone-conduction microphone element, keeping the components below the preset cut-off frequency as the first truncated frequency-domain signal;
for the frequency-domain signal picked up by the conventional microphone array, keeping the components at or above the preset cut-off frequency as the second truncated frequency-domain signal;
and taking the first and second truncated frequency-domain signals as the two truncated frequency-domain signals.
In this embodiment, for the frequency-domain signal picked up by the bone-conduction microphone element, the components below the preset cut-off frequency are kept as the first truncated frequency-domain signal by the following expression:
$$X_B(l,k)=\begin{cases}Y_B(l,k), & k f_s/N < f_c\\ 0, & \text{otherwise}\end{cases}$$
where $X_B(l,k)$ is the first truncated frequency-domain signal, $Y_B(l,k)$ is the spectrum of the k-th frequency bin of the l-th frame of the frequency-domain signal picked up by the bone-conduction microphone element, $f_s$ is the sampling rate, $f_c$ is the preset cut-off frequency, and $N$ is the frame length;
for the frequency-domain signal picked up by the conventional microphone array, the components at or above the preset cut-off frequency are kept as the second truncated frequency-domain signal by the following expression:
$$X_A(l,k)=\begin{cases}Y_A(l,k), & k f_s/N \ge f_c\\ 0, & \text{otherwise}\end{cases}$$
where $X_A(l,k)$ is the second truncated frequency-domain signal, $Y_A(l,k)$ is the spectrum of the k-th frequency bin of the l-th frame of the frequency-domain signal picked up by the conventional microphone array, $f_s$ is the sampling rate, $f_c$ is the preset cut-off frequency, and $N$ is the frame length.
Truncating the two frequency-domain signals at the preset cut-off frequency keeps only the components of the bone-conduction signal below the cut-off frequency and, because the conventional microphone signal is easily disturbed by low-frequency noise, only the components of the conventional microphone signal at or above the cut-off frequency. This ensures the purity of the signal.
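A minimal pure-Python sketch of this truncation for one frame follows, with $Y_B$ and $Y_A$ the bone-conduction and conventional microphone spectra and $X_B$, $X_A$ the two truncated spectra. It assumes a full-length DFT (bins 0 to N-1), so bins above N/2 are treated as mirrored negative frequencies and the cut-off is applied to them symmetrically; this keeps both truncated spectra conjugate-symmetric, so their inverse transforms stay real. The function names are hypothetical.

```python
def bin_frequency(k, N, fs):
    """Frequency in Hz of DFT bin k; bins above N/2 are mirrored negative frequencies."""
    return min(k, N - k) * fs / N


def truncate_spectra(Y_B, Y_A, fs, fc):
    """Split one frame: bone-conduction bins below fc, conventional-mic bins at or above fc."""
    N = len(Y_B)
    if len(Y_A) != N:
        raise ValueError("both spectra must have the same frame length")
    X_B = [Y_B[k] if bin_frequency(k, N, fs) < fc else 0j for k in range(N)]
    X_A = [Y_A[k] if bin_frequency(k, N, fs) >= fc else 0j for k in range(N)]
    return X_B, X_A
```

With an assumed sampling rate of 16 kHz and N = 512, the bin spacing is 31.25 Hz; if the cut-off equals the 1500 Hz lower edge of the intermediate band, bins 0 to 47 (and their mirrors) come from the bone-conduction spectrum and bins 48 to 256 from the conventional microphone array.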
In this embodiment, in step S2, performing half-wave rectification on the two truncated frequency-domain signals to obtain two half-wave rectified time-domain signals includes:
performing an inverse Fourier transform on each of the two truncated frequency-domain signals to obtain two truncated time-domain signals;
and performing half-wave rectification on each of the two truncated time-domain signals to obtain the two half-wave rectified time-domain signals.
In this embodiment, the inverse Fourier transforms of the two truncated frequency-domain signals are computed by the following expressions to obtain the two truncated time-domain signals:
Figure SMS_53
Figure SMS_54
where $x_B$ is the truncated time-domain signal corresponding to the bone-conduction microphone element, $x_A$ is the truncated time-domain signal corresponding to the conventional microphone array, $X_B(l,k)$ is the first truncated frequency-domain signal, $X_A(l,k)$ is the second truncated frequency-domain signal, $N = 512$ is the frame length, $w(n)$ is a Hamming window, $l$ is the time-frame index and $k$ is the frequency index.
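The inverse transform for a single frame can be sketched as below. The text names only the frame length and a Hamming window, so how the window enters the synthesis (and any overlap-add between frames) is not specified; this hypothetical sketch performs a plain inverse DFT and keeps the real part, which is exact when the spectrum is conjugate-symmetric, as it is after symmetric truncation of a real signal's spectrum. A forward DFT is included only to check the round trip.

```python
import cmath
import math


def frame_forward(x):
    """Plain (unwindowed) DFT, used here only to test the inverse."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N)) for k in range(N)]


def frame_inverse(X):
    """Inverse DFT x(n) = (1/N) * sum_k X(k) * exp(+j*2*pi*n*k/N); the real part is returned."""
    N = len(X)
    return [
        (sum(X[k] * cmath.exp(2j * math.pi * n * k / N) for k in range(N)) / N).real
        for n in range(N)
    ]


x = [0.5, -1.0, 2.0, 0.0, 3.5, -2.25, 1.0, 0.25]
# Maximum round-trip error; it should be at the level of floating-point rounding.
print(max(abs(a - b) for a, b in zip(x, frame_inverse(frame_forward(x)))))
```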
In this embodiment, the two truncated time-domain signals are half-wave rectified by the following expressions to obtain the two half-wave rectified time-domain signals:
$$r_B(n)=\max\{x_B(n),\,0\}$$
where $r_B(n)$ is the half-wave rectified time-domain signal corresponding to the bone-conduction microphone element and $x_B(n)$ is the truncated time-domain signal corresponding to the bone-conduction microphone element;
$$r_A(n)=\max\{x_A(n),\,0\}$$
where $r_A(n)$ is the half-wave rectified time-domain signal corresponding to the conventional microphone array and $x_A(n)$ is the truncated time-domain signal corresponding to the conventional microphone array.
Because most of the energy of a speech signal lies in a clear harmonic structure, the present disclosure applies half-wave rectification to enhance adjacent harmonics.
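To illustrate the harmonic argument, the sketch below half-wave rectifies a pure tone and inspects its spectrum. Half-wave rectification of a sinusoid produces a DC term, the fundamental and its even harmonics, while the odd harmonics (third, fifth, ...) are absent, so a nonlinearity of this kind creates energy at harmonic positions. The tone frequency and frame length are arbitrary illustrative choices and the helper names are hypothetical.

```python
import cmath
import math


def half_wave_rectify(x):
    """Keep positive samples and set the others to zero, i.e. max(x(n), 0)."""
    return [v if v > 0.0 else 0.0 for v in x]


def dft(x):
    """Plain DFT of a real frame."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N)) for k in range(N)]


N = 32
tone = [math.sin(2 * math.pi * 3 * n / N) for n in range(N)]  # three periods per frame: bin 3
spectrum = dft(half_wave_rectify(tone))
for k in (0, 3, 6, 9):  # DC, fundamental, 2nd harmonic, 3rd harmonic
    print(k, round(abs(spectrum[k]), 3))
```

The printed magnitudes are clearly nonzero at bins 0, 3 and 6 and essentially zero at bin 9.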
In this embodiment, in step S3, determining the fusion factor from the two half-wave rectified time-domain signals within the preset intermediate frequency band includes:
carrying out Fourier transform on the two half-wave rectified time domain signals to obtain two half-wave rectified frequency domain signals;
calculating two accumulated energies corresponding to the two half-wave rectified frequency domain signals in a preset intermediate frequency band;
and calculating a fusion factor according to the two accumulated energies.
In this embodiment, the Fourier transforms of the two half-wave rectified time-domain signals are computed by the following expressions to obtain the two half-wave rectified frequency-domain signals:
$$R_B(l,k)=\sum_{n=0}^{N-1} w(n)\,r_B(l,n)\,e^{-j2\pi nk/N}$$
$$R_A(l,k)=\sum_{n=0}^{N-1} w(n)\,r_A(l,n)\,e^{-j2\pi nk/N}$$
where $R_B(l,k)$ is the half-wave rectified frequency-domain signal corresponding to the bone-conduction microphone element, $R_A(l,k)$ is the half-wave rectified frequency-domain signal corresponding to the conventional microphone array, $r_B$ is the half-wave rectified time-domain signal corresponding to the bone-conduction microphone element, $r_A$ is the half-wave rectified time-domain signal corresponding to the conventional microphone array, $r_B(l,n)$ and $r_A(l,n)$ denote the n-th samples of their l-th frames, $N = 512$ is the frame length, $w(n)$ is a Hamming window of length 512, $l$ is the time-frame index and $k$ is the frequency index.
In this embodiment, the preset intermediate frequency band is [1500 Hz, 2000 Hz], and the two accumulated energies of the two half-wave rectified frequency-domain signals within this band are calculated by the following expression:
Figure SMS_75
where $E_B(l)$ is the accumulated energy of the half-wave rectified frequency-domain signal corresponding to the bone-conduction microphone element in [1500 Hz, 2000 Hz], $E_A(l)$ is the accumulated energy of the half-wave rectified frequency-domain signal corresponding to the conventional microphone array in [1500 Hz, 2000 Hz], $R_B(l,k)$ is the half-wave rectified frequency-domain signal corresponding to the bone-conduction microphone element, $R_A(l,k)$ is the half-wave rectified frequency-domain signal corresponding to the conventional microphone array, and $\beta$ is the smoothing factor between adjacent time instants. $\beta$ is preferably 0.96, which keeps the estimate sufficiently accurate while preventing large fluctuations of the fusion factor over time.
The present disclosure uses the preset intermediate frequency band as the reference band for matching the bone-conduction microphone signal to the conventional microphone signal; that is, it seeks a fusion factor that makes the energies of the two signals in the intermediate band match as closely as possible, which ensures the integrity of the fusion.
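The band-energy step can be sketched as follows, with $R(l,k)$ a half-wave rectified spectrum and $E(l)$ its accumulated band energy. The smoothing formula itself is not spelled out in the surrounding definitions, so this sketch assumes a common first-order recursive form, $E(l) = \beta E(l-1) + (1-\beta)\sum_k |R(l,k)|^2$ over the bins of [1500 Hz, 2000 Hz], with $\beta = 0.96$ as preferred above. The function names and the restriction to non-negative-frequency bins are hypothetical choices.

```python
def band_bins(N, fs, f_lo=1500.0, f_hi=2000.0):
    """Non-negative-frequency bins k (0..N/2) with f_lo <= k*fs/N <= f_hi."""
    return [k for k in range(N // 2 + 1) if f_lo <= k * fs / N <= f_hi]


def update_band_energy(prev_energy, spectrum, bins, beta=0.96):
    """Assumed recursive smoothing: E(l) = beta*E(l-1) + (1-beta)*sum_k |R(l,k)|^2."""
    frame_energy = sum(abs(spectrum[k]) ** 2 for k in bins)
    return beta * prev_energy + (1.0 - beta) * frame_energy
```

Under this form the estimate has an effective memory of roughly 1/(1 - β) = 25 frames, which is what damps frame-to-frame jumps in the fusion factor.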
In this embodiment, the fusion factor is calculated from the two accumulated energies by the following expression:
Figure SMS_82
where $E_B(l)$ is the accumulated energy of the half-wave rectified frequency-domain signal corresponding to the bone-conduction microphone element in [1500 Hz, 2000 Hz], $E_A(l)$ is the accumulated energy of the half-wave rectified frequency-domain signal corresponding to the conventional microphone array in [1500 Hz, 2000 Hz], and $\alpha(l)$ is the fusion factor.
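The fusion-factor expression is likewise not spelled out in the surrounding definitions. A form consistent with the statement that the fusion factor shrinks as the conventional microphone energy grows, and that the fused parts end up with approximately the same amplitude, is the amplitude ratio $\alpha(l) = \sqrt{E_B(l)/E_A(l)}$, where $E_B$ and $E_A$ are the bone-conduction and conventional microphone band energies. The sketch below assumes this form; the small constant `eps` guarding against division by zero is also an assumption.

```python
import math


def fusion_factor(E_B, E_A, eps=1e-12):
    """Assumed amplitude-matching rule alpha = sqrt(E_B / E_A).

    Scaling the conventional-microphone spectrum by alpha scales its band energy by
    alpha**2, which then equals the bone-conduction band energy E_B.
    """
    return math.sqrt(E_B / (E_A + eps))
```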
In this embodiment, fusing the two intercepted frequency-domain signals according to the fusion factor in step S4 to obtain the enhanced bone-conduction microphone speech includes:
fusing the two intercepted frequency-domain signals according to the fusion factor to obtain a fused frequency-domain signal;
and performing an inverse Fourier transform on the fused frequency-domain signal to obtain a fused time-domain signal, which serves as the enhanced bone-conduction microphone speech.
In this embodiment, the two intercepted frequency-domain signals are fused according to the fusion factor by the following expression to obtain the fused frequency-domain signal:
[Formula SMS_86]
wherein
[SMS_87] is the fused frequency-domain signal;
[SMS_88] is the spectrum of the k-th frequency band of the frequency-domain signal picked up by the bone conduction microphone element;
[SMS_89] is the first intercepted frequency-domain signal;
[SMS_90] is the second intercepted frequency-domain signal;
[SMS_91] is the fusion factor.
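Because the first intercepted signal occupies only bins below the cut-off and the second only bins at or above it, one natural reading of this step is Z(l, k) = Y1(l, k) + G(l)·Y2(l, k). The sketch below implements that assumed rule; the zero-filled layout of the intercepted spectra and the name `fuse_spectra` are hypothetical.

```python
import numpy as np


def fuse_spectra(first_cut, second_cut, gain):
    """ASSUMED fusion rule Z(l, k) = Y1(l, k) + G(l) * Y2(l, k).

    first_cut:  (frames, bins) bone-conduction spectrum below the cut-off, zeros above
    second_cut: (frames, bins) conventional-microphone spectrum at/above the cut-off, zeros below
    gain:       (frames,) fusion factor G(l), one value per frame
    """
    gain = np.asarray(gain, dtype=float)[:, None]  # broadcast each frame's gain over all bins
    return first_cut + gain * second_cut
```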
In this embodiment, an inverse Fourier transform is applied to the fused frequency-domain signal by the following expression to obtain the fused time-domain signal:
[Formula SMS_92]
wherein
[SMS_93] is the fused time-domain signal;
[SMS_94] is the fused frequency-domain signal;
[SMS_95] is the frame length, 512;
[SMS_96] is the Hamming window; l is the time frame index and k is the frequency index.
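The synthesis step can be sketched as a weighted overlap-add inverse STFT. Only the 512-sample frame length and the Hamming window come from the text; the 256-sample hop (50 % overlap) and the normalisation by the summed squared window are assumptions, chosen so that Hamming-windowed analysis followed by this synthesis reconstructs the input.

```python
import numpy as np


def istft_overlap_add(Z, n_fft=512, hop=256):
    """Resynthesise a time signal from a one-sided STFT Z of shape (frames, n_fft // 2 + 1).

    Each frame is inverse-transformed with numpy.fft.irfft, multiplied by a Hamming
    synthesis window and overlap-added; the sum is divided by the accumulated squared
    window (weighted overlap-add). hop=256 is an ASSUMED value.
    """
    window = np.hamming(n_fft)
    num_frames = Z.shape[0]
    length = n_fft + hop * (num_frames - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for l in range(num_frames):
        frame = np.fft.irfft(Z[l], n=n_fft)  # real-valued frame of n_fft samples
        start = l * hop
        out[start:start + n_fft] += window * frame
        norm[start:start + n_fft] += window ** 2
    nonzero = norm > 1e-10
    out[nonzero] /= norm[nonzero]
    return out
```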
In the embodiments of the present disclosure, it follows from the way the fusion factor is calculated that the larger the energy of the conventional microphone signal, the smaller the fusion factor, and vice versa. This dynamic fusion factor ensures that the final output spectrum Z(l, k) is formed from two signals of approximately equal amplitude. As Z(l, k) shows, the low-frequency part is taken from the bone conduction microphone signal, which is hardly affected by background noise, while in the medium and high frequency bands the conventional microphone signal is retained with its energy matched through the fusion factor. The output speech therefore contains a complete harmonic structure, which improves its perceptual quality.
The bone conduction microphone speech enhancement method addresses the fact that a bone conduction microphone signal is free of ambient-noise interference at low frequencies but lacks usable medium- and high-frequency content. By fusing it with the conventional microphone signal, the method effectively combines the low-frequency and the medium-to-high-frequency speech components, preserving the integrity and clarity of the speech harmonic structure and yielding higher perceptual quality.
The bone conduction microphone speech enhancement method computes a dynamic fusion factor from the half-wave rectification results of the two signal paths, achieving an effective fusion of the bone conduction microphone signal and the conventional microphone signal.
The disclosed bone conduction microphone speech enhancement method uses the conventional microphone signal to repair the relevant frequency bands, preserving the auditory quality of the picked-up speech. On the one hand, because the bone conduction microphone picks up clear speech even in strongly noisy scenes, the disclosed medium- and high-frequency compensation allows the method to adapt to very complex acoustic environments with strong interference, giving it a wider range of application. On the other hand, because fusion is performed through the fusion factor, the two signal paths are closely matched in amplitude, which avoids the signal mismatch caused by simply adding them, so the perceptual quality is higher.
The core of the disclosed bone conduction microphone speech enhancement method is the calculation of the fusion factor, and most of the computation lies in the Fourier transforms, for which many acceleration techniques are available. The method is therefore suitable for head-mounted VR and AR products with strict power-consumption requirements, which further widens its range of application.
Referring to fig. 2, an embodiment of the present disclosure provides a bone-conduction microphone speech enhancement device, including:
the intercepting module 21 is configured to acquire two frequency-domain signals and to intercept each of them according to a preset cut-off frequency to obtain two intercepted frequency-domain signals, wherein the two frequency-domain signals correspond to the same signal picked up by the bone conduction microphone array and the conventional microphone array, respectively, using the same preset clock and sampling rate;
the rectification module 22 is configured to perform half-wave rectification on the two intercepted frequency domain signals respectively to obtain two half-wave rectified time domain signals;
the determining module 23 is configured to determine a fusion factor from the two half-wave rectified time-domain signals within a preset intermediate frequency band, the lower bound of which is the cut-off frequency;
and the fusion module 24 is configured to fuse the two intercepted frequency-domain signals according to the fusion factor to obtain the enhanced bone-conduction microphone speech.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
Since the device embodiments substantially correspond to the method embodiments, reference may be made to the relevant parts of the description of the method embodiments. The device embodiments described above are merely illustrative: units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the invention, which one of ordinary skill in the art can understand and implement without inventive effort.
In the second embodiment, any number of the intercepting module 21, the rectification module 22, the determining module 23 and the fusion module 24 may be combined into one module, or any one of them may be split into multiple modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of other modules and implemented in one module. At least one of the intercepting module 21, the rectification module 22, the determining module 23 and the fusion module 24 may be implemented at least partially as a hardware circuit, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on a chip, a system on a substrate, a system in a package or an application specific integrated circuit (ASIC); in hardware or firmware by any other reasonable means of integrating or packaging a circuit; or in any one of software, hardware and firmware, or any suitable combination of them. Alternatively, at least one of the intercepting module 21, the rectification module 22, the determining module 23 and the fusion module 24 may be implemented at least partially as a computer program module which, when executed, performs the corresponding function.
Referring to fig. 3, an electronic device provided by an embodiment of the present disclosure includes a processor 1110, a communication interface 1120, a memory 1130, and a communication bus 1140, where the processor 1110, the communication interface 1120, and the memory 1130 complete communication with each other through the communication bus 1140;
a memory 1130 for storing computer programs;
the processor 1110, when executing the program stored in the memory 1130, implements the bone-conduction microphone speech enhancement method as follows:
acquiring two frequency-domain signals and intercepting each of them according to a preset cut-off frequency to obtain two intercepted frequency-domain signals, wherein the two frequency-domain signals correspond to the same signal picked up by a bone conduction microphone element and a conventional microphone array, respectively, using the same preset clock and sampling rate;
performing half-wave rectification on each of the two intercepted frequency-domain signals to obtain two half-wave rectified time-domain signals;
determining a fusion factor from the two half-wave rectified time-domain signals within a preset intermediate frequency band, the lower bound of which is the cut-off frequency;
and fusing the two intercepted frequency-domain signals according to the fusion factor to obtain the enhanced bone-conduction microphone speech.
The communication bus 1140 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 1140 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown, but this is not intended to represent only one bus or type of bus.
The communication interface 1120 is used for communication between the electronic device and other devices.
The memory 1130 may include a random access memory (RAM) or a non-volatile memory, such as at least one disk memory. Optionally, the memory 1130 may also be at least one storage device located remotely from the processor 1110.
The processor 1110 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Embodiments of the present disclosure also provide a computer-readable storage medium. The above computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the bone microphone speech enhancement method as described above.
The computer-readable storage medium may be contained in the apparatus/device described in the above embodiments; or may be present alone without being assembled into the device/apparatus. The computer readable storage medium carries one or more programs which, when executed, implement a method for bone conduction microphone speech enhancement according to an embodiment of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
It is noted that, in this document, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises", "comprising" and any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (14)

1. A method for bone-conduction microphone speech enhancement, the method comprising:
acquiring two frequency-domain signals and intercepting each of them according to a preset cut-off frequency to obtain two intercepted frequency-domain signals, wherein the two frequency-domain signals correspond to the same signal picked up by a bone conduction microphone element and a conventional microphone array, respectively, using the same preset clock and sampling rate;
performing half-wave rectification on each of the two intercepted frequency-domain signals to obtain two half-wave rectified time-domain signals;
determining a fusion factor from the two half-wave rectified time-domain signals within a preset intermediate frequency band, the lower bound of which is the cut-off frequency;
and fusing the two intercepted frequency-domain signals according to the fusion factor to obtain the enhanced bone-conduction microphone speech.
2. The method of claim 1, wherein the obtaining two frequency domain signals comprises:
acquiring two time domain signals, wherein the two time domain signals are obtained by respectively picking up the same signal by the bone conduction microphone element and the traditional microphone array by adopting the same preset clock and sampling rate;
and carrying out Fourier transform on the two time domain signals to obtain two frequency domain signals.
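Claim 2 converts both time-domain signals to the frequency domain. A minimal STFT sketch, assuming the 512-sample Hamming-windowed frames described for the synthesis step and a hypothetical 256-sample hop; because both microphones share the same clock and sampling rate, frame l of one signal lines up with frame l of the other.

```python
import numpy as np


def stft_frames(x, n_fft=512, hop=256):
    """One-sided STFT X(l, k) with a Hamming analysis window.

    The 512-sample frame follows the description; hop=256 is an ASSUMED value.
    Trailing samples that do not fill a whole frame are dropped.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < n_fft:
        raise ValueError("signal shorter than one frame")
    window = np.hamming(n_fft)
    num_frames = 1 + (len(x) - n_fft) // hop
    return np.stack([np.fft.rfft(window * x[l * hop:l * hop + n_fft])
                     for l in range(num_frames)])
```

Usage would be one call per path, e.g. `bone_spec = stft_frames(bone_signal)` and `conv_spec = stft_frames(conv_signal)`, where both input names are hypothetical.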
3. The method according to claim 1, wherein intercepting each of the two frequency-domain signals according to a preset cut-off frequency to obtain two intercepted frequency-domain signals comprises:
for the frequency-domain signal picked up by the bone conduction microphone element, intercepting the components below the preset cut-off frequency as a first intercepted frequency-domain signal;
for the frequency-domain signal picked up by the conventional microphone array, intercepting the components greater than or equal to the preset cut-off frequency as a second intercepted frequency-domain signal;
and taking the first intercepted frequency-domain signal and the second intercepted frequency-domain signal as the two intercepted frequency-domain signals.
4. The method according to claim 3, wherein, for the frequency-domain signal picked up by the bone conduction microphone element, the components below the preset cut-off frequency are intercepted as the first intercepted frequency-domain signal by the following expression:
[Formula QLYQS_1]
wherein
[QLYQS_2] is the first intercepted frequency-domain signal;
[QLYQS_3] is the spectrum of the k-th frequency band of the frequency-domain signal picked up by the bone conduction microphone element;
[QLYQS_4] is the sampling rate;
[QLYQS_5] is the preset cut-off frequency;
and, for the frequency-domain signal picked up by the conventional microphone array, the components greater than or equal to the preset cut-off frequency are intercepted as the second intercepted frequency-domain signal by the following expression:
[Formula QLYQS_6]
wherein
[QLYQS_7] is the second intercepted frequency-domain signal;
[QLYQS_8] is the spectrum of the k-th frequency band of the l-th frame of the frequency-domain signal picked up by the conventional microphone array;
[QLYQS_9] is the sampling rate;
[QLYQS_10] is the preset cut-off frequency.
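The cut-off test in claim 4 relates a bin index k to the frequency k·fs/N. The sketch below assumes the intercepted spectra are zero-filled rather than shortened, so that both keep the same shape and can later be added bin by bin; N = 512, the 1500 Hz example cut-off (the lower edge of the intermediate band) and the name `split_at_cutoff` are assumptions for illustration.

```python
import numpy as np


def split_at_cutoff(bone_spec, conv_spec, fs, fc, n_fft=512):
    """Zero-filled interception at the cut-off frequency fc (ASSUMED layout).

    Bin k is taken to represent frequency k * fs / n_fft. Bone-conduction bins
    below fc and conventional-microphone bins at or above fc are kept; all other
    bins are set to zero, so both outputs keep the (frames, bins) shape.
    """
    k = np.arange(bone_spec.shape[-1])
    low = (k * fs / n_fft) < fc  # boolean mask over bins, broadcast across frames
    first_cut = np.where(low, bone_spec, 0)
    second_cut = np.where(low, 0, conv_spec)
    return first_cut, second_cut
```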
5. The method of claim 1, wherein performing half-wave rectification on each of the two intercepted frequency-domain signals to obtain two half-wave rectified time-domain signals comprises:
performing an inverse Fourier transform on each of the two intercepted frequency-domain signals to obtain two intercepted time-domain signals;
and performing half-wave rectification on each of the two intercepted time-domain signals to obtain the two half-wave rectified time-domain signals.
6. The method of claim 5, wherein the two intercepted time-domain signals are half-wave rectified, respectively, by the following expressions to obtain the two half-wave rectified time-domain signals:
[Formula QLYQS_11]
wherein
[QLYQS_12] is the half-wave rectified time-domain signal of the bone conduction microphone element;
[QLYQS_13] is the intercepted time-domain signal of the bone conduction microphone element;
[Formula QLYQS_14]
wherein
[QLYQS_15] is the half-wave rectified time-domain signal of the conventional microphone array;
[QLYQS_16] is the intercepted time-domain signal of the conventional microphone array.
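Claim 6 gives the rectification expressions only as images; the sketch assumes the standard definition r(n) = max(x(n), 0) = (x(n) + |x(n)|)/2. It also illustrates one plausible reason for the step (a reading of the method, not something the claims state): the bone-conduction path has been truncated below the cut-off and so carries no energy in [1500 Hz, 2000 Hz], but rectification is nonlinear and creates harmonics there, which gives the band-energy comparison something to measure.

```python
import numpy as np


def half_wave_rectify(x):
    """Half-wave rectification: keep non-negative samples, set negative ones to zero.

    Equivalent to (x + |x|) / 2. ASSUMED standard definition; the patent's exact
    expression (QLYQS_11 / QLYQS_14) is only available as an image.
    """
    x = np.asarray(x, dtype=float)
    return np.maximum(x, 0.0)


# Illustration with hypothetical parameters: a 500 Hz tone at 16 kHz, one 512-sample frame.
fs, n = 16000, 512
t = np.arange(n) / fs
tone = np.sin(2 * np.pi * 500 * t)
spec_tone = np.abs(np.fft.rfft(tone))
spec_rect = np.abs(np.fft.rfft(half_wave_rectify(tone)))
# Bin 64 corresponds to 64 * 16000 / 512 = 2000 Hz.
```

For the 500 Hz tone, the rectified signal carries a clear component at 2000 Hz (its fourth harmonic), while the unrectified tone has essentially none there.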
7. The method of claim 1, wherein determining a fusion factor from the two half-wave rectified time domain signals according to a preset intermediate frequency band comprises:
carrying out Fourier transform on the two half-wave rectified time domain signals to obtain two half-wave rectified frequency domain signals;
calculating two accumulated energies corresponding to the two half-wave rectified frequency domain signals in a preset intermediate frequency band;
and calculating a fusion factor according to the two accumulated energies.
8. The method according to claim 7, wherein the preset intermediate frequency band is [1500 Hz, 2000 Hz], and the two accumulated energies of the two half-wave rectified frequency-domain signals within the preset intermediate frequency band are calculated by the following expression:
[Formula QLYQS_17]
wherein
[QLYQS_18] is the accumulated energy of the half-wave rectified frequency-domain signal of the bone conduction microphone element in [1500 Hz, 2000 Hz];
[QLYQS_19] is the accumulated energy of the half-wave rectified frequency-domain signal of the conventional microphone array in [1500 Hz, 2000 Hz];
[QLYQS_20] is the half-wave rectified frequency-domain signal of the bone conduction microphone element;
[QLYQS_21] is the half-wave rectified frequency-domain signal of the conventional microphone array;
[QLYQS_22] is the smoothing factor between adjacent time instants.
9. The method of claim 7, wherein the fusion factor is calculated from the two accumulated energies by the following expression:
[Formula QLYQS_23]
wherein
[QLYQS_24] is the accumulated energy of the half-wave rectified frequency-domain signal of the bone conduction microphone element in [1500 Hz, 2000 Hz];
[QLYQS_25] is the accumulated energy of the half-wave rectified frequency-domain signal of the conventional microphone array in [1500 Hz, 2000 Hz];
[QLYQS_26] is the fusion factor.
10. The method according to claim 1, wherein fusing the two intercepted frequency-domain signals according to the fusion factor to obtain the enhanced bone-conduction microphone speech comprises:
fusing the two intercepted frequency-domain signals according to the fusion factor to obtain a fused frequency-domain signal;
and performing an inverse Fourier transform on the fused frequency-domain signal to obtain a fused time-domain signal, which serves as the enhanced bone-conduction microphone speech.
11. The method according to claim 10, wherein the two intercepted frequency-domain signals are fused according to the fusion factor by the following expression to obtain the fused frequency-domain signal:
[Formula QLYQS_27]
wherein
[QLYQS_28] is the fused frequency-domain signal;
[QLYQS_29] is the spectrum of the k-th frequency band of the frequency-domain signal picked up by the bone conduction microphone element;
[QLYQS_30] is the first intercepted frequency-domain signal;
[QLYQS_31] is the second intercepted frequency-domain signal;
[QLYQS_32] is the fusion factor.
12. A bone conduction microphone speech enhancement device, comprising:
the intercepting module is used for acquiring two frequency domain signals, respectively intercepting the two frequency domain signals according to a preset cut-off frequency to obtain two intercepted frequency domain signals, wherein the two frequency domain signals are obtained by respectively picking up the same signal by using the same preset clock and sampling rate through the bone conduction microphone array and the traditional microphone array;
the rectification module is used for respectively carrying out half-wave rectification on the two intercepted frequency domain signals to obtain two half-wave rectified time domain signals;
the determining module is used for determining a fusion factor from the two half-wave rectified time-domain signals within a preset intermediate frequency band, the lower bound of which is the cut-off frequency;
and the fusion module is used for fusing the two intercepted frequency-domain signals according to the fusion factor to obtain the enhanced bone-conduction microphone speech.
13. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the bone-conduction microphone speech enhancement method of any one of claims 1-11 when executing the program stored in the memory.
14. A computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, carries out the bone-conduction microphone speech enhancement method of any one of claims 1-11.
CN202310054459.9A 2023-02-03 2023-02-03 Bone-conduction microphone voice enhancement method, device, equipment and storage medium Active CN115862656B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310054459.9A CN115862656B (en) 2023-02-03 2023-02-03 Bone-conduction microphone voice enhancement method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310054459.9A CN115862656B (en) 2023-02-03 2023-02-03 Bone-conduction microphone voice enhancement method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115862656A true CN115862656A (en) 2023-03-28
CN115862656B CN115862656B (en) 2023-06-02

Family

ID=85657487

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310054459.9A Active CN115862656B (en) 2023-02-03 2023-02-03 Bone-conduction microphone voice enhancement method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115862656B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008257110A (en) * 2007-04-09 2008-10-23 Nippon Telegr & Teleph Corp <Ntt> Object signal section estimation device, method, and program, and recording medium
US20120130154A1 (en) * 2010-11-23 2012-05-24 Richie Sajan Voice Volume Modulator
CN110782912A (en) * 2019-10-10 2020-02-11 安克创新科技股份有限公司 Sound source control method and speaker device
CN112767963A (en) * 2021-01-28 2021-05-07 歌尔科技有限公司 Voice enhancement method, device and system and computer readable storage medium
CN114360560A (en) * 2022-01-17 2022-04-15 随锐科技集团股份有限公司 Speech enhancement post-processing method and device based on harmonic structure prediction

Non-Patent Citations (1)

Title
罗怡珊, 汪源源, 王威琪: "骨传导超声助听技术的研究" (Research on bone-conduction ultrasonic hearing-aid technology) *

Also Published As

Publication number Publication date
CN115862656B (en) 2023-06-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant