CN116721657A - Head wearing device for sound enhanced recording - Google Patents

Head wearing device for sound enhanced recording Download PDF

Info

Publication number
CN116721657A
Authority
CN
China
Prior art keywords
signal
sound signal
sound
head
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310627833.XA
Other languages
Chinese (zh)
Inventor
周玉军
刘志
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Oriole Intelligent Technology Co ltd
Original Assignee
Shenzhen Oriole Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Oriole Intelligent Technology Co ltd filed Critical Shenzhen Oriole Intelligent Technology Co ltd
Priority to CN202310627833.XA priority Critical patent/CN116721657A/en
Publication of CN116721657A publication Critical patent/CN116721657A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G02OPTICS
    • G02CSPECTACLES; SUNGLASSES OR GOGGLES INSOFAR AS THEY HAVE THE SAME FEATURES AS SPECTACLES; CONTACT LENSES
    • G02C11/00Non-optical adjuncts; Attachment thereof
    • G02C11/10Electronic devices other than hearing aids
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/20Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272Voice signal separating
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02165Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Abstract

The invention discloses a head-worn device for sound-enhanced recording, comprising a front portion, side portions, and a signal synthesis module. The front portion carries two forward microphone modules for recording the user's own speech and the speech of the person facing the user; the side portions carry two lateral microphone modules for recording environmental noise. The signal synthesis module enhances the two forward sound signals to obtain an enhanced forward sound signal, generates masking thresholds of the enhanced forward sound signal in different frequency bands using each forward sound signal and each lateral sound signal, and performs noise reduction on the enhanced forward sound signal using the masking thresholds to obtain an output speech signal. The invention can effectively separate the speech of the user and the opposite speaker from noise, reducing noise and improving sound recording quality.

Description

Head wearing device for sound enhanced recording
Technical Field
The present invention relates to a sound recording device, and more particularly to a head-worn device with a sound-enhanced recording function, such as glasses with sound pickup and playback functions.
Background
Head-worn devices with a sound recording function are not uncommon. For portability, recording elements have been integrated into various portable items, such as watches, cell phones, headphones, helmets, and glasses. Solutions in which the recording element is integrated into glasses offer good portability and control over the recording direction. Glasses for wireless communication are proposed, for example, in US patent publication No. US7792552B2. As shown in fig. 1, they comprise a front frame 1 and two bendable arms, a left arm 2 and a right arm 3, located on either side of the front frame 1. The left arm 2 and the right arm 3 fold against the front frame 1 when bent. The front frame holds two lenses, and a microphone 4 for recording sound is provided in the front frame 1 between the two lenses. However, these glasses cannot effectively remove environmental noise, so the recording quality is low; it is also difficult to process the sound to improve its quality, which hinders subsequent higher-level operations (such as speech recognition).
As shown in fig. 2, Chinese patent application publication No. CN114339524A proposes a head-mounted device with two microphones at the front and two microphones at the sides. However, these microphones are positioned to record left and right stereo channels, not for noise reduction or speech enhancement.
Disclosure of Invention
The invention aims to solve the problem that existing head-worn devices cannot effectively remove environmental noise during sound pickup and therefore cannot effectively improve recording quality.
In order to solve the above technical problem, the present invention provides a head-worn device for sound-enhanced recording comprising a front portion and side portions. When the device is worn on the user's head, the front portion faces the front of the user's face, and the side portions, located at the two ends of the front portion, face the two sides of the user's face. A first forward microphone module and a second forward microphone module are arranged in the front portion; the two forward microphone modules record the user's own speech and the speech of the person facing the user. A first side microphone module and a second side microphone module are arranged on the two sides of the side portions; the two side microphone modules record environmental noise. The head-worn device further comprises a signal synthesis module configured to: acquire the first and second forward sound signals generated by the first and second forward microphone modules, and the first and second side sound signals generated by the first and second side microphone modules; jointly enhance the first and second forward sound signals to obtain an enhanced forward sound signal; generate masking thresholds of the enhanced forward sound signal in different frequency bands using each forward sound signal and each side sound signal; and perform noise reduction on the enhanced forward sound signal using the masking thresholds to obtain an output speech signal.
According to a preferred embodiment of the present invention, the first and second forward microphone modules lie in the same horizontal plane when the head-worn device is worn on the user's head, and the step of jointly enhancing the first and second forward sound signals to obtain an enhanced forward sound signal comprises: adding the sound signals generated by the first and second forward microphone modules and dividing by two to obtain a signal d(n), and subtracting them and dividing by two to obtain a signal x(n), where n denotes the time index; processing the signal x(n) with an adaptive filter to obtain a signal y(n); and taking the enhanced output signal e(n), obtained by subtracting y(n) from d(n), as the enhanced forward sound signal, while feeding e(n) to the parameter-updating module of the adaptive filter to update the filter's parameters.
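The sum-and-difference step above can be sketched in a few lines. This is an illustrative reconstruction by the editor (the patent publishes no code); the function name and NumPy usage are the editor's choices:

```python
import numpy as np

def sum_difference_signals(mic1, mic2):
    """Split two forward microphone signals into a sum channel d(n),
    which reinforces sound arriving from straight ahead, and a
    difference channel x(n), which mostly carries off-axis sound."""
    mic1 = np.asarray(mic1, dtype=float)
    mic2 = np.asarray(mic2, dtype=float)
    d = (mic1 + mic2) / 2.0  # desired (primary) channel
    x = (mic1 - mic2) / 2.0  # reference (noise) channel
    return d, x
```

Note that the decomposition is lossless: mic1 = d + x and mic2 = d - x, so no information is discarded before the adaptive filtering stage.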
According to a preferred embodiment of the present invention, the step of generating masking thresholds of the enhanced forward sound signal on different frequency bands using each forward sound signal and each side sound signal comprises:
<1> performing time-frequency decomposition on each forward sound signal and each side sound signal, converting them from time-domain signals to frequency-domain signals;
<2> adding the time-frequency units of the same frequency band of the first and second forward sound signals to obtain the time-frequency units of the forward sound signal, and adding the time-frequency units of the same frequency band of the first and second side sound signals to obtain the time-frequency units of the side sound signal;
<3> performing time-frequency compensation on each time-frequency unit of the side sound signal;
<4> calculating an IID (intensity difference) value and an ITD (time difference) value for each time-frequency unit of the forward and side sound signals, thereby generating a preliminary masking threshold for the enhanced forward sound signal.
According to a preferred embodiment of the present invention, before the time-frequency units of the same frequency band of the first and second forward sound signals are added, the first and second forward sound signals are equalized so that their energy values in each frequency band are approximately equal.
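A minimal sketch of this per-band equalization, assuming magnitude spectrograms of shape (frames, bands). This is the editor's reconstruction, not code from the patent; in particular, equalizing both channels toward their shared mean energy is an assumption:

```python
import numpy as np

def equalize_bands(spec1, spec2, eps=1e-12):
    """Scale each frequency band of two magnitude spectrograms so that
    their per-band energies match the mean energy of the pair.
    spec1, spec2: arrays of shape (frames, bands)."""
    e1 = np.sum(spec1 ** 2, axis=0)  # energy per band, channel 1
    e2 = np.sum(spec2 ** 2, axis=0)  # energy per band, channel 2
    target = (e1 + e2) / 2.0         # common target energy per band
    g1 = np.sqrt(target / np.maximum(e1, eps))
    g2 = np.sqrt(target / np.maximum(e2, eps))
    return spec1 * g1, spec2 * g2
```

After this step, any remaining per-unit level difference between the channels reflects the sound field rather than microphone gain mismatch.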
According to a preferred embodiment of the present invention, the step of performing noise reduction on the enhanced forward sound signal using the masking threshold to obtain an output speech signal includes:
<1> performing voice activity detection on the forward sound signal;
<2> obtaining a final masking threshold of the enhanced forward sound signal from the voice activity detection result and the preliminary masking threshold;
<3> smoothing the final masking threshold and masking the enhanced speech signal using the smoothed final masking threshold;
<4> converting each time-frequency unit of the masked enhanced speech signal back to the time domain to obtain the output speech signal.
According to a preferred embodiment of the present invention, the step of performing time-frequency compensation on each time-frequency unit of the side sound signal includes updating the time-frequency compensation parameters according to the voice activity detection result.
According to a preferred embodiment of the present invention, the head-worn device is a pair of glasses whose front portion is a front frame holding the lenses and whose side portions are the two temples on either side of the front frame;
the first forward microphone module and the second forward microphone module are located in the middle of the front frame, and the first side microphone module, the second side microphone module, and the signal synthesis module are located in the temples.
According to a preferred embodiment of the present invention, the head wearing apparatus further includes a sound playing module for playing sound to a user according to the output signal.
According to a preferred embodiment of the present invention, the head wearing apparatus further comprises a voice recognition module for performing voice recognition on the output signal to convert the sound signal into text information.
According to a preferred embodiment of the present invention, the head wearing apparatus further includes an information display module for displaying the text information.
According to a preferred embodiment of the present invention, the voice recognition module is further configured to convert the recognized text information into text information in a language specified by the user, and send the text information to the information display module for display.
According to a preferred embodiment of the present invention, the information display module is a retina projection device for projecting the text information onto the retina of the user.
According to a preferred embodiment of the present invention, the voice recognition module is further configured to convert the recognized text information into text information in a language specified by the user, and then convert the converted text information into a sound signal, and send the sound signal to the sound playing module for playing.
The invention can effectively separate the speech of the user and the opposite speaker from noise, reducing noise and improving sound recording quality.
Drawings
In order to make the technical problems solved by the present invention, the technical means adopted, and the technical effects achieved clearer, specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be noted, however, that the drawings described below illustrate only exemplary embodiments of the present invention, and that those skilled in the art may derive other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of a prior art wireless communication eyewear;
fig. 2 is a schematic structural view of a prior-art head-worn device provided with four microphones;
FIG. 3 is a schematic diagram of an NLMS algorithm employed by one embodiment of the present invention to obtain an enhanced forward sound signal;
FIG. 4 is a front view of a head wearable device with sound enhancement recording capabilities in accordance with an embodiment of the present invention;
FIG. 5 is a schematic view showing a state in which a head-wearing apparatus having a sound-enhancement recording function of an embodiment of the present invention is worn on a user's head;
FIG. 6 is a schematic side view of a head-mounted device with sound collection capabilities in accordance with an embodiment of the present invention;
fig. 7 is a flow diagram of one embodiment of a signal synthesis module according to the present invention generating an output speech signal.
Fig. 8 is a block diagram of a radio receiver according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will now be described more fully hereinafter with reference to the accompanying drawings, in which exemplary embodiments are shown. The invention may, however, be practiced in various specific ways and should not be construed as limited to the embodiments set forth herein; rather, these exemplary embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the invention to those skilled in the art.
The structures, capabilities, effects, or other features described in a particular embodiment may be incorporated in one or more other embodiments in any suitable manner without departing from the spirit of the present invention.
In describing particular embodiments, specific details of construction, performance, effects, or other features are set forth in order to provide a thorough understanding of the embodiments by those skilled in the art. It is not excluded, however, that one skilled in the art may implement the present invention in a particular situation in a solution that does not include the structures, properties, effects, or other characteristics described above.
The flow diagrams in the figures are merely exemplary flow illustrations and do not represent that all of the elements, operations, and steps in the flow diagrams must be included in the aspects of the invention, nor that the steps must be performed in the order shown in the figures. For example, some operations/steps in the flowcharts may be decomposed, some operations/steps may be combined or partially combined, etc., and the order of execution shown in the flowcharts may be changed according to actual situations without departing from the gist of the present invention.
The block diagrams in the figures generally represent functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The same reference numerals in the drawings denote the same or similar elements, components, or portions, and repeated descriptions of them may therefore be omitted hereinafter. It will be further understood that, although the terms first, second, third, etc. may be used herein to describe various devices, elements, components, or portions, these devices, elements, components, or portions should not be limited by these terms; the terms merely distinguish one from another. For example, a first device may also be referred to as a second device without departing from the spirit of the invention. Furthermore, the term "and/or" is meant to include all combinations of any one or more of the listed items.
The invention provides a head-worn device for sound-enhanced recording. The term head-worn device as used herein refers to any device that can be worn on a human head and is not limited to a particular type. However, to implement the sound recording method of the present invention, the head-worn device must have a front portion for holding microphones, which faces the front of the user's face when the device is worn, and side portions, which face the two sides of the user's face. The side portions can therefore be said to lie at the two ends of the front portion, so that when the device is worn, the front portion and the side portions at its two ends enclose the front and both sides of the user's face. Typical examples of the head-worn device in the present invention are frame glasses, VR glasses, hoods, helmets, headbands, hats, and headphones that have portions at the front and sides of the head.
According to the invention, two microphone modules are arranged in the front portion; a microphone module here means a functional module with a sound-pickup element and a signal-preprocessing element. These two microphones are referred to herein as the first forward microphone module and the second forward microphone module. The two forward microphone modules may lie in the same horizontal plane when the head-worn device is worn on the user's head, and are primarily intended to capture the user's own speech and the speech of the person facing the user. For example, when the head-worn device is a pair of glasses, the two forward microphones are located at the middle of the front frame.
According to the invention, two microphone modules are also provided in the side portions, referred to herein as the first side microphone module and the second side microphone module. The two side microphone modules may also lie at the same level when the head-worn device is worn on the user's head, and are primarily intended to capture ambient noise.
In the present invention, the two side microphones are preferably located on the two sides of the user's face, i.e., on the side portions at different ends of the front portion. For example, when the head-worn device is a pair of glasses, the two side microphones are located on different temples.
According to the present invention, the head-worn device further includes a signal synthesis module for acquiring the sound signals recorded by the first and second forward microphone modules, i.e., the first and second forward sound signals, and the sound signals generated by the first and second side microphone modules, i.e., the first and second side sound signals. The signal synthesis module jointly enhances the first and second forward sound signals to strengthen the signal arriving from the direction the user faces and suppress noise from other directions; the resulting signal is therefore called the enhanced forward sound signal. The signal synthesis module then uses the side sound signals to filter out environmental noise: specifically, it uses each forward sound signal and each side sound signal to generate masking thresholds of the enhanced forward sound signal in different frequency bands, and performs noise reduction on the enhanced forward sound signal using these masking thresholds to obtain the output speech signal. The invention can thus clearly record the sound in front of the wearer while reducing lateral and ambient environmental noise. For glasses, helmets, and the like with a recording function, the invention improves sound recording quality and user experience. The signal synthesis module is built from elements with signal-processing capability, such as a single-chip microcontroller, a DSP, or an FPGA.
More preferably, the signal synthesis module of the present invention may use the classical NLMS (normalized least mean squares) adaptive filtering algorithm for speech enhancement on the first and second forward sound signals.
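A compact NLMS sketch consistent with the structure described below for fig. 3. The filter order, step size mu, and regularization eps are illustrative values chosen by the editor, not taken from the patent:

```python
import numpy as np

def nlms_enhance(d, x, order=16, mu=0.5, eps=1e-8):
    """Normalized LMS adaptive filter: predict the noise component of
    the primary channel d(n) from the reference x(n) and subtract it.
    Returns the enhanced signal e(n)."""
    w = np.zeros(order)    # filter taps W_1 .. W_M
    buf = np.zeros(order)  # delay line x(n), x(n-1), ..., x(n-M+1)
    e = np.zeros(len(d))
    for n in range(len(d)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        y = w @ buf                                 # filter output y(n)
        e[n] = d[n] - y                             # enhanced output e(n)
        w += mu * e[n] * buf / (buf @ buf + eps)    # NLMS tap update
    return e
```

Because the difference channel x(n) ideally contains little of the frontal speech, subtracting its filtered version from d(n) removes correlated interference while leaving the speech largely intact.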
Fig. 3 is a schematic diagram of the NLMS algorithm employed by an embodiment of the present invention to obtain an enhanced forward sound signal. As shown in fig. 3, the sound signals (time-domain sampled signals) obtained by the first and second forward microphone modules are added and divided by two to obtain the signal d(n), and subtracted and divided by two to obtain the signal x(n); the signal x(n) is processed by an M-order adaptive filter to obtain the signal y(n). Subtracting y(n) from d(n) yields the enhanced output signal e(n) (the enhanced forward sound signal of the signal synthesis module), and e(n) is fed to the parameter-updating module of the adaptive filter to update the parameters of the M-order filter. In the figure, n denotes the time index, z^-1 denotes a delay unit, M is the filter order, W_1, W_2, ..., W_M denote the M filter parameters, and the M delayed versions of the signal x(n), namely x(n), x(n-1), x(n-2), ..., x(n-M+1), are denoted x_1(n), x_2(n), ..., x_M(n), respectively. In this embodiment, the adaptive filter may be an NLMS (normalized least mean squares) filter, and the algorithm adopted by its parameter-updating module is the NLMS algorithm. As one embodiment, the step in which the signal synthesis module generates the masking thresholds of the enhanced forward sound signal in different frequency bands using each forward sound signal and each side sound signal includes:
(1) Performing time-frequency decomposition on each forward sound signal and each side sound signal, converting the time-domain signals into frequency-domain signals;
(2) Adding the time-frequency units of the same frequency band of the first and second forward sound signals to obtain the time-frequency units of the forward sound signal, and adding the time-frequency units of the same frequency band of the first and second side sound signals to obtain the time-frequency units of the side sound signal. Before the time-frequency units of the same frequency band of the first and second forward sound signals are added, it is preferable to equalize the first and second forward sound signals so that their energy values in each frequency band are approximately equal.
(3) Performing time-frequency compensation on each time-frequency unit of the side sound signal, preferably updating the time-frequency compensation parameters according to the voice activity detection result.
(4) Calculating an IID value and an ITD value for each time-frequency unit of the forward and side sound signals, thereby generating a preliminary masking threshold for the enhanced forward sound signal.
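Step (4) can be sketched for the intensity-difference cue alone. This is an editor's reconstruction under stated assumptions: a binary mask, a level-difference gate of 6 dB (an arbitrary placeholder), and omission of the ITD cue for brevity; the patent specifies none of these values:

```python
import numpy as np

def preliminary_mask(front_tf, side_tf, iid_gate_db=6.0, eps=1e-12):
    """Build a binary preliminary mask from per-unit level differences
    (an IID-style cue) between the summed forward and summed side
    time-frequency units.
    front_tf, side_tf: complex or real STFTs of shape (frames, bands)."""
    iid_db = 20.0 * np.log10((np.abs(front_tf) + eps) /
                             (np.abs(side_tf) + eps))
    # keep units where the forward channel dominates the side channel
    return (iid_db > iid_gate_db).astype(float)
```

Intuitively, frontal speech is louder in the forward microphones than in the side microphones, while diffuse environmental noise is not, so gating on the level ratio retains speech-dominated units.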
According to a preferred embodiment of the present invention, the step of performing noise reduction processing on the enhanced forward sound signal using the masking threshold to obtain an output speech signal includes:
(1) Performing voice activation detection on the forward sound signal;
(2) Obtaining a final masking threshold of the enhanced forward sound signal according to the voice activation detection result and the preliminary masking threshold;
(3) Smoothing the final masking threshold, and masking the enhanced speech signal by using the smoothed final masking threshold;
(4) And converting each time-frequency unit of the enhanced voice signal subjected to masking processing into a time domain to obtain an output voice signal.
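Steps (1)-(4) above can be sketched as below. This is an editor's reconstruction under stated assumptions: a frame-level binary VAD supplied by the caller, a first-order recursive smoother with coefficient alpha, a spectral floor, and 50%-overlap overlap-add reconstruction; none of these specifics are published in the patent:

```python
import numpy as np

def apply_mask_and_reconstruct(enhanced_tf, prelim_mask, vad,
                               floor=0.1, alpha=0.6):
    """Gate the preliminary mask by a frame-level VAD, smooth it over
    time, apply it to the enhanced spectrogram, and convert back to
    the time domain by overlap-add.
    enhanced_tf: complex STFT of shape (frames, bands)."""
    mask = prelim_mask * vad[:, None]      # (2) combine VAD and mask
    mask = np.maximum(mask, floor)         # spectral floor avoids muting
    for i in range(1, mask.shape[0]):      # (3) recursive time smoothing
        mask[i] = alpha * mask[i - 1] + (1 - alpha) * mask[i]
    masked = enhanced_tf * mask            # apply the final mask
    frames = np.fft.irfft(masked, axis=1)  # (4) back to time domain
    hop = frames.shape[1] // 2             # assume 50% overlap
    out = np.zeros(hop * (len(frames) + 1))
    for i, f in enumerate(frames):         # overlap-add synthesis
        out[i * hop : i * hop + len(f)] += f
    return out
```

The spectral floor and temporal smoothing are common practice in masking-based noise reduction to suppress musical-noise artifacts.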
Fig. 4 is a front view of a head-worn device for sound-enhanced recording in accordance with an embodiment of the present invention. As shown in fig. 4, the head-worn device of this embodiment is a pair of glasses comprising two temples, a first temple 1 and a second temple 2, located on either side of a front frame 3. The front frame 3 holds two lenses, and the first forward microphone module mic1 and the second forward microphone module mic2 are located between the two lenses. The first side microphone module mic3 and the second side microphone module mic4 are arranged in the first temple 1 and the second temple 2, respectively. The signal synthesis module 5 in this embodiment is located in the second temple 2 (not shown in fig. 4).
Fig. 5 is a schematic view of the head-worn device of the embodiment of the present invention worn on the user's head. As shown in fig. 5, when the glasses serving as the head-worn device are worn, the first and second forward microphone modules mic1 and mic2 are located in the middle of the front frame 3, directly above the bridge of the nose. The first and second forward microphone modules lie in the same horizontal plane, i.e., the line connecting mic1 and mic2 is horizontal. The first side microphone module mic3 and the second side microphone module mic4 are located in the first temple 1 and the second temple 2, respectively.
In the glasses of this embodiment, a sound playback function is added, that is, sound is played back to the user. The head-worn device of this embodiment of the invention therefore further includes a sound playing module. The sound source to be played may in principle be any source, but the invention preferably uses as the source signal the output speech signal obtained by noise-reducing the forward sound signal as described above, since this is particularly useful for people with hearing impairment.
Fig. 6 is a schematic side view of the glasses serving as the head-worn device in accordance with an embodiment of the present invention. As shown in fig. 6, the signal synthesis module 5 is disposed in the second temple 2. The sound playing module 6 is fixed to the two temples at the portion near the user's ear; it may be, for example, a bone-conduction hearing-aid element, and is electrically connected to the signal synthesis module 5 to receive the output signal. It should be appreciated that the invention is not limited to a particular type of sound playing device: any sound playing element suitable for playing the processed output signal may be used to implement the solution of the present invention and achieve the corresponding advantageous effects.
As further shown in fig. 6, the head-worn device of the present invention preferably further includes a voice recognition module 7 for performing voice recognition on the output signal, converting the sound signal into text information. In particular, for devices such as frame glasses, VR glasses, and helmets that can carry a display element in front of the user's eyes, a preferred scheme of the invention is to perform voice recognition on the output signal; the recognized text can then be used in various application scenarios.
One such application again serves the hearing-impaired: for users who are deaf or nearly so, converting the speech of the opposite speaker into text and displaying it is the most intuitive way to follow the speaker's words in real time. In this case, the glasses of the above embodiments further include an information display module for displaying the text information. In the embodiment shown in fig. 7, the information display module includes a light source 8, a projection element 9, and a special lens that receives the projection of the projection element 9. The information display module is thus a retina projection device, consisting of the light source 8, the projection element 9, and the lens, which projects the text information onto the user's retina. When the user wears the glasses, the opposite speaker's words, converted by the voice recognition module 7, are displayed in real time in front of the user's eyes. In this embodiment, the projection element 9 receives the converted text information from the voice recognition module 7.
It will be appreciated that the invention is not limited to a particular type of information display module, and any existing component suitable for displaying images in front of the eyes of a user may be employed.
Another application scenario is real-time translation. Here, the voice recognition module is further configured to convert the recognized text information into text in a language specified by the user and send it to the information display module for display. If the user does not speak the same language as the opposite speaker, the present invention converts the recognized words of the opposite speaker into words in the user's specified language. For example, after the user puts on the glasses of the present invention, a foreign language or dialect uttered by the opposite speaker is displayed in real time in front of the user as text in a language the user is familiar with. Besides displaying the converted text in front of the user's eyes by retina projection, the converted text can also be turned back into speech and sent to the sound playing module for the user to listen to. This text-to-speech function may be implemented in the voice recognition module or by a separate module.
Fig. 7 is a flow diagram of one embodiment of a signal synthesis module according to the present invention generating an output speech signal.
As shown in fig. 7, the signal synthesis module first performs time-frequency decomposition on the first and second forward sound signals generated by the first and second forward microphone modules, and on the first and second side sound signals generated by the first and second side microphone modules, converting the four sound signals from the time domain to the frequency domain and dividing each resulting frequency signal into time-frequency units by frame and frequency band. That is, the time-frequency unit of the first forward sound signal, acquired by the first forward microphone module mic1, in the j-th frequency band of the i-th frame is denoted h1(i, j); similarly, the corresponding time-frequency unit of the second forward sound signal is denoted h2(i, j), that of the first side sound signal h3(i, j), and that of the second side sound signal h4(i, j).
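The decomposition above can be sketched in numpy as follows. The frame length, hop size, and Hann window are illustrative assumptions; the embodiment does not specify them.

```python
import numpy as np

def time_frequency_units(x, frame_len=256, hop=128):
    """Decompose a time-domain signal into time-frequency units h(i, j):
    frame index i, frequency-band index j (windowed FFT per frame)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    H = np.empty((n_frames, frame_len // 2 + 1), dtype=complex)
    for i in range(n_frames):
        H[i] = np.fft.rfft(x[i * hop:i * hop + frame_len] * window)
    return H  # H[i, j] is the time-frequency unit of frame i, band j

# Four microphone channels (stand-in sinusoids) -> h1..h4
fs = 16000
t = np.arange(fs) / fs
mics = [np.sin(2 * np.pi * f * t) for f in (200.0, 210.0, 300.0, 310.0)]
h1, h2, h3, h4 = (time_frequency_units(m) for m in mics)
```

In a real device the four inputs would be the sampled microphone streams rather than synthetic sinusoids.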
Next, each frequency band of the time-frequency units h1(i, j) and h2(i, j) of the two forward sound signals is first band-equalized and then the two are added; band equalization ensures that the energy values of h1(i, j) and h2(i, j) in each band tend to be consistent. That is, hm(i, j) = h1(i, j) + h2(i, j)·b(i, j), where hm(i, j) is the synthesized forward sound signal and b(i, j) is the band equalization coefficient.
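A minimal sketch of this synthesis step follows. The choice of b(i, j) as the per-unit magnitude ratio is an illustrative assumption; the embodiment only requires that equalization drive the two channels' band energies toward consistency.

```python
import numpy as np

def synthesize_forward(h1, h2, eps=1e-12):
    """hm(i, j) = h1(i, j) + h2(i, j) * b(i, j), where the band equalization
    coefficient b(i, j) scales the second channel so that its magnitude per
    band matches the first channel's (eps avoids division by zero)."""
    b = np.abs(h1) / (np.abs(h2) + eps)
    return h1 + h2 * b
```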
The time-frequency units h3(i, j) and h4(i, j) of the two side sound signals are likewise added band by band, i.e., hp(i, j) = h3(i, j) + h4(i, j), where hp(i, j) is the synthesized side sound signal. Then each time-frequency unit of the synthesized side sound signal hp(i, j) is time-frequency compensated: hg(i, j) = hp(i, j)·g(i, j), where hg(i, j) is the time-frequency-compensated side sound signal and g(i, j) is the time-frequency compensation parameter. The parameter g(i, j) can be updated according to the result of the subsequent voice activity detection (VAD).
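The side-channel synthesis and compensation can be sketched directly from the two formulas above; how g(i, j) is derived from the VAD result is left open by the embodiment, so it is passed in here as a given array.

```python
import numpy as np

def synthesize_side(h3, h4, g):
    """hp(i, j) = h3(i, j) + h4(i, j); hg(i, j) = hp(i, j) * g(i, j).
    g is the time-frequency compensation parameter, which a later step
    may update from the VAD result."""
    hp = h3 + h4
    return hp * g
```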
Next, IID and ITD values are calculated from the two processed signals: the time-frequency-compensated synthesized side sound signal hg(i, j) and the synthesized forward sound signal hm(i, j). ITD (Interaural Time Difference), also called the two-channel time difference, can be understood simply as the difference in arrival time of a sound between the two ears; IID (Interaural Intensity Difference), also called the two-channel energy difference, can be understood simply as the difference in intensity of a sound between the two ears. A masking threshold is then generated by thresholding the IID and ITD values. In this embodiment, the IID and ITD of the time-frequency unit in the j-th band of the i-th frame are denoted IID(i, j) and ITD(i, j), respectively. At the same time, voice activity detection (VAD) is performed on the time-frequency units of the synthesized forward sound signal hm(i, j).
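One plausible per-unit computation of these two quantities is sketched below; the embodiment does not fix the formulas, so expressing IID as an energy ratio in dB and ITD as a phase difference converted to a time lag at each band's center frequency is an assumption.

```python
import numpy as np

def iid_itd(hg, hm, band_freqs, eps=1e-12):
    """Per time-frequency unit: IID(i, j) as the energy difference in dB
    between the forward and compensated side signals, ITD(i, j) as the
    wrapped phase difference converted into a time lag."""
    iid = 10.0 * np.log10((np.abs(hm) ** 2 + eps) / (np.abs(hg) ** 2 + eps))
    phase = np.angle(hm * np.conj(hg))                 # wrapped to (-pi, pi]
    itd = phase / (2.0 * np.pi * np.maximum(band_freqs, 1.0))
    return iid, itd
```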
Specifically, the invention generates three preliminary masking thresholds and, based on a comprehensive judgment, selects one as the optimal (final) masking threshold G(i, j). First, an initial masking threshold is obtained from preset thresholds on the two-channel energy difference IID(i, j) and the two-channel time difference ITD(i, j) of the j-th band of the i-th frame; this first masking threshold is denoted G1(i, j). Second, a steady-state noise reduction mode using per-band spectral subtraction yields a second masking threshold, denoted G2(i, j). Finally, different thresholds are taken according to the speech classification result from the VAD, determining a third masking threshold, denoted G3(i, j).
In a particular embodiment, when the speech classification result for the forward sound signal hm(i, j) is a noise signal, the masking threshold G(i, j) is directly set to the third masking threshold G3(i, j); when the classification result is a speech signal, the first masking threshold G1(i, j) and the second masking threshold G2(i, j) are compared, and the optimal masking threshold is determined from the comparison, for example by selecting the smaller of the two as G(i, j).
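The selection rule just described reduces to a small element-wise decision; the sketch below assumes a per-frame boolean VAD classification broadcast across bands.

```python
import numpy as np

def final_masking_threshold(G1, G2, G3, frame_is_speech):
    """G(i, j) = G3(i, j) on noise-classified frames; on speech-classified
    frames, the smaller (more suppressive) of G1(i, j) and G2(i, j)."""
    speech = np.asarray(frame_is_speech, dtype=bool)[:, None]
    return np.where(speech, np.minimum(G1, G2), G3)
```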
Finally, the final masking threshold is smoothed and applied to the synthesized forward sound signal hm(i, j) to obtain noise-reduced time-frequency units, which are converted back into a time-domain signal to yield the final output voice signal. The enhanced forward sound signal contains the voices of the wearer and of the opposite speaker: through the computation of the above embodiments, the four microphone signals are reduced to an output containing only these voices.
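The final stage can be sketched as follows; the first-order recursive smoothing of the threshold and the overlap-add reconstruction parameters are illustrative assumptions, since the embodiment names the operations but not their exact form.

```python
import numpy as np

def mask_and_reconstruct(hm, G, frame_len=256, hop=128, alpha=0.8):
    """Smooth the final masking threshold over frames, apply it to the
    synthesized forward signal hm(i, j), and overlap-add the inverse FFT
    of each masked frame back into a time-domain output signal."""
    Gs = G.astype(float).copy()
    for i in range(1, len(Gs)):          # first-order recursive smoothing
        Gs[i] = alpha * Gs[i - 1] + (1.0 - alpha) * G[i]
    H = hm * Gs                          # noise-reduced time-frequency units
    window = np.hanning(frame_len)
    out = np.zeros((len(H) - 1) * hop + frame_len)
    for i in range(len(H)):
        out[i * hop:i * hop + frame_len] += np.fft.irfft(H[i], n=frame_len) * window
    return out
```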
Those skilled in the art will appreciate that all or part of the steps of the above embodiments may be implemented as a computer program executed by a data processing apparatus (including a computer); when the computer program is executed, the above-described method provided by the present invention is carried out. Moreover, the computer program may be stored in a computer readable storage medium, such as a magnetic disk, an optical disk, a ROM, or a RAM, or in a storage array composed of multiple storage media, for example a magnetic disk or tape storage array. The storage medium is not limited to centralized storage; it may also be distributed storage, such as cloud storage based on cloud computing.
The following describes apparatus embodiments of the invention, which may be used to perform the method embodiments of the invention. Details described in the apparatus embodiments should be regarded as supplementary to the method embodiments above; for details not disclosed in the apparatus embodiments, reference may be made to the method embodiments.
Fig. 8 is a block diagram of a sound pickup apparatus according to an embodiment of the present invention. As shown in fig. 8, the signal synthesis module acquires the sound signals generated by the first through fourth microphone modules and processes them to generate the final processed sound signal as the output signal, which enhances the voice of the opposite speaker while eliminating or attenuating environmental noise. The voice recognition module is connected to the signal synthesis module to acquire the output voice signal and perform voice recognition; the recognized text information can be sent directly to the retina projection module for display, or converted from text to speech and sent to the sound playing module for playback. Optionally, the voice recognition module is further configured to convert the recognized text information into text in a language specified by the user, and then either send it to the retina projection module for display, or convert it to speech and send it to the sound playing module for playback.
It will be appreciated by those skilled in the art that the modules in the above embodiments may be distributed in a device as described, or may be distributed, with corresponding changes, in one or more devices different from those of the above embodiments. The modules of the above embodiments may be combined into one module, or further split into a plurality of sub-modules. For example, the text translation and text-to-speech functions of the voice recognition module may be implemented together, by sub-modules of that module, or by another module independent of the voice recognition module.
The above-described specific embodiments further describe the objects, technical solutions and advantageous effects of the present invention in detail, and it should be understood that the present invention is not inherently related to any particular computer, virtual device or electronic apparatus, and various general-purpose devices may also implement the present invention. The foregoing description of the embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims (13)

1. A head wearing apparatus for sound enhancement recording, comprising a front portion and side portions, the front portion facing the front of the user's face when the head wearing apparatus is worn on the user's head, the side portions being located at both ends of the front portion, respectively facing both sides of the user's face, characterized in that:
a first forward microphone module and a second forward microphone module are arranged in the front portion, and the two forward microphone modules are used for recording the user's own speech and the speech of the opposite speaker facing the user;
the two sides of the side face part are respectively provided with a first side microphone module and a second microphone module, and the two side microphone modules are used for recording environmental noise;
the head wearing device further comprises a signal synthesis module for: acquiring a first forward sound signal and a second forward sound signal generated by a first forward microphone module and a second forward microphone module, and a first side sound signal and a second side sound signal generated by a first side microphone module and a second side microphone module; performing enhancement processing on the first forward sound signal and the second forward sound signal in combination to obtain an enhanced forward sound signal; generating masking thresholds of the enhanced forward sound signal on different frequency bands using each forward sound signal and each side sound signal; and carrying out noise reduction processing on the enhanced forward sound signal by utilizing the masking threshold value to obtain an output voice signal.
2. The head-worn device for sound enhancement recording as in claim 1, wherein:
the first and second forward microphone modules are positioned in a same horizontal plane when the head-worn device is worn on the head of a user; and is also provided with
The step of combining the first forward sound signal and the second forward sound signal for enhancement processing to obtain an enhanced forward sound signal comprises:
adding the sound signals generated by the first forward microphone module and the second forward microphone module and dividing by two to obtain a signal d(n), and subtracting them and dividing by two to obtain a signal x(n), wherein n denotes a time-sequence index;
processing the signal x(n) with an adaptive filter to obtain a signal y(n);
taking the enhanced output signal e(n), obtained by subtracting the signal y(n) from the signal d(n), as the enhanced forward sound signal, and inputting the signal e(n) to a parameter updating module of the adaptive filter to update the parameters of the adaptive filter.
3. The head-worn device for sound enhancement recording as in claim 1, wherein: the step of generating masking thresholds for the enhanced forward sound signal over different frequency bands using each forward sound signal and each side sound signal comprises:
<1> time-frequency-decomposing each of the forward sound signals and each of the side sound signals, and converting from a time-domain signal to a frequency signal;
<2> adding time-frequency units of the same frequency band of the first and second forward sound signals to obtain each time-frequency unit of the forward sound signal; adding the time-frequency units of the same frequency band of the first lateral sound signal and the second lateral sound signal to obtain each time-frequency unit of the lateral sound signal;
<3> performing time-frequency compensation for each time-frequency unit of the side-direction sound signal;
<4> calculating an IID value and an ITD value for each time-frequency unit of the forward sound signal and the side sound signal, thereby generating a preliminary masking threshold of the enhanced forward sound signal.
4. A head-worn device for sound enhancement recording as in claim 3, wherein: before adding the time-frequency units of the same frequency band of the first forward sound signal and the second forward sound signal, the first forward sound signal and the second forward sound signal are subjected to equalization processing so that the energy values of the first forward sound signal and the second forward sound signal in each frequency band tend to be consistent.
5. A head-worn device for sound enhancement recording as in claim 3, wherein: the step of performing noise reduction processing on the enhanced forward sound signal by using the masking threshold to obtain an output voice signal includes:
<1> performing voice activation detection on the forward sound signal;
<2> obtaining a final masking threshold of the enhanced forward sound signal based on the voice activation detection result and the preliminary masking threshold;
<3> smoothing the final masking threshold and masking the enhanced speech signal using the smoothed final masking threshold;
<4> each time-frequency unit of the enhanced speech signal subjected to the masking processing is converted into the time domain, resulting in an output speech signal.
6. The head-worn device for sound enhancement recording as in claim 5, wherein: the step of performing time-frequency compensation on each time-frequency unit of the side direction sound signal comprises the following steps: and updating the time-frequency compensation parameters according to the voice detection result.
7. A head wear device for sound enhancement recording as in any one of claims 1-6, wherein:
the head wearing device is a pair of glasses, the front part of the head wearing device is a front frame for fixing lenses, and the side part of the head wearing device is two glasses legs positioned at two sides of the front frame;
the first forward microphone module and the second forward microphone module are positioned in the middle of the front frame, and the first lateral microphone module, the second lateral microphone module and the signal synthesis module are positioned in the glasses legs.
8. A head wear device for sound enhancement recording according to any one of claims 1 to 6, wherein the head wear device further comprises a sound playing module for playing sound to a user in accordance with the output signal.
9. A head wear device for sound enhancement recording according to any one of claims 1 to 6, further comprising a speech recognition module for speech recognition of the output signal to convert the sound signal into text information.
10. The head wear device for sound enhancement recording of claim 9, wherein the head wear device further comprises an information display module for displaying the text information.
11. The head wear device for sound enhancement recording as in claim 10, wherein the voice recognition module is further configured to convert the recognized text information into text information in a user-specified language and send the text information to the information display module for display.
12. The head wear device for sound enhancement recording of claim 11, wherein the information display module is a retina projection device for projecting the text information onto a retina of a user.
13. The head wear device for sound enhancement recording as in claim 8, wherein the voice recognition module is further configured to convert the recognized text information into text information in a user-specified language, and then convert the converted text information into a sound signal, and send the sound signal to the sound playing module for playing.
CN202310627833.XA 2023-05-31 2023-05-31 Head wearing device for sound enhanced recording Pending CN116721657A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310627833.XA CN116721657A (en) 2023-05-31 2023-05-31 Head wearing device for sound enhanced recording

Publications (1)

Publication Number Publication Date
CN116721657A true CN116721657A (en) 2023-09-08

Family

ID=87865257

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310627833.XA Pending CN116721657A (en) 2023-05-31 2023-05-31 Head wearing device for sound enhanced recording

Country Status (1)

Country Link
CN (1) CN116721657A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination