CN116594197A - Head wearing device based on three-microphone directional sound recording - Google Patents

Head wearing device based on three-microphone directional sound recording

Info

Publication number
CN116594197A
Authority
CN
China
Prior art keywords
signal
sound
module
microphone module
microphone
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310537930.XA
Other languages
Chinese (zh)
Inventor
周玉军 (Zhou Yujun)
刘志 (Liu Zhi)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Oriole Intelligent Technology Co., Ltd.
Original Assignee
Shenzhen Oriole Intelligent Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Oriole Intelligent Technology Co., Ltd.
Priority to CN202310537930.XA
Publication of CN116594197A
Legal status: Pending


Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04R - LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R 3/00 - Circuits for transducers, loudspeakers or microphones
    • H04R 3/04 - Circuits for transducers, loudspeakers or microphones for correcting frequency response
    • G - PHYSICS
    • G02 - OPTICS
    • G02C - SPECTACLES; SUNGLASSES OR GOGGLES INSOFAR AS THEY HAVE THE SAME FEATURES AS SPECTACLES; CONTACT LENSES
    • G02C 11/00 - Non-optical adjuncts; Attachment thereof
    • G02C 11/06 - Hearing aids
    • G - PHYSICS
    • G02 - OPTICS
    • G02C - SPECTACLES; SUNGLASSES OR GOGGLES INSOFAR AS THEY HAVE THE SAME FEATURES AS SPECTACLES; CONTACT LENSES
    • G02C 5/00 - Constructions of non-optical parts
    • G02C 5/02 - Bridges; Browbars; Intermediate bars
    • G - PHYSICS
    • G02 - OPTICS
    • G02C - SPECTACLES; SUNGLASSES OR GOGGLES INSOFAR AS THEY HAVE THE SAME FEATURES AS SPECTACLES; CONTACT LENSES
    • G02C 5/00 - Constructions of non-optical parts
    • G02C 5/14 - Side-members
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/26 - Speech to text systems

Landscapes

  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Optics & Photonics (AREA)
  • Ophthalmology & Optometry (AREA)
  • Signal Processing (AREA)
  • Otolaryngology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention discloses a head-wearing device based on three-microphone directional sound recording. It comprises a front frame provided with three microphone modules and a signal synthesis module. When the device is worn on the user's head, two of the microphone modules lie on the same horizontal plane and the third lies above or below them, so that the three form an isosceles triangle whose apex points toward the user's mouth. The signal synthesis module acquires and processes the sound signals generated by the three microphone modules, enhancing sound from the direction of the mouth to obtain a first sound signal. The first sound signal is then processed together with a second sound signal, picked up from a specific area in front of the user, to enhance the second sound signal. The invention can effectively separate and enhance the voices of the user and of the opposite speaker, and is well suited to hearing assistance and real-time speech translation.

Description

Head wearing device based on three-microphone directional sound recording
Technical Field
The invention relates to a directional sound-pickup device, in particular to one suitable for wearing on the head, such as glasses with sound pickup and playback functions, which can provide directional recording for the hearing impaired, for playback or for speech recognition.
Background
Head-worn devices with a sound recording function are not uncommon. For portability, recording elements have been integrated into various portable items such as watches, cell phones, headphones, helmets, and glasses. Housing the recording element in glasses offers good portability and good control over the recording direction. Glasses for wireless communication are proposed, for example, in US patent US 7,792,552 B2. As shown in fig. 1, they comprise a front frame 1 and two bendable arms, a left arm 2 and a right arm 3, located on either side of the front frame 1. The left arm 2 and the right arm 3 fold flat against the front frame 1 when bent. The front frame holds two lenses. A microphone 4, used for recording sound, is provided in the front frame 1 between the two lenses. However, because these glasses use only one microphone, they can neither record directionally nor effectively remove ambient noise; the recording quality is therefore low, the sound is hard to process to improve its quality, and subsequent advanced operations (such as speech recognition) are likewise inconvenient.
To improve recording quality and directionality, Chinese patent application CN114339524A proposes a head-worn device in which three or four microphones are disposed on the front frame; see fig. 2 and fig. 3. With three microphones, two of them form a talk-microphone group for noise-reduction processing; with four microphones, left-right stereo recording can be realized.
However, none of the above prior-art glasses or head-worn devices can achieve directional sound pickup, especially pickup aimed at a specific speaker, and they are therefore unsuited to applications that require it. For a person with hearing impairment, the main requirement is to hear the opposite speaker's voice clearly, without reproducing the user's own speech and with the impact of ambient noise minimized as far as possible. Likewise, real-time speech translation requires recording the opposite speaker's voice in real time and at high quality. Existing glasses and head-worn devices with a sound-pickup function still cannot do this, so a head-worn device, and a corresponding sound-pickup method, that meets the needs of the hearing impaired or of real-time translation would offer great convenience and practicality.
Disclosure of Invention
The invention aims to solve the problem that existing head-worn devices with a sound-pickup function cannot meet the needs of people with hearing impairment or of real-time speech translation, because they cannot achieve directional sound pickup or pickup aimed at a specific speaker.
To solve this technical problem, the invention provides a head-wearing device based on three-microphone directional sound recording. It comprises a front frame which, when the device is worn on the user's head, faces forward from the user's face. A first microphone module, a second microphone module, and a third microphone module are arranged in the front frame. When the device is worn, the second and third microphone modules lie on the same horizontal plane, and the first microphone module lies above or below the line connecting them, so that the three microphone modules form an isosceles triangle with the first microphone module at its apex, the apex pointing toward the user's mouth. The head-wearing device further comprises a signal synthesis module, which acquires the sound signals generated by the first, second, and third microphone modules and processes them so as to enhance sound from the direction of the mouth, obtaining a first sound signal.
According to a preferred embodiment of the present invention, the signal synthesis module is further configured to process the sound signals generated by the second microphone module and the third microphone module, so as to enhance the sound from a specific area in front of the face of the user, and obtain a second sound signal.
According to a preferred embodiment of the present invention, the signal synthesis module is further configured to process the first sound signal and the second sound signal so as to remove the first-sound (own-voice) component from the second sound signal, the result serving as the output signal.
According to a preferred embodiment of the present invention, the step of processing the sound signals generated by the second and third microphone modules to enhance sound from a specific area in front of the user's face, obtaining a second sound signal, includes: adding the sound signals generated by the second and third microphone modules and dividing by two to obtain a signal d(n), and subtracting them and dividing by two to obtain a signal x(n), where n is the time index; processing the signal x(n) with an adaptive filter to obtain a signal y(n); and subtracting y(n) from d(n) to obtain the enhanced output signal e(n), which serves as the second sound signal and is also fed to the adaptive filter's parameter-updating module to update the filter parameters.
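As a rough illustration of the sum-and-difference step above, the following sketch (function and variable names are my own, not from the patent) forms d(n) and x(n) from the two horizontal microphone signals:

```python
import numpy as np

def sum_difference_signals(mic2, mic3):
    """Form the sum beam d(n) and difference beam x(n) from the two
    horizontal microphones. d(n) carries the frontal speech plus noise;
    x(n) is a noise reference in which a source equidistant from both
    microphones largely cancels."""
    mic2 = np.asarray(mic2, dtype=float)
    mic3 = np.asarray(mic3, dtype=float)
    d = (mic2 + mic3) / 2.0
    x = (mic2 - mic3) / 2.0
    return d, x
```

For a source exactly on the symmetry plane of the two microphones, both channels receive the same waveform, so x(n) vanishes and d(n) reproduces the source.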
According to a preferred embodiment of the present invention, the head-wearing device is a pair of glasses comprising two temples located on either side of a front frame. The front frame holds two lenses and, between them, the first, second, and third microphone modules; the signal synthesis module is located in the front frame or in a temple.
According to a preferred embodiment of the present invention, the head wearing apparatus further includes a sound playing module for playing sound to a user according to the output signal.
According to a preferred embodiment of the present invention, the sound playing module is located at the portion of a temple near the user's ear and is electrically connected to the signal synthesis module to receive the output signal.
According to a preferred embodiment of the present invention, the head wearing apparatus further comprises a voice recognition module for performing voice recognition on the output signal to convert the sound signal into text information.
According to a preferred embodiment of the present invention, the head wearing apparatus further includes an information display module for displaying the text information.
According to a preferred embodiment of the present invention, the voice recognition module is further configured to convert the recognized text information into text information in a language specified by the user, and send the text information to the information display module for display.
According to a preferred embodiment of the present invention, the information display module is a retina projection device for projecting the text information onto the retina of the user.
According to a preferred embodiment of the present invention, the voice recognition module is further configured to translate the recognized text into text in a language specified by the user, convert the translated text into a sound signal, and send it to the sound playing module for playback.
According to a preferred embodiment of the invention, the first microphone module and the second microphone module form a first channel, and the first microphone module and the third microphone module form a second channel.

The signal synthesis module is configured to: acquire the sound signals generated by the first, second, and third microphone modules; calculate the delay-subtraction beamformed signals of the first and second channels, namely a first-channel forward-formed signal, a first-channel backward-formed signal, a second-channel forward-formed signal, and a second-channel backward-formed signal; divide each delay-subtraction beamformed signal into a preset number of spectral subband signals and calculate the signal energies E_fij and E_bij of each matching spectral subband of the forward-formed and backward-formed signals of the same channel, where f denotes forward, b denotes backward, i is the channel number, and j is the spectral subband number; calculate a relative energy statistic from the subband energies of the forward-formed and backward-formed signals; according to the relative energy statistic, assign a first gain G_1ij to each spectral subband signal of the first-channel and second-channel forward-formed signals; calculate a second gain G_2ij that attenuates the first-channel and second-channel forward-formed signals to the stationary-noise energy level; for each spectral subband signal of the two forward-formed signals, compare the first gain G_1ij and the second gain G_2ij and take the larger of the two as that channel's final gain G_ij for the subband; calculate the noise-reduced energy value E'_fij = E_fij * G_ij of each spectral subband of the two forward-formed signals, compare the energy values E'_f1j and E'_f2j of the same subband j of the two forward-formed signals, and take as the final gain G_j of subband j the gain of the channel with the smaller energy value; after smoothing the subband gain values, calculate the final complex spectrum of each subband and obtain the time-domain signal, i.e. the first sound signal, by inverse Fourier transform.
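The beamforming-and-gain pipeline above can be sketched in simplified form. The sketch below is a non-authoritative illustration: the function names, the integer-sample delay, the Hann window, and the 0.1 gain floor are all assumptions not found in the patent. It shows a first-order delay-and-subtract beam pair for one channel, per-frame subband energies, and a per-subband first gain:

```python
import numpy as np

def delay_subtract_beams(m_near, m_far, delay):
    """First-order delay-and-subtract beamformer for one microphone pair.
    `delay` is the inter-microphone propagation delay in whole samples
    (a simplifying assumption; real arrays need fractional delays).
    The forward beam nulls sound arriving from behind the pair; the
    backward beam nulls sound arriving from in front."""
    m_near = np.asarray(m_near, dtype=float)
    m_far = np.asarray(m_far, dtype=float)
    fwd = m_near[delay:] - m_far[:-delay]
    bwd = m_far[delay:] - m_near[:-delay]
    return fwd, bwd

def subband_energies(frame, n_subbands):
    """Split one windowed frame's power spectrum into subbands and
    return the energy of each subband."""
    spec = np.fft.rfft(frame * np.hanning(len(frame)))
    bands = np.array_split(np.abs(spec) ** 2, n_subbands)
    return np.array([b.sum() for b in bands])

def forward_gains(e_fwd, e_bwd, g_floor=0.1):
    """Per-subband first gain G_1ij: pass subbands where the forward beam
    dominates, attenuate the rest (the 0.1 floor is an assumed value)."""
    return np.where(np.asarray(e_fwd) > np.asarray(e_bwd), 1.0, g_floor)
```

A sound arriving from behind the pair reaches the far microphone first, so after the delay the two terms of the forward beam cancel, which is what gives the directivity of fig. 10.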
According to a preferred embodiment of the invention, the delay-subtraction beamformed signals are calculated using a first-order delay beamforming algorithm.
According to a preferred embodiment of the present invention, the relative energy statistic is calculated as follows: initialize its value to 0, then compare the energies of each matching pair of spectral subband signals of the forward-formed and backward-formed signals one by one, adding 1 when the forward-formed signal's subband energy is larger and subtracting 1 otherwise.
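The statistic described above can be computed directly; a minimal sketch (the function name is mine):

```python
def relative_energy_statistic(e_fwd, e_bwd):
    """Initialize the statistic to 0, then for each matching subband pair
    add 1 when the forward-formed signal's energy is larger, else
    subtract 1. A large positive value indicates frontal dominance."""
    stat = 0
    for ef, eb in zip(e_fwd, e_bwd):
        stat += 1 if ef > eb else -1
    return stat
```

The sign and magnitude of the result then drive the choice of the first gain G_1ij for each forward-formed subband.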
According to a preferred embodiment of the present invention, the signal synthesis module is further configured to perform stationary-noise reduction on the time-domain signal obtained by the inverse Fourier transform.
The invention can effectively separate and enhance the voices of the user and of the opposite speaker, and is well suited to hearing assistance and real-time speech translation.
Drawings
In order to make clearer the technical problems solved by the invention, the technical means adopted, and the technical effects achieved, specific embodiments of the invention are described in detail below with reference to the accompanying drawings. Note, however, that the drawings described below illustrate only exemplary embodiments of the invention; those skilled in the art can derive other embodiments from them without inventive effort.
Fig. 1 is a schematic structural view of a pair of wireless communication glasses according to the prior art.
Fig. 2 is a schematic structural diagram of a prior-art head-wearing device provided with three microphones.
Fig. 3 is a schematic structural diagram of a prior-art head-wearing device provided with four microphones.
Fig. 4 is a schematic diagram of an NLMS algorithm employed by an embodiment of the present invention to obtain a second sound signal.
Fig. 5 is a front view of a head-mounted device with sound recording function according to an embodiment of the present invention.
Fig. 6 is a schematic view showing a state in which a head wearing apparatus having a sound recording function according to an embodiment of the present invention is worn on a user's head.
Fig. 7 is a schematic side view of a head-wearing apparatus having a sound recording function according to an embodiment of the present invention.
Fig. 8 is a layout view of three-microphone radio of a head-mounted device with a sound recording function according to an embodiment of the present invention.
Fig. 9 is a schematic diagram of delay-subtracted beamforming.
Fig. 10 is a directivity pattern of a dual-microphone delay-subtraction beamformed signal.
Fig. 11 is a block diagram of a head wearing apparatus with a sound recording function according to the present invention.
Detailed Description
Exemplary embodiments of the present invention will now be described more fully with reference to the accompanying drawings. The invention may, however, be practiced in various specific ways and should not be construed as limited to the embodiments set forth herein; rather, these exemplary embodiments are provided so that this disclosure will be thorough and complete and will fully convey the concept of the invention to those skilled in the art.
The structures, capabilities, effects, or other features described in a particular embodiment may be incorporated in one or more other embodiments in any suitable manner without departing from the spirit of the present invention.
In describing particular embodiments, specific details of construction, performance, effects, or other features are set forth in order to provide a thorough understanding of the embodiments by those skilled in the art. It is not excluded, however, that one skilled in the art may implement the present invention in a particular situation in a solution that does not include the structures, properties, effects, or other characteristics described above.
The flow diagrams in the figures are merely exemplary flow illustrations and do not represent that all of the elements, operations, and steps in the flow diagrams must be included in the aspects of the invention, nor that the steps must be performed in the order shown in the figures. For example, some operations/steps in the flowcharts may be decomposed, some operations/steps may be combined or partially combined, etc., and the order of execution shown in the flowcharts may be changed according to actual situations without departing from the gist of the present invention.
The block diagrams in the figures generally represent functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The same reference numerals in the drawings denote the same or similar elements, components, or portions, so repeated descriptions of them may be omitted hereinafter. It will be further understood that although the terms first, second, third, etc. may be used herein to describe various devices, elements, components, or portions, these should not be limited by those terms, which serve merely to distinguish one from another. For example, a first device could also be termed a second device without departing from the spirit of the invention. Furthermore, the term "and/or" is meant to include all combinations of any one or more of the items listed.
The invention provides a head-wearing device for directional sound recording based on three microphones. The term head-worn device as used herein means any device that can be worn on a human head and is not limited to a particular type. To implement the sound recording method of the invention, however, the head-wearing device must have a frame in which the microphones are disposed; this frame faces forward from the user's face when the device is worn, and is therefore referred to herein as the front frame. Thus, when the device is worn, the microphones sit in front of the user's face. Typical examples of the head-wearing device of the invention are framed glasses, VR glasses, hoods, helmets, headbands, and hats or headphones that have a portion on the front side of the head.
According to the invention, three microphone modules are arranged in the front frame, where a microphone module means a microphone functional module with a sound-receiving element and a signal pre-processing element. The three are referred to herein as the first, second, and third microphone modules. When the head-wearing device is worn on the user's head, two of the microphone modules lie on the same horizontal plane; these are the second and third microphone modules, while the one not on that plane is the first microphone module. The first microphone module lies above or below the line connecting the second and third microphone modules, so the three microphone modules form an isosceles triangle with the first microphone module at its apex.
According to the invention, the vertex angle of the isosceles triangle is directed towards the mouth of the user. That is, one vertex of the triangle formed by the three microphone modules points to the user's own sounding site, which is specifically set in order to match the sound processing algorithm of the present invention. According to the sound processing algorithm provided by the invention, the sound recorded in the direction pointed by the vertex angles of the isosceles triangles can be enhanced, and meanwhile, the sound recorded in the direction pointed by the back vertex angles can be weakened.
The first, second, and third microphone modules sense sound to generate sound signals. In a special case, the distances between each pair of microphone modules are equal, i.e. the triangle is equilateral.
According to the invention, the head-wearing device further comprises a signal synthesis module, which acquires the sound signals recorded by the first, second, and third microphone modules and processes them so as to enhance sound from the direction of the user's mouth, obtaining a first sound signal. The processed first sound signal cancels or attenuates the noise in the original sound signals and may therefore also be called an enhanced sound signal. The invention thus records the voice of the wearer (i.e. the user) clearly while attenuating surrounding sounds and other speech. For glasses, helmets, and the like with a recording function, the invention therefore improves recording quality and user experience. The signal synthesis module is built from elements with signal-processing capability, such as a microcontroller, a DSP, or an FPGA.
More preferably, the signal synthesis module of the invention is further configured to process the sound signals generated by the second and third microphone modules so as to enhance sound from the specific area in front of the user's face, obtaining a second sound signal. This speech enhancement can generally be performed with the classical NLMS (normalized least-mean-square) adaptive filter algorithm.
Fig. 4 is a schematic diagram of the adaptive filtering algorithm employed by an embodiment of the present invention to obtain the second sound signal. As shown in fig. 4, the sound signals (time-domain sample sequences) from the second and third microphone modules are added and divided by two to obtain the signal d(n), and subtracted and divided by two to obtain the signal x(n). The signal x(n) is processed by an M-order adaptive filter to obtain the signal y(n). Subtracting y(n) from d(n) yields the enhanced output signal e(n) (the second sound signal of the signal synthesis module), and e(n) is fed back to the parameter-updating module of the adaptive filter to update the parameters of the M-order filter. In the figure, n is the time index, z^-1 denotes a one-sample delay unit, M is the filter order, W_1, W_2, ..., W_M are the parameters of the M-order filter, and the M delayed signals x(n), x(n-1), x(n-2), ..., x(n-M+1) are denoted x_1(n), x_2(n), ..., x_M(n), respectively. In this embodiment, the adaptive filter may be an NLMS (normalized least-mean-square) filter, and the algorithm adopted by its parameter-updating module is the NLMS algorithm.
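Under the assumption that the parameter update is the standard NLMS rule (the patent names NLMS but gives no formula), the fig. 4 structure can be sketched as follows; `order`, `mu`, and `eps` are assumed tuning values, not taken from the patent:

```python
import numpy as np

def nlms_second_signal(mic2, mic3, order=64, mu=0.5, eps=1e-8):
    """Minimal sketch of the fig. 4 structure: d(n) is the sum beam,
    x(n) the difference beam; an M-order NLMS filter predicts the noise
    component y(n) of d(n) from x(n), and e(n) = d(n) - y(n) is the
    enhanced second sound signal."""
    mic2 = np.asarray(mic2, dtype=float)
    mic3 = np.asarray(mic3, dtype=float)
    d = (mic2 + mic3) / 2.0
    x = (mic2 - mic3) / 2.0
    w = np.zeros(order)                          # W_1 ... W_M
    e = np.zeros(len(d))
    for n in range(order - 1, len(d)):
        x_vec = x[n - order + 1 : n + 1][::-1]   # x(n), x(n-1), ..., x(n-M+1)
        y = w @ x_vec                            # filter output y(n)
        e[n] = d[n] - y                          # enhanced sample e(n)
        w += mu * e[n] * x_vec / (x_vec @ x_vec + eps)   # NLMS update
    return e
```

The normalization by the instantaneous input power (plus the small eps for numerical safety) is what distinguishes NLMS from plain LMS and keeps the step size stable as the input level varies.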
As one implementation, the signal synthesis module is further configured to process the first and second sound signals so as to remove the first-sound component from the second sound signal, yielding the output signal. Because the first sound signal contains essentially only the user's own voice, another implementation applies a voice-detection algorithm based on signal energy to the first sound signal and uses the detection result to gate the second sound signal: while the user is speaking, the second sound signal outputs silence; while the user is not speaking, it outputs the opposite speaker's voice.
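The energy-based gating alternative can be sketched as follows (the frame length and threshold are assumed values, not from the patent):

```python
import numpy as np

def energy_vad(own_voice, frame_len=256, threshold=1e-3):
    """Frame-wise energy detector on the first (own-voice) signal:
    a frame counts as speech when its mean power exceeds the threshold."""
    own_voice = np.asarray(own_voice, dtype=float)
    n_frames = len(own_voice) // frame_len
    flags = np.zeros(n_frames, dtype=bool)
    for k in range(n_frames):
        frame = own_voice[k * frame_len : (k + 1) * frame_len]
        flags[k] = np.mean(np.square(frame)) > threshold
    return flags

def gate_second_signal(second, vad_flags, frame_len=256):
    """Mute the second sound signal in frames where the user is speaking."""
    out = np.array(second, dtype=float)
    for k, speaking in enumerate(vad_flags):
        if speaking:
            out[k * frame_len : (k + 1) * frame_len] = 0.0
    return out
```

In practice the threshold would need calibration to the microphone gain, and some hangover smoothing would avoid chopping word endings; both refinements are omitted here for clarity.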
Fig. 5 is a front view of a head-wearing device with sound recording capability according to an embodiment of the present invention. As shown in fig. 5, the head-wearing device of this embodiment is a pair of glasses comprising two temples, a first temple 1 and a second temple 2, located on either side of a front frame 3. The front frame 3 holds two lenses and, between them, the first microphone module mic1, the second microphone module mic2, and the third microphone module mic3. The signal synthesis module 5 in this embodiment is located in the second temple 2 (not shown in fig. 5).
Fig. 6 shows the head-wearing device of this embodiment worn on a user's head. As shown in fig. 6, when the glasses are worn, the three microphones sit in the middle of the front frame 3, directly above the bridge of the nose. The second microphone module mic2 and the third microphone module mic3 lie on the same horizontal plane, i.e. the line connecting them is horizontal. The first microphone module mic1 lies below that line, and the three microphone modules form an isosceles triangle with the first microphone module at its apex, so that the apex points exactly at the user's mouth.
In the glasses of this embodiment, a sound playback function is added, i.e. sound is played back to the user. The head-wearing device of this embodiment therefore further includes a sound playing module. The source to be played may in principle be any sound source, but the invention preferably uses as its source the output signal obtained by processing the first and second sound signals as described above, as this is particularly useful for people with hearing impairment. As described above, the microphone modules and the signal synthesis module of the invention can separate and enhance the voices of the user and of the opposite speaker, and the further processing reduces or eliminates the user's own voice. When the playback element has a hearing-aid function, the opposite speaker's sound signal can thus be enhanced in real time and delivered into the user's ear, helping a hearing-impaired user to hear the opposite speaker clearly.
Fig. 7 is a schematic side view of the glasses serving as the head-wearing device in an embodiment of the present invention. As shown in fig. 7, the signal synthesis module 5 is disposed in the second temple 2, and a sound playing module 6 is fixed to each temple at the portion near the user's ear. The sound playing module 6 may be, for example, a bone-conduction hearing-aid element, and is electrically connected to the signal synthesis module 5 to receive the output signal. It should be appreciated that the invention is not limited to a particular type of sound playing device; any playback element suitable for playing the processed output signal can implement the solution of the invention and achieve the corresponding advantageous effects.
As further shown in fig. 7, the head-wearing device of the invention preferably further includes a voice recognition module 7 for performing speech recognition on the output signal, converting the sound signal into text information. Particularly for framed glasses, VR glasses, helmets, and other devices suited to placing a display element in front of the user's eyes, a preferred scheme of the invention is to run speech recognition on the output signal; the recognized text can then be used in many application scenarios.
One application scenario is again for people with hearing impairment. For users with severe or total hearing loss, converting the opposite speaker's voice into text and displaying it is a necessary scheme, and the most intuitive way to let such users understand the opposite speaker's words in real time. In this case, the glasses of the above embodiments further include an information display module for displaying the text information. In the embodiment shown in fig. 7, the information display module includes a light source 8, a projection element 9, and a special lens that receives the projection of the projection element 9. The information display module is thus a retina projection device composed of the light source 8, the projection element 9 and the lens, which can project the text information onto the user's retina. After the user puts on the glasses, the opposite speaker's words converted by the voice recognition module 7 are displayed in real time in front of the user's eyes. In this embodiment, the projection element 9 receives the converted text information from the voice recognition module 7.
It will be appreciated that the invention is not limited to the type of information display module provided, and that existing components suitable for displaying images in front of the eyes of a user may be employed.
Another application scenario is real-time translation. Here, the voice recognition module is further configured to convert the recognized text information into text in a language specified by the user, and send it to the information display module for display. If the user and the opposite speaker do not share a language, the present invention can convert the recognized words of the opposite speaker into words in the user's specified language. For example, after the user puts on the glasses of the present invention, a foreign language or dialect uttered by the opposite speaker can be displayed in real time in front of the user as text in a familiar language. Besides displaying the translated text in front of the user's eyes through retina projection, the translated text can also be converted into speech and sent to the sound playing module to be played for the user. The text-to-speech function may be implemented in the voice recognition module or by a separate module.
According to the invention, the first microphone module and the second microphone module are called a first microphone module group, and the first microphone module and the third microphone module are called a second microphone module group. The included angle between the two groups of microphone modules can be varied, but 60 degrees is more preferred, that is, the distances between the first, second and third microphone modules are equal, which yields a better directivity selection effect and a better noise reduction effect. The steps of acquiring the first sound signal are described in detail below.
The invention first acquires the sound signals generated by the first, second and third microphone modules, and then processes them by a beamforming method. Specifically, the first microphone module and the second microphone module form a first channel, and the first microphone module and the third microphone module form a second channel. The delay-subtraction beamformed signals of the first channel and the second channel are calculated respectively, namely a first channel forward formed signal, a first channel backward formed signal, a second channel forward formed signal and a second channel backward formed signal.
Next, the present invention divides each delay-subtraction beamformed signal into a predetermined number of spectral subband signals, and calculates the signal energies E_fij and E_bij of each matching spectral subband of the forward formed signal and the backward formed signal of the same channel. Here f denotes forward, b denotes backward, i is the channel number (1 or 2), and j is the spectral subband number, taking values 1 to n, where n is the predetermined number of spectral subbands, a natural number of at least 2. In the present invention, n is preferably 64 to 256, more preferably 128.
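As an illustration, the subband energy computation can be sketched in Python as follows. This is a non-authoritative sketch: the frame length, FFT-based analysis and equal-width subband split are assumptions, as the text does not specify them.

```python
import numpy as np

def subband_energies(frame, n_subbands=128):
    # Energy of each spectral subband of one analysis frame.
    # Equal-width subbands over the one-sided FFT spectrum are assumed.
    power = np.abs(np.fft.rfft(frame)) ** 2
    bins_per_band = len(power) // n_subbands
    usable = power[: bins_per_band * n_subbands]
    return usable.reshape(n_subbands, bins_per_band).sum(axis=1)

# E_fij / E_bij: run this on the forward- and backward-formed signals
# of channel i to get one energy per subband j.
frame = np.random.default_rng(0).standard_normal(1024)
E_f = subband_energies(frame)  # 128 subband energies
```

Running the same routine on the forward and backward formed signals of both channels yields the four energy arrays the following steps operate on.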
The invention innovatively proposes comparing and accumulating statistics over the spectral subband energies of the forward formed signal and the backward formed signal to obtain a relative energy statistic, and assigning a first gain G_1ij to each spectral subband signal of the first channel forward formed signal and the second channel forward formed signal according to this statistic, where i is the channel number and j is the spectral subband number. When assigning the first gain G_1ij, a larger statistic value yields a larger gain coefficient and a smaller statistic value yields a smaller gain coefficient, the gain coefficient lying between 0 and 1.
The relative energy statistic in the present invention indicates whether the forward energy or the backward energy is higher over the current period of time, in other words the relative magnitude of the forward and backward sound energy. According to the invention, when the sound comes from the front, the forward energy is larger than the backward energy, the statistic grows, the corresponding gain value is large, and the sound is preserved; when the sound comes from behind, the backward energy is larger than the forward energy, the statistic shrinks, the corresponding gain value becomes small, and the sound is suppressed.
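Claim 15 below specifies one concrete form of this statistic: a counter incremented for every subband where the forward energy dominates and decremented otherwise. A minimal sketch follows; the mapping from the counter onto a gain in [0, 1] is an assumed linear mapping, not given in the text.

```python
import numpy as np

def relative_energy_statistic(E_f, E_b):
    # +1 for every subband where the forward energy is larger, else -1.
    return int(np.sum(np.where(E_f > E_b, 1, -1)))

def first_gain(stat, n_subbands):
    # Assumed linear mapping of the counter onto [0, 1]:
    # all-forward -> 1.0, all-backward -> 0.0.
    return (stat + n_subbands) / (2 * n_subbands)

E_f = np.array([4.0, 3.0, 5.0, 1.0])  # forward subband energies
E_b = np.array([1.0, 1.0, 1.0, 2.0])  # backward subband energies
stat = relative_energy_statistic(E_f, E_b)  # 3 wins, 1 loss -> 2
gain = first_gain(stat, len(E_f))           # (2 + 4) / 8 = 0.75
```

A frontal sound thus drives the gain toward 1 and a rearward sound toward 0, matching the preserve/suppress behavior described above.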
For each spectral subband signal of the first channel forward formed signal and the second channel forward formed signal, the present invention compares the first gain G_1ij with a second gain G_2ij and takes the maximum of the two as the channel final gain G_ij of that spectral subband signal; the second gain is the gain at which the spectral subband signal decays to the steady-state noise energy level. That is, the present invention also first calculates the second gain G_2ij at which the first channel forward formed signal and the second channel forward formed signal are attenuated to the steady-state noise energy level, which can be calculated per spectral subband by spectral subtraction with stationary noise reduction.
Then, the present invention calculates the denoised energy value E'_fij = E_fij * G_ij of each spectral subband signal of the first channel forward formed signal and the second channel forward formed signal, and compares the energy values E'_f1j and E'_f2j of the spectral subband signals of the same subband of the two forward formed signals. The final gain G_j of spectral subband j takes the gain of the channel with the smaller energy value; the lower-energy subband spectrum of the two channels' forward signals, multiplied by its gain, is merged so that a single channel is finally output.
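The gain combination and channel selection steps might be sketched as follows; this is illustrative only, and the per-subband array representation is an assumption.

```python
import numpy as np

def channel_final_gain(G1, G2):
    # Per subband, keep the larger of the directional gain G_1ij and
    # the stationary-noise-floor gain G_2ij.
    return np.maximum(G1, G2)

def merge_channels(E_f1, E_f2, G_1, G_2):
    # Pick, per subband j, the gain of the channel whose denoised
    # energy E'_fij = E_fij * G_ij is smaller.
    return np.where(E_f1 * G_1 <= E_f2 * G_2, G_1, G_2)

E_f1 = np.array([4.0, 1.0])
E_f2 = np.array([1.0, 4.0])
G_1 = channel_final_gain(np.array([0.5, 0.9]), np.array([0.2, 0.1]))  # [0.5, 0.9]
G_2 = channel_final_gain(np.array([0.8, 0.2]), np.array([0.3, 0.1]))  # [0.8, 0.2]
G_final = merge_channels(E_f1, E_f2, G_1, G_2)  # channel 2 wins both subbands
```

Choosing the channel with the smaller denoised energy biases the output toward whichever cardioid suppressed the off-axis sound more strongly.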
Finally, after the subband gain values are smoothed, the final complex spectrum of each spectral subband is calculated, and the time-domain signal is then obtained by inverse Fourier transform. Specifically, the gain G_j of each spectral subband is smoothed across frames. Let the smoothed gain of the current frame be G_j,n and the smoothed gain of the previous frame be G_j,n-1:

If G_j,n >= G_j,n-1: G_j,n = a*G_j,n-1 + (1-a)*G_j

If G_j,n < G_j,n-1: G_j,n = b*G_j,n-1 + (1-b)*G_j
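A sketch of this two-coefficient smoothing follows. The condition is read here as comparing the new target gain G_j with the previous smoothed value, and the coefficients a and b are assumed values, since the text leaves them unspecified.

```python
def smooth_gain(g_prev, g_target, a=0.9, b=0.5):
    # Rising gains are smoothed with coefficient a, falling gains with b,
    # giving separate attack and release behavior per subband.
    if g_target >= g_prev:
        return a * g_prev + (1 - a) * g_target
    return b * g_prev + (1 - b) * g_target

rising = smooth_gain(0.5, 1.0)   # 0.9*0.5 + 0.1*1.0 = 0.55
falling = smooth_gain(0.5, 0.0)  # 0.5*0.5 + 0.5*0.0 = 0.25
```

With a > b the gain ramps up slowly and releases faster, which limits musical-noise artifacts when subband gains fluctuate frame to frame.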
Specific embodiments of the present invention are described below with reference to the accompanying drawings.
Fig. 8 is a layout diagram of sound pickup by three microphone modules according to an embodiment of the present invention. As shown in fig. 8, the first microphone module mic1, the second microphone module mic2 and the third microphone module mic3 are arranged equidistantly; in this example the spacing is 20 mm. The three microphone modules may be integrated into one physical device or implemented in separate physical devices.
The three microphone modules form a microphone module array, and the pairs mic1/mic2 and mic1/mic3 each form a first-order delay-subtraction beamformer. The sound pickup channel formed by the first microphone module mic1 and the second microphone module mic2 is called the first channel, and the channel formed by the first microphone module mic1 and the third microphone module mic3 is called the second channel. Using the delay τ = d/c, the pickup patterns of the first channel and the second channel are both cardioid. The combined pickup direction of the three microphone modules is the overlapping portion of the two cardioids. The beamforming algorithm applies different gains to sounds from different directions; because it exploits the spatial information of the sound, beamforming suppresses spatially distributed noise well.
First-order delay subtraction is a simple, classical beamforming technique: speech is collected with two microphones, one input signal is delayed, and the delayed signal is subtracted from the other.
Fig. 9 is a schematic diagram of delay-subtraction beamforming. As shown in fig. 9, let the original sound signal be y(t) and the input sound signals of the two microphone modules be s_1(t) and s_2(t). The enhanced output sound signal x(t) is:

x(t) = s_1(t) - s_2(t - τ), τ = d/c,

where τ is the delay value and d is the microphone module spacing.
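A time-domain sketch of this beamformer follows; the delay τ is rounded to an integer number of samples for simplicity, whereas a real implementation would use fractional-delay filtering. The 48 kHz sampling rate is an assumption.

```python
import numpy as np

def delay_subtract(s1, s2, delay_samples):
    # x(t) = s1(t) - s2(t - tau), with tau quantized to whole samples.
    s2_delayed = np.concatenate([np.zeros(delay_samples),
                                 s2[:len(s2) - delay_samples]])
    return s1 - s2_delayed

# With d = 20 mm and c = 343 m/s, tau = d/c is about 58 microseconds,
# i.e. roughly 3 samples at a 48 kHz sampling rate.
fs, d, c = 48_000, 0.02, 343.0
D = round(d / c * fs)  # 3 samples

# A plane wave arriving from the null direction reaches the second
# microphone first; after the internal delay the two copies cancel.
y = np.random.default_rng(1).standard_normal(200)
s1 = np.concatenate([np.zeros(D), y[:200 - D]])  # arrives D samples late
out = delay_subtract(s1, y, D)                   # ~zero output
```

The exact cancellation for the rear arrival is the null of the cardioid derived below.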
The input signals of the two microphone modules are:

s_1(t) = y(t)

s_2(t) = y(t - (d/c)·cosθ), where θ is the sound incidence angle.
Substituting gives:

x(t) = y(t) - y(t - (d/c)·cosθ - τ)
for the convenience of analysis, letThen it can be written as:
the power spectrum output by the microphone module array is:
when the distance between the microphone modules is smaller, taylor expansion is carried out, and the microphone modules are usedThe method can obtain the following steps:
as can be seen from the above equation, the directivity pattern of the microphone module array is related to the delay value τ.
When τ=d/c, the formula becomes:
when θ=0°, the gain is maximum; when θ=180°, the gain is minimum. The pattern diagram is called a Cardioid (Cardioid) diagram at this time, as shown in fig. 10.
In this embodiment, the pairs mic1/mic2 and mic1/mic3 each form a first-order delay-subtraction beamformer. With the delay τ = d/c, the pickup patterns of the mic1/mic2 pair and of the mic1/mic3 pair are both cardioid, and the combined pickup direction of the three microphone modules is the overlapping portion of the two cardioids.
The invention uses a first-order delay beamforming algorithm to calculate the forward and backward delay-subtraction beamformed signals of the first channel and the second channel respectively, and then computes their subband energies.
This embodiment divides each acquired sound signal into 128 spectral subband signals; other embodiments may use a different predetermined number. Then the signal energies E_fij and E_bij of each matching spectral subband of the forward formed signal and the backward formed signal of the same channel are calculated, where E denotes energy, f forward and b backward; i is the channel number (1 or 2, for the first and second channels), and j is the spectral subband number from 1 to 128.
A relative energy statistic is calculated from the spectral subband energies of the forward formed signal and the backward formed signal, and a first gain G_1ij is assigned to each spectral subband signal of the first channel forward formed signal and the second channel forward formed signal according to this statistic, where i is the channel number and j is the spectral subband number. As described above, the relative energy statistic indicates whether the forward or the backward energy is higher over the current period of time, i.e., the relative magnitude of the forward and backward sound energy. When the sound comes from the front, the forward energy exceeds the backward energy, the statistic grows, the corresponding gain is large, and the sound is preserved; when the sound comes from behind, the backward energy is larger, the statistic shrinks, the gain becomes small, and the sound is suppressed.
For each spectral subband signal of the first channel forward formed signal and the second channel forward formed signal, the first gain G_1ij is compared with the second gain G_2ij and the maximum of the two is taken as the channel final gain G_ij of that subband; the second gain is the gain at which the spectral subband signal decays to the steady-state noise energy level. That is, the second gain G_2ij at which the first channel forward formed signal and the second channel forward formed signal are attenuated to the steady-state noise energy level must be calculated first.
Then the denoised energy value E'_fij = E_fij * G_ij of each spectral subband signal of the first channel forward formed signal and the second channel forward formed signal is calculated, and the energy values E'_f1j and E'_f2j of the spectral subband signals of the same subband of the two forward formed signals are compared; the final gain G_j of spectral subband j takes the gain of the channel with the smaller energy value.
finally, after the gain value of the spectrum sub-band is smoothed, the final complex spectrum of the spectrum sub-band is calculated, and then the time domain signal is obtained through inverse Fourier transform.
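The reconstruction step can be sketched as follows; equal-width subbands and a single un-windowed frame are assumed for illustration (a real pipeline would use overlapping windowed frames).

```python
import numpy as np

def apply_subband_gains(frame, gains):
    # Scale the complex spectrum per subband, then invert to time domain.
    spectrum = np.fft.rfft(frame)
    bins_per_band = len(spectrum) // len(gains)
    for j, g in enumerate(gains):
        spectrum[j * bins_per_band:(j + 1) * bins_per_band] *= g
    return np.fft.irfft(spectrum, n=len(frame))

frame = np.random.default_rng(2).standard_normal(1024)
identity = apply_subband_gains(frame, np.ones(128))  # all-pass: unchanged
```

With all gains equal to 1 the frame is reconstructed exactly, confirming that the forward and inverse transforms are consistent.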
Those skilled in the art will appreciate that all or part of the steps implementing the above-described embodiments may be implemented as a computer program executed by a data processing apparatus (including a computer). When the computer program is executed, the above-described method provided by the present invention can be implemented. The computer program may be stored in a computer readable storage medium such as a magnetic disk, an optical disk, a ROM, a RAM, or a storage array composed of a plurality of storage media, for example a magnetic disk or tape array. The storage medium is not limited to centralized storage and may also be distributed, such as cloud storage based on cloud computing.
The following describes apparatus embodiments of the invention that may be used to perform method embodiments of the invention. Details described in the embodiments of the device according to the invention should be regarded as additions to the embodiments of the method described above; for details not disclosed in the embodiments of the device according to the invention, reference may be made to the above-described method embodiments.
FIG. 11 is a block diagram of a sound pickup device according to an embodiment of the present invention. As shown in fig. 11, the signal synthesis module acquires the sound signals generated by the first, second and third microphone modules and processes them to generate the final processed sound signal as the output signal, which preserves the voice of the opposite speaker while eliminating or attenuating the user's own voice and environmental noise. The voice recognition module is connected to the signal synthesis module to acquire the output sound signal and performs voice recognition; the recognized text information can be sent directly to the retina projection module for display, or converted from text to speech and sent to the sound playing module for playing. Optionally, the voice recognition module is further configured to translate the recognized text into text in a language specified by the user, and to send it to the retina projection module for display, or convert it to speech and send it to the sound playing module for playing.
It will be appreciated by those skilled in the art that the modules of the above embodiments may be distributed in a device as described, or may be distributed, with corresponding changes, in one or more devices different from those embodiments. The modules of the above embodiments may be combined into one module or further split into a plurality of sub-modules. For example, the text translation and text-to-speech functions of the voice recognition module may be implemented together in that module, implemented by one of its sub-modules, or implemented by another module independent of the voice recognition module.
The above-described specific embodiments further describe the objects, technical solutions and advantageous effects of the present invention in detail. It should be understood that the present invention is not inherently related to any particular computer, virtual device or electronic apparatus, and various general-purpose devices may also implement it. The foregoing description of the embodiments is not intended to be limiting, but is intended to cover all modifications, equivalents, alternatives and improvements that fall within the spirit and scope of the invention.

Claims (16)

1. A three-microphone directional sound recording based head wearable device comprising a front frame that faces forward of a user's face when the head wearable device is worn on the user's head, characterized in that:
The front frame is internally provided with a first microphone module, a second microphone module and a third microphone module, when the head wearing device is worn on the head of a user, the second microphone module and the third microphone module are positioned on the same horizontal plane, the first microphone module is positioned above or below a connecting line of the second microphone module and the third microphone module, the three microphone modules form an isosceles triangle taking the first microphone module as a vertex angle, and the vertex angle of the isosceles triangle points to the mouth of the user;
the head wearing device further comprises a signal synthesis module, wherein the signal synthesis module is used for obtaining sound signals generated by the first microphone module, the second microphone module and the third microphone module, and processing the sound signals generated by the first microphone module, the second microphone module and the third microphone module so as to enhance sound from the direction of the mouth to obtain a first sound signal.
2. The head wear device of claim 1, wherein,
the signal synthesis module is further configured to process the sound signals generated by the second microphone module and the third microphone module, so as to enhance the sound from the specific area in front of the face of the user, and obtain a second sound signal.
3. The head wear device of claim 2, wherein the signal synthesis module is further configured to process the first sound signal and the second sound signal to remove a first sound portion of the second sound signal as an output signal.
4. The head wear device of claim 2, wherein,
the step of processing the sound signals generated by the second microphone module and the third microphone module to enhance the sound from the front specific area of the face of the user, and obtaining a second sound signal includes:
adding the sound signals generated by the second microphone module and the third microphone module and dividing by two to obtain a signal d (n), and subtracting them and dividing by two to obtain a signal x (n), wherein n represents a time index;
processing the signal x (n) by an adaptive filter to obtain a signal y (n);
subtracting the signal y (n) from the signal d (n) to obtain an enhanced output signal e (n) as the second sound signal, the signal e (n) also being fed to the parameter updating module of the adaptive filter to update the adaptive filter parameters.
5. The head wear device of claim 3, wherein the head wear device is an eyeglass comprising two earpieces on either side of a front frame, wherein the front frame is configured to hold two lenses, and a first microphone module, a second microphone module, and a third microphone module positioned between the two lenses, wherein the signal combining module is positioned in either the front frame or the earpieces.
6. The head wear device of claim 5, further comprising a sound playing module for playing sound to a user based on the output signal.
7. The head wear device of claim 6, wherein,
the sound playing module is positioned at the part of the glasses leg, which is close to the ear of the user, and is electrically connected to the signal synthesizing module so as to receive the output signal.
8. The head wear device of claim 5, further comprising a voice recognition module for voice recognition of the output signal to convert the sound signal to text information.
9. The head wear device of claim 8, further comprising an information display module for displaying the text information.
10. The head wear device of claim 9, wherein the voice recognition module is further configured to convert the recognized text information into text information in a user-specified language and send the text information to the information display module for display.
11. The head wear device of claim 10, wherein the information display module is a retina projection device for projecting the text information onto a retina of a user.
12. The head wear device of claim 8, wherein the voice recognition module is further configured to convert the recognized text information into text information in a language specified by the user, and then convert the converted text information into a sound signal, and send the sound signal to the sound playing module for playing.
13. The head wear device of any one of claims 1 to 12, wherein the first microphone module and the second microphone module form a first channel, and the first microphone module and the third microphone module form a second channel;
the signal synthesis module is used for:
acquiring sound signals generated by the first microphone module, the second microphone module and the third microphone module;
respectively calculating delay subtraction beam forming signals of the first channel and the second channel, wherein the delay subtraction beam forming signals comprise a first channel forward forming signal, a first channel backward forming signal, a second channel forward forming signal and a second channel backward forming signal;
dividing each delay-subtraction beamformed signal into a predetermined number of spectral subband signals, and calculating the signal energies E_fij and E_bij of each matching spectral subband of the forward formed signal and backward formed signal of the same channel, where f denotes forward, b denotes backward, i is the channel number, and j is the spectral subband number;
calculating a relative energy statistical parameter according to the energy of the spectrum subband signals of the forward forming signal and the backward forming signal;
assigning a first gain G_1ij to each spectral subband signal of the first channel forward formed signal and the second channel forward formed signal according to the relative energy statistic, where i is the channel number and j is the spectral subband number;

calculating a second gain G_2ij at which the first channel forward formed signal and the second channel forward formed signal are attenuated to a steady-state noise energy level, where i is the channel number and j is the spectral subband number;

comparing, for each spectral subband signal of the first channel forward formed signal and the second channel forward formed signal, the first gain G_1ij and the second gain G_2ij, and taking the maximum of the two as the channel final gain G_ij of that spectral subband signal;

calculating the denoised energy value E'_fij = E_fij * G_ij of each spectral subband signal of the first channel forward formed signal and the second channel forward formed signal, and comparing the energy values E'_f1j and E'_f2j of the spectral subband signals of the same subband of the two forward formed signals, the final gain G_j of spectral subband j taking the gain of the channel with the smaller energy value;
after the gain value of the spectrum sub-band is smoothed, the final complex spectrum of the spectrum sub-band is calculated, and then the time domain signal, namely the first sound signal, is obtained through inverse Fourier transform.
14. The head wear device of claim 13, wherein: the delay subtracted beamformed signal is calculated using a first order delay beamforming algorithm.
15. The head wear device of claim 13, wherein: the method for calculating the relative energy statistical parameter value comprises the following steps: initializing a relative energy statistical parameter value to be 0, comparing the energy of each same frequency spectrum sub-band signal of the forward forming signal and the backward forming signal one by one, and adding 1 when the energy of the frequency spectrum sub-band signal of the forward forming signal is larger, otherwise subtracting 1.
16. The head wear device of claim 13, wherein:
the signal synthesis module is also used for carrying out steady-state noise reduction processing on the time domain signal obtained by the Fourier transform.
CN202310537930.XA 2023-05-15 2023-05-15 Head wearing device based on three-microphone directional sound recording Pending CN116594197A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310537930.XA CN116594197A (en) 2023-05-15 2023-05-15 Head wearing device based on three-microphone directional sound recording

Publications (1)

Publication Number Publication Date
CN116594197A 2023-08-15

Family

ID=87607451

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310537930.XA Pending CN116594197A (en) 2023-05-15 2023-05-15 Head wearing device based on three-microphone directional sound recording

Country Status (1)

Country Link
CN (1) CN116594197A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination