CN115240697A - Acoustic device - Google Patents

Acoustic device

Info

Publication number
CN115240697A
CN115240697A (application CN202110486203.6A)
Authority
CN
China
Prior art keywords
noise
signal
target
spatial location
microphone array
Prior art date
Legal status
Pending
Application number
CN202110486203.6A
Other languages
Chinese (zh)
Inventor
肖乐
郑金波
张承乾
廖风云
齐心
Current Assignee
Shenzhen Voxtech Co Ltd
Original Assignee
Shenzhen Voxtech Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Voxtech Co Ltd
Priority to TW111115388A (publication TW202242855A)
Publication of CN115240697A
Legal status: Pending

Classifications

    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0208 Noise filtering
    • H04R1/10 Earpieces; Attachments therefor; Earphones; Monophonic headphones
    • H04R1/1083 Reduction of ambient noise
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H04R9/02 Details of transducers of moving-coil, moving-strip, or moving-wire type
    • H04R9/06 Loudspeakers of moving-coil, moving-strip, or moving-wire type
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming
    • H04R2460/13 Hearing devices using bone conduction transducers

Abstract

An acoustic device is disclosed. The acoustic device may include a microphone array, a processor, and at least one speaker. The microphone array may be configured to pick up ambient noise. The processor may be configured to estimate a sound field at a target spatial location using the microphone array. The target spatial location may be closer to the user's ear canal than any microphone of the microphone array. The processor may be further configured to generate a noise reduction signal based on the picked-up ambient noise and the sound field estimate at the target spatial location. The at least one speaker may be configured to output a target signal in accordance with the noise reduction signal. The target signal may be used to reduce the ambient noise. The microphone array may be positioned at a target area to minimize interference from the at least one speaker with the microphone array.

Description

Acoustic device
Cross-reference
This application claims priority to International Application No. PCT/CN2021/089670, filed on April 25, 2021, which is hereby incorporated by reference in its entirety.
Technical Field
The present application relates to the field of acoustics, and in particular, to an acoustic device.
Background
An acoustic device allows a user to listen to audio content and make voice calls while keeping the interaction private and without disturbing nearby people. Acoustic devices can generally be classified into two types: in-ear acoustic devices and open acoustic devices. An in-ear acoustic device blocks the user's ear during use, and wearing it for a long time readily causes sensations of blockage, foreign bodies, and swelling pain. An open acoustic device leaves the user's ear open and is well suited to long-term wear, but when external noise is loud its noise reduction effect is limited, degrading the user's listening experience.
It is therefore desirable to provide an acoustic device that leaves both of the user's ears open while improving the user's listening experience.
Disclosure of Invention
One of the embodiments of the present application provides an acoustic device. The acoustic device may include a microphone array, a processor, and at least one speaker. The microphone array may be configured to pick up ambient noise. The processor may be configured to estimate a sound field at a target spatial location using the microphone array. The target spatial location may be closer to the user's ear canal than any microphone of the microphone array. The processor may be further configured to generate a noise reduction signal based on the picked-up ambient noise and the sound field estimate at the target spatial location. The at least one speaker may be configured to output a target signal in accordance with the noise reduction signal. The target signal may be used to reduce the ambient noise. The microphone array may be positioned at a target area to minimize interference from the at least one speaker with the microphone array.
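By way of illustration only, the following sketch shows one minimal reading of this loop, in which the noise at the target spatial location is approximated as a weighted combination of the array channels and the target signal is its anti-phase counterpart. The uniform weights and the function name are assumptions for illustration, not the claimed implementation.

```python
import numpy as np

def anc_frame(ambient_frames: np.ndarray) -> np.ndarray:
    """ambient_frames: (n_mics, frame_len) samples picked up by the microphone
    array for one processing frame. Returns the target signal for the speaker."""
    n_mics = ambient_frames.shape[0]
    weights = np.full(n_mics, 1.0 / n_mics)        # assumed uniform weights
    noise_at_target = weights @ ambient_frames     # stand-in for the sound field estimate
    return -noise_at_target                        # equal amplitude, opposite phase
```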
In some embodiments, generating a noise reduction signal based on the picked-up ambient noise and the sound field estimate at the target spatial location may include estimating noise at the target spatial location based on the picked-up ambient noise, and generating the noise reduction signal based on the noise at the target spatial location and the sound field estimate at the target spatial location.
In some embodiments, the acoustic device may further include one or more sensors for acquiring motion information of the acoustic device. The processor may be further configured to update the noise at the target spatial location and the sound field estimate at the target spatial location based on the motion information, and to generate the noise reduction signal based on the updated noise and the updated sound field estimate at the target spatial location.
In some embodiments, estimating the noise at the target spatial location based on the picked-up ambient noise may include determining one or more spatial noise sources related to the picked-up ambient noise, and estimating the noise at the target spatial location based on the spatial noise sources.
In some embodiments, estimating the sound field at the target spatial location using the microphone array may include constructing a virtual microphone based on the microphone array, and estimating the sound field at the target spatial location based on the virtual microphone. The virtual microphone may include a mathematical model or a machine learning model that represents the audio data a microphone would collect if one were placed at the target spatial location.
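For illustration only, one way such a virtual microphone could be realized as a mathematical model is a least-squares fit calibrated against a reference microphone temporarily placed at the target spatial location. The calibration setup and function names below are assumptions, not the algorithm specified by this application.

```python
import numpy as np

def fit_virtual_mic(array_frames: np.ndarray, ref_frames: np.ndarray) -> np.ndarray:
    """array_frames: (n_frames, n_mics) complex STFT values of one frequency bin
    across calibration frames; ref_frames: (n_frames,) the same bin measured by a
    reference microphone temporarily placed at the target spatial location."""
    # Least-squares weights mapping the array channels to the reference channel.
    w, *_ = np.linalg.lstsq(array_frames, ref_frames, rcond=None)
    return w

def virtual_mic(array_bin: np.ndarray, w: np.ndarray) -> complex:
    # Once calibrated, the array alone predicts the sound at the target location.
    return array_bin @ w
```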
In some embodiments, generating a noise reduction signal based on the picked-up ambient noise and the sound field estimate at the target spatial location may include estimating noise at the target spatial location based on the virtual microphone, and generating the noise reduction signal based on the noise at the target spatial location and the sound field estimate at the target spatial location.
In some embodiments, the at least one speaker may be a bone conduction speaker. The interference signal may include a sound leakage signal and a vibration signal of the bone conduction speaker. The target region may be the region where the total energy of the bone conduction speaker's leakage signal and vibration signal delivered to the microphone array is minimal.
In some embodiments, the position of the target area may be related to the orientation of the diaphragms of the microphones in the microphone array. The orientation of a microphone's diaphragm may reduce the magnitude of the bone conduction speaker's vibration signal received by that microphone. The diaphragm may also be oriented such that the vibration signal and the leakage signal of the bone conduction speaker received by the microphone at least partially cancel each other; this cancellation can attenuate the received leakage signal of the bone conduction speaker by 5-6 dB.
In some embodiments, the at least one speaker may be an air conduction speaker. The target area may be the region of minimum sound pressure level in the radiated sound field of the air conduction speaker.
In some embodiments, the processor may be further configured to process the noise reduction signal based on a transfer function. The transfer function may include a first transfer function and a second transfer function. The first transfer function may represent the change in the parameters of the target signal as it travels from the at least one speaker to the location where the target signal and the ambient noise cancel. The second transfer function may represent the change in the parameters of the ambient noise as it travels from the target spatial location to the location where the target signal and the ambient noise cancel. The at least one speaker may be further configured to output the target signal in accordance with the processed noise reduction signal.
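One hedged reading of these two transfer functions, not a formula stated in this application: if N(ω) denotes the noise at the target spatial location, H1(ω) the first transfer function (speaker to cancellation location), and H2(ω) the second (target spatial location to cancellation location), then requiring the target signal and the ambient noise to cancel at that location suggests driving the speaker with a processed signal S(ω) satisfying

```latex
H_1(\omega)\, S(\omega) + H_2(\omega)\, N(\omega) = 0
\quad\Longrightarrow\quad
S(\omega) = -\frac{H_2(\omega)}{H_1(\omega)}\, N(\omega)
```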
In some embodiments, generating a noise reduction signal based on the picked-up ambient noise and the sound field estimate at the target spatial location may include dividing the picked-up ambient noise into a plurality of frequency bands corresponding to different frequency ranges and, for at least one of the plurality of frequency bands, generating a noise reduction signal corresponding to each such frequency band.
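As an illustrative sketch only (the band edges and the simple per-band inversion rule are assumptions), per-band processing of this kind could look like:

```python
import numpy as np

def per_band_noise_reduction(noise: np.ndarray, fs: int,
                             bands_hz=((20, 500), (500, 2000))) -> np.ndarray:
    """Generate a noise reduction signal only for the selected frequency bands."""
    spectrum = np.fft.rfft(noise)
    freqs = np.fft.rfftfreq(len(noise), d=1.0 / fs)
    anti = np.zeros_like(spectrum)
    for lo, hi in bands_hz:
        mask = (freqs >= lo) & (freqs < hi)
        anti[mask] = -spectrum[mask]        # equal amplitude, opposite phase
    return np.fft.irfft(anti, n=len(noise))
```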
In some embodiments, the processor may be further configured to adjust the amplitude and phase of the noise at the target spatial location, based on the sound field estimate at the target spatial location, to generate the noise reduction signal.
In some embodiments, the acoustic device may further include a securing structure configured to secure the acoustic device in a position near the user's ear without blocking the user's ear canal.
In some embodiments, the acoustic device may further include a housing structure configured to carry or house the microphone array, the processor, and the at least one speaker.
One embodiment of the present application provides a noise reduction method. The noise reduction method may include picking up ambient noise by a microphone array. The noise reduction method may include estimating, by a processor, a sound field at a target spatial location using the microphone array. The target spatial location may be closer to the user's ear canal than any microphone of the microphone array. The noise reduction method may include generating a noise reduction signal based on the picked-up ambient noise and the sound field estimate at the target spatial location. The noise reduction method may further include outputting, by at least one speaker, a target signal according to the noise reduction signal. The target signal may be used to reduce the ambient noise. The microphone array may be positioned at a target area to minimize interference from the at least one speaker with the microphone array.
Additional features of the present application will be set forth in part in the description that follows. Additional features of some aspects of the present application will be apparent to those of ordinary skill in the art from the following description and accompanying drawings, or from the production or operation of the embodiments. The features of the present application may be realized and attained by means of the instruments and methods set forth in the detailed examples below.
Drawings
The present application will be further explained by way of exemplary embodiments, which will be described in detail by way of the accompanying drawings. These embodiments are not intended to be limiting, and in these embodiments like numerals are used to indicate like structures, wherein:
FIG. 1 is a schematic structural diagram of an exemplary acoustic device according to some embodiments of the present application;
FIG. 2 is a block diagram of an exemplary processor shown in accordance with some embodiments of the present application;
FIG. 3 is an exemplary noise reduction flow diagram of an acoustic device shown in accordance with some embodiments of the present application;
FIG. 4 is an exemplary noise reduction flow diagram of an acoustic device shown in accordance with some embodiments of the present application;
FIGS. 5A-D are schematic diagrams of exemplary arrangements of microphone arrays according to some embodiments of the present application;
FIGS. 6A-B are schematic diagrams of exemplary arrangements of microphone arrays according to some embodiments of the present application;
FIG. 7 is an exemplary flow chart illustrating estimating noise at a target spatial location according to some embodiments of the present application;
FIG. 8 is a schematic illustration of estimating noise at a target spatial location according to some embodiments of the present application;
FIG. 9 is an exemplary flow diagram illustrating estimating a sound field and noise at a target spatial location according to some embodiments of the present application;
FIG. 10 is a schematic diagram of constructing a virtual microphone according to some embodiments of the present application;
FIG. 11 is a schematic diagram of a three-dimensional sound field leakage signal distribution for a bone conduction speaker at 1000Hz in accordance with some embodiments of the present application;
FIG. 12 is a schematic diagram of a two-dimensional sound field leakage signal distribution at 1000Hz for a bone conduction speaker according to some embodiments of the present application;
FIG. 13 is a frequency response diagram of the sum of the vibration signal and the leakage sound signal of a bone conduction speaker according to some embodiments of the present application;
FIGS. 14A-B are schematic illustrations of sound field distributions for air conduction speakers according to some embodiments of the present application;
FIG. 15 is an exemplary flow chart illustrating outputting a target signal based on a transfer function according to some embodiments of the present application; and
FIG. 16 is an exemplary flow chart illustrating estimating noise at a target spatial location according to some embodiments of the present application.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description are merely examples or embodiments of the application, from which a person of ordinary skill in the art can, without inventive effort, apply the application to other similar scenarios. Unless otherwise apparent from the context or otherwise indicated, like reference numbers in the figures refer to the same structure or operation.
It should be understood that "system", "device", "unit", and/or "module" as used herein is a way to distinguish different components, elements, parts, portions, or assemblies at different levels. However, these words may be replaced by other expressions that accomplish the same purpose.
As used in this application and the appended claims, the singular forms "a," "an," and/or "the" are intended to include the plural forms as well, unless the context clearly dictates otherwise. In general, the terms "comprise" and "comprising" merely indicate the inclusion of the explicitly identified steps or elements, which do not constitute an exclusive list; the method or apparatus may also include other steps or elements.
Flowcharts are used herein to illustrate operations performed by systems according to embodiments of the present application. It should be understood that the operations are not necessarily performed exactly in the order shown. Rather, the various steps may be processed in reverse order or simultaneously. Other operations may also be added to these processes, or one or more steps of operations may be removed from them.
An open acoustic device (e.g., an open acoustic earphone) is an acoustic device that leaves the user's ear open. An open acoustic device may secure a speaker in a position near the user's ear, without blocking the user's ear canal, by means of a securing structure (e.g., an ear hook, a head hook, a glasses temple, etc.). When a user uses an open acoustic device, external ambient noise can also be heard by the user, which degrades the user's listening experience. For example, in places with loud ambient noise (e.g., streets or scenic spots), when a user plays music with an open acoustic device, the ambient noise can enter the user's ear canal directly, so that the user hears loud ambient noise that interferes with the music listening experience. As another example, when the user wears an open acoustic device to make a call, the microphone may pick up not only the user's own speech but also the ambient noise, making the user's call experience poor.
In view of the above problems, an acoustic device is provided in the embodiments of the present application. The acoustic device may include a microphone array, a processor, and at least one speaker. The microphone array may be configured to pick up ambient noise. The processor may be configured to estimate the sound field at a target spatial location using the microphone array. The target spatial location may be closer to the user's ear canal than any microphone of the microphone array. It will be appreciated that the microphones of the microphone array may be distributed at different locations near the user's ear canal, and the microphone array may be used to estimate the sound field at a location near the user's ear canal (e.g., the target spatial location). The processor may be further configured to generate a noise reduction signal based on the picked-up ambient noise and the sound field estimate at the target spatial location. The at least one speaker may be configured to output the target signal in accordance with the noise reduction signal. The target signal may be used to reduce ambient noise. Additionally, the microphone array may be positioned at a target area to minimize interference signals from the at least one speaker to the microphone array. When the at least one speaker is a bone conduction speaker, the interference signal may include a leakage signal and a vibration signal of the bone conduction speaker, and the target region may be the region where the total energy of the bone conduction speaker's leakage signal and vibration signal delivered to the microphone array is minimal. When the at least one speaker is an air conduction speaker, the target region may be the region of minimum sound pressure level in the radiated sound field of the air conduction speaker.
In the embodiments of the present application, with the above arrangement, the target signal output by the at least one speaker reduces the ambient noise at the user's ear canal (e.g., at the target spatial location), thereby achieving active noise reduction for the acoustic device and improving the user's listening experience while using it.
Further, in the embodiments of the present application, the microphone array (which may also be referred to as a feed-forward microphone) can serve both to pick up ambient noise and to estimate the sound field at the user's ear canal (e.g., at the target spatial location).
In addition, in the embodiments of the present application, arranging the microphone array in the target area keeps it from picking up interference signals (e.g., the target signal) emitted by the at least one speaker, safeguarding the active noise reduction of the open acoustic device.
Fig. 1 is a schematic structural diagram of an exemplary acoustic device 100 according to some embodiments of the present application. In some embodiments, the acoustic device 100 may be an open acoustic device. As shown in fig. 1, the acoustic device 100 may include a microphone array 110, a processor 120, and a speaker 130. In some embodiments, the microphone array 110 may pick up ambient noise and convert it into electrical signals that are passed to the processor 120 for processing. The processor 120 may be coupled (e.g., electrically connected) to the microphone array 110 and the speaker 130. The processor 120 may receive and process the electrical signals delivered by the microphone array 110 to generate a noise reduction signal and deliver it to the speaker 130. The speaker 130 may output the target signal based on the noise reduction signal. The target signal may be used to reduce or cancel ambient noise at the location of the user's ear canal (e.g., the target spatial location), thereby enabling active noise reduction of the acoustic device 100 and improving the user's hearing experience while using the acoustic device 100.
The microphone array 110 may be configured to pick up ambient noise. In some embodiments, ambient noise may refer to the combination of multiple sounds in the environment in which the user is located. By way of example only, the ambient noise may include one or more of traffic noise, industrial noise, construction noise, social noise, and the like. Traffic noise may include, but is not limited to, driving noise, horn noise, etc. of motor vehicles. Industrial noise may include, but is not limited to, the operating noise of factory power machinery, and the like. Construction noise may include, but is not limited to, excavation noise, drilling noise, mixing noise, etc. of power machinery. Social noise may include, but is not limited to, crowd noise, entertainment and promotion noise, loud talking, household appliance noise, and the like. In some embodiments, the microphone array 110 may be disposed near the user's ear canal to pick up ambient noise delivered to the user's ear canal and convert the picked-up ambient noise into electrical signals delivered to the processor 120 for processing. In some embodiments, the microphone array 110 may be disposed at the user's left ear and/or right ear. For example, the microphone array 110 may include a first sub-microphone array and a second sub-microphone array. The first sub-microphone array may be located at the user's left ear and the second sub-microphone array at the user's right ear. The first sub-microphone array and the second sub-microphone array may operate simultaneously, or only one of them may operate.
In some embodiments, the ambient noise may include the sound of the user speaking. For example, the microphone array 110 may pick up ambient noise according to the call state of the acoustic device 100. When the acoustic device 100 is not in a call state, the sound generated by the user's own speech may be regarded as ambient noise, and the microphone array 110 may pick up the user's own speech together with other ambient noise. When the acoustic device 100 is in a call state, the sound generated by the user's own speech may not be regarded as ambient noise, and the microphone array 110 may pick up ambient noise other than the user's own speech. For example, the microphone array 110 may pick up noise emanating from noise sources some distance (e.g., 0.5 meters, 1 meter) away from the microphone array 110.
In some embodiments, the microphone array 110 may include one or more air conduction microphones. For example, when the user listens to music using the acoustic device 100, an air conduction microphone may simultaneously acquire noise from the external environment and the sound of the user speaking, and may treat both together as the ambient noise. In some embodiments, the microphone array 110 may also include one or more bone conduction microphones. A bone conduction microphone may be in direct contact with the user's skin, so that vibration signals generated by the user's bones or muscles while speaking are transmitted directly to the bone conduction microphone, which converts them into electrical signals and transmits the electrical signals to the processor 120 for processing. A bone conduction microphone may also be out of direct contact with the human body; in that case, the vibration signals generated by the bones or muscles while the user speaks may be transmitted first to the housing structure of the acoustic device 100 and then from the housing structure to the bone conduction microphone. In some embodiments, when the user is in a call state, the processor 120 may use the sound signal collected by the air conduction microphone as ambient noise for noise reduction, and transmit the sound signal collected by the bone conduction microphone as the voice signal to the terminal device, thereby ensuring call quality.
In some embodiments, the processor 120 may control the switching states of the bone conduction microphones and the air conduction microphones based on the operating state of the acoustic device 100. The operating state of the acoustic device 100 may refer to the state of use while the user wears the acoustic device 100. By way of example only, the operating state of the acoustic device 100 may include, but is not limited to, a call state, a non-call state (e.g., a music playing state), a voice-message-sending state, and the like. In some embodiments, when the microphone array 110 picks up ambient noise, the switching states of the bone conduction microphones and the air conduction microphones in the microphone array 110 may be determined according to the operating state of the acoustic device 100. For example, when the user wears the acoustic device 100 to play music, the bone conduction microphones may be in a standby state and the air conduction microphones in a working state. As another example, when the user wears the acoustic device 100 to send a voice message, both the bone conduction microphones and the air conduction microphones may be in a working state. In some embodiments, the processor 120 may control the switching state of the microphones (e.g., bone conduction microphones, air conduction microphones) in the microphone array 110 by sending control signals.
In some embodiments, when the operating state of the acoustic device 100 is the non-call state (e.g., the music playing state), the processor 120 may control the bone conduction microphone to be in the standby state and the air conduction microphone to be in the working state. In the non-call state of the acoustic device 100, the user's own speech signal can be regarded as ambient noise. In this case, the user's speech signal contained in the ambient noise picked up by the air conduction microphone need not be filtered out, so that the user's speech, as part of the ambient noise, is also cancelled by the target signal output by the speaker 130. When the operating state of the acoustic device 100 is the call state, the processor 120 may control both the bone conduction microphone and the air conduction microphone to be in the working state. In the call state, the acoustic device 100 needs to preserve the user's own speech signal. In this case, the processor 120 may send a control signal to put the bone conduction microphone into the working state; the bone conduction microphone picks up the user's speech signal, and the processor 120 removes that speech signal from the ambient noise picked up by the air conduction microphone, so that the user's speech is not cancelled by the target signal output by the speaker 130, thereby ensuring a normal conversation.
In some embodiments, when the operating state of the acoustic device 100 is the call state, if the sound pressure of the ambient noise is greater than a preset threshold, the processor 120 may control the bone conduction microphone to remain in the working state. The sound pressure of the ambient noise may reflect the intensity of the ambient noise. The preset threshold here may be a value stored in advance in the acoustic device 100, for example, 50 dB, 60 dB, 70 dB, or any other value. When the sound pressure of the ambient noise is greater than the preset threshold, the ambient noise may affect the user's call quality. The processor 120 may send a control signal to keep the bone conduction microphone in the working state; the bone conduction microphone acquires the vibration signal of the facial muscles while the user speaks and picks up essentially no external ambient noise, so the vibration signal picked up by the bone conduction microphone serves as the user's speech signal, ensuring a normal conversation.
In some embodiments, when the operating state of the acoustic device 100 is the call state, the processor 120 may control the bone conduction microphone to switch from the working state to the standby state if the sound pressure of the ambient noise is smaller than the preset threshold. When the sound pressure of the ambient noise is below the preset threshold, it is lower than the sound pressure of the sound signal generated by the user's speech. The portion of the user's speech transmitted along a first acoustic path to a position at the user's ear is partially offset by the target signal output by the speaker 130 and transmitted along a second acoustic path to the same position, and the remaining speech signal can still be received by the user's auditory center, ensuring a normal conversation. In this case, the processor 120 may send a control signal to switch the bone conduction microphone from the working state to the standby state, thereby reducing the complexity of signal processing and the power consumption of the acoustic device 100.
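For illustration only, the switching logic of the preceding paragraphs can be summarized in a few lines; the 60 dB threshold and the names below are assumptions rather than values from this application:

```python
PRESET_THRESHOLD_DB = 60.0  # assumed value; the application lists 50/60/70 dB as examples

def bone_mic_state(in_call: bool, ambient_spl_db: float) -> str:
    if not in_call:
        return "standby"   # non-call state: air conduction microphones suffice
    if ambient_spl_db > PRESET_THRESHOLD_DB:
        return "working"   # noisy call: bone conduction microphone carries speech
    return "standby"       # quiet call: switch off to save power and processing
```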
In some embodiments, the microphone array 110 may include a moving coil microphone, a ribbon microphone, a condenser microphone, an electret microphone, an electromagnetic microphone, a carbon particle microphone, or the like, or any combination thereof, depending on the principle of operation of the microphone. In some embodiments, the arrangement of the microphone array 110 may include a linear array (e.g., linear, curvilinear), a planar array (e.g., regular and/or irregular shapes such as a cross, a circle, a ring, a polygon, a mesh, etc.), a volumetric array (e.g., cylindrical, spherical, hemispherical, polyhedral, etc.), etc., or any combination thereof. Further reference may be made to the arrangement of the microphone array 110 elsewhere in this application, for example, fig. 5A-D, 6A-B, and their corresponding descriptions.
The processor 120 may be configured to estimate the sound field at the target spatial location using the microphone array 110. The sound field at the target spatial location may refer to the distribution and variation (e.g., over time and position) of sound waves at or near the target spatial location. The physical quantities describing the sound field may include sound pressure, frequency, amplitude, phase, sound source vibration velocity, medium (e.g., air) density, and the like. Generally, these physical quantities are functions of position and time. The target spatial location may refer to a spatial position within a certain distance of the user's ear canal. The target spatial location may be closer to the user's ear canal than any microphone of the microphone array 110. The certain distance here may be a fixed distance, for example, 0.5 cm, 1 cm, 2 cm, or 3 cm. In some embodiments, the target spatial location may be related to the number of microphones in the microphone array 110 and their distribution relative to the user's ear canal, and may be adjusted by adjusting either. For example, the target spatial location may be brought closer to the user's ear canal by increasing the number of microphones in the microphone array 110. As another example, the target spatial location may be brought closer to the user's ear canal by reducing the spacing of the microphones in the microphone array 110, or by changing their arrangement.
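As one textbook illustration of how such a physical quantity depends on position and time (not a formula from this application), the sound pressure of a monopole point source of strength A at distance r may be written

```latex
p(r, t) = \frac{A}{r}\, e^{\, j (\omega t - k r)}, \qquad k = \frac{\omega}{c}
```

where ω is the angular frequency and c the speed of sound; estimating the sound field at the target spatial location amounts to estimating such quantities there from the array measurements.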
The processor 120 may be further configured to generate a noise reduction signal based on the picked-up ambient noise and the sound field estimate at the target spatial location. In particular, the processor 120 may receive and process the electrical signal converted from the ambient noise delivered by the microphone array 110 to obtain parameters (e.g., amplitude, phase, etc.) of the ambient noise. The processor 120 may further adjust these parameters based on the sound field estimate at the target spatial location to generate the noise reduction signal, whose parameters correspond to those of the ambient noise. By way of example only, the noise reduction signal may have an amplitude approximately equal to that of the ambient noise and a phase approximately opposite to it. In some embodiments, the processor 120 may include hardware modules and software modules. By way of example only, the hardware modules may include digital signal processor (DSP) chips and Advanced RISC Machine (ARM) processors, and the software modules may include algorithm modules. Further reference to the processor 120 may be found elsewhere in this application, for example, in fig. 2 and its corresponding description.
The speaker 130 may be configured to output a target signal based on the noise reduction signal. The target signal may be used to reduce or eliminate ambient noise delivered to a location of the user's ear (e.g., the tympanic membrane or basilar membrane). In some embodiments, the speaker 130 may be located near the user's ear when the acoustic device 100 is worn. In some embodiments, by operating principle, the speaker 130 may include one or more of an electrodynamic speaker (e.g., a moving coil speaker), a magnetic speaker, an ion speaker, an electrostatic (or capacitive) speaker, a piezoelectric speaker, and the like. In some embodiments, by the manner in which the output sound propagates, the speaker 130 may include an air conduction speaker and/or a bone conduction speaker. In some embodiments, the number of speakers 130 may be one or more. When there is one speaker 130, it may be used both to output the target signal to cancel ambient noise and to deliver sound information the user needs to hear (e.g., device media audio, far-end call audio). For example, when there is a single speaker 130 and it is an air conduction speaker, it may be used to output the target signal to cancel ambient noise. In this case, the target signal is a sound wave (i.e., a vibration of the air) that can travel through the air to the target spatial location and cancel the ambient noise there. Meanwhile, the air conduction speaker can also deliver the sound information that the user needs to hear. As another example, when there is a single speaker 130 and it is a bone conduction speaker, it may be used to output the target signal to cancel ambient noise. In this case, the target signal is a vibration signal (e.g., a vibration of the speaker housing) that can be transmitted through bone or tissue to the user's basilar membrane and cancel the ambient noise there. Meanwhile, the bone conduction speaker can also deliver the sound information that the user needs to hear. When there are multiple speakers 130, some of them may be used to output the target signal to cancel ambient noise, while others deliver the sound information the user needs to hear (e.g., device media audio, far-end call audio). For example, when the multiple speakers 130 include a bone conduction speaker and an air conduction speaker, the air conduction speaker may be used to output sound waves that reduce or eliminate ambient noise, and the bone conduction speaker may be used to deliver the sound information the user needs to hear. Compared with an air conduction speaker, a bone conduction speaker transmits mechanical vibrations directly through the user's body (e.g., bone, skin tissue, etc.) to the user's auditory nerve, with less interference to the air conduction microphones picking up ambient noise.
It should be noted that the speaker 130 may be a stand-alone functional device or part of a single device capable of multiple functions. By way of example only, the speaker 130 may be integrated with the processor 120. In some embodiments, when there are multiple speakers 130, their arrangement may include a linear array (e.g., straight or curved), a planar array (e.g., regular and/or irregular shapes such as a cross, mesh, circle, ring, or polygon), a volumetric array (e.g., a cylinder, sphere, hemisphere, or polyhedron), etc., or any combination thereof, without limitation here. In some embodiments, the speakers 130 may be disposed at the user's left ear and/or right ear. For example, the speaker 130 may include a first sub-speaker and a second sub-speaker. The first sub-speaker may be located at the user's left ear and the second sub-speaker at the user's right ear. The first sub-speaker and the second sub-speaker may operate simultaneously, or only one of them may operate. In some embodiments, the speaker 130 may be a speaker with a directional sound field whose main lobe points at the user's ear canal.
In some embodiments, the acoustic device 100 may also include one or more sensors 140. The one or more sensors 140 may be electrically connected to other components of the acoustic device 100 (e.g., the processor 120). The one or more sensors 140 may be used to acquire physical position and/or motion information of the acoustic device 100. By way of example only, the one or more sensors 140 may include an inertial measurement unit (IMU), a Global Positioning System (GPS) receiver, radar, and the like. The motion information may include a motion trajectory, direction, velocity, acceleration, angular velocity, motion-related time information (e.g., start and end times), etc., or any combination thereof. Taking the IMU as an example, the IMU may include a micro-electro-mechanical system (MEMS), which may include a multi-axis accelerometer, a gyroscope, a magnetometer, etc., or any combination thereof. The IMU may detect the physical position and/or motion information of the acoustic device 100 to enable control of the acoustic device 100 based on that information. Further reference to controlling the acoustic device 100 based on physical position and/or motion information may be found elsewhere in this application, for example, in fig. 4 and its corresponding description.
In some embodiments, the acoustic device 100 may include a signal transceiver 150. The signal transceiver 150 may be electrically connected to other components of the acoustic device 100 (e.g., the processor 120). In some embodiments, the signal transceiver 150 may include a Bluetooth module, an antenna, and the like. The acoustic device 100 may communicate with other external devices (e.g., a mobile phone, a tablet, a smart watch) through the signal transceiver 150. For example, the acoustic device 100 may communicate wirelessly with other devices via Bluetooth.
In some embodiments, the acoustic device 100 may include a housing structure 160. The housing structure 160 may be configured to carry other components of the acoustic device 100 (e.g., the microphone array 110, the processor 120, the speaker 130, the one or more sensors 140, the signal transceiver 150). In some embodiments, the housing structure 160 may be a closed or semi-closed structure with a hollow interior, with the other components of the acoustic device 100 located within or on it. In some embodiments, the shape of the housing structure may be a regular or irregular solid such as a rectangular parallelepiped, a cylinder, or a truncated cone. When the acoustic device 100 is worn by the user, the housing structure may be located near the user's ear. For example, the housing structure may be located on a peripheral side (e.g., the front or rear side) of the user's pinna. As another example, the housing structure may be positioned over the user's ear without occluding or covering the user's ear canal. In some embodiments, the acoustic device 100 may be a bone conduction headset, and at least one side of the housing structure may be in contact with the user's skin. An acoustic driver (e.g., a vibration speaker) in the bone conduction headset converts an audio signal into mechanical vibrations that can be transmitted through the housing structure and the user's bones to the user's auditory nerve. In some embodiments, the acoustic device 100 may be an air conduction earphone, and at least one side of the housing structure may or may not be in contact with the user's skin. A side wall of the housing structure includes at least one sound guide hole, and a speaker in the air conduction earphone converts an audio signal into air-conducted sound that radiates toward the user's ear through the sound guide hole.
In some embodiments, the acoustic device 100 may include a securing structure 170. The securing structure 170 may be configured to secure the acoustic device 100 in a position near the user's ear without blocking the user's ear canal. In some embodiments, the securing structure 170 may be physically connected (e.g., snapped, threaded, etc.) to the housing structure 160 of the acoustic device 100. In some embodiments, the housing structure 160 of the acoustic device 100 may be part of the securing structure 170. In some embodiments, the securing structure 170 may include an ear hook, a back hook, an elastic band, a glasses temple, etc., so that the acoustic device 100 can be better secured near the user's ear and prevented from falling off during use. For example, the securing structure 170 may be an ear hook configured to be worn around the ear region. In some embodiments, the ear hook may be a continuous hook that can be elastically stretched to fit over the user's ear, while also exerting pressure on the user's pinna so that the acoustic device 100 is firmly fixed at a particular position on the user's ear or head. In some embodiments, the ear hook may be a discontinuous band. For example, the ear hook may include a rigid portion and a flexible portion. The rigid portion may be made of a rigid material (e.g., plastic or metal) and may be secured to the housing structure 160 of the acoustic device 100 by a physical connection (e.g., snap fit, threaded connection, etc.). The flexible portion may be made of an elastic material (e.g., cloth, composite, and/or neoprene). As another example, the securing structure 170 may be a neck band configured to be worn around the neck/shoulder area. As another example, the securing structure 170 may be a glasses temple, mounted at the user's ear as part of a pair of glasses.
In some embodiments, the acoustic device 100 may further include an interaction module (not shown) for adjusting the sound pressure of the target signal. In some embodiments, the interaction module may include buttons, a voice assistant, gesture sensors, and the like. The user may adjust the noise reduction mode of the acoustic device 100 by controlling the interaction module. Specifically, the user may adjust (e.g., scale up or down) the amplitude information of the noise reduction signal through the interaction module to change the sound pressure of the target signal emitted by the speaker 130, thereby achieving different noise reduction effects. By way of example only, the noise reduction modes may include a strong noise reduction mode, a medium noise reduction mode, a weak noise reduction mode, and so on. For example, when the user wears the acoustic device 100 indoors, where ambient noise is low, the user may turn off noise reduction or switch the acoustic device 100 to the weak noise reduction mode through the interaction module. As another example, when the user wears the acoustic device 100 while walking in a public place such as a street, the user needs to listen to audio signals (e.g., music, voice information) while remaining aware of the surroundings to cope with emergencies; in that case the user can select the medium noise reduction mode through the interaction module (e.g., a button or the voice assistant) to retain some ambient sound (e.g., sirens, collisions, car horns). As another example, when the user is riding a vehicle such as a subway or an airplane, the user may select the strong noise reduction mode through the interaction module to further reduce ambient noise. In some embodiments, the processor 120 may also send a prompt to the acoustic device 100 or a terminal device (e.g., a mobile phone, a smart watch, etc.) communicatively connected to the acoustic device 100 based on the ambient noise intensity range, to remind the user to adjust the noise reduction mode.
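For illustration only, mapping these modes to a scaling of the noise reduction signal's amplitude might look as follows; the gain values are assumptions, not values from this application:

```python
# Assumed gain values for illustration; the application does not specify them.
MODE_GAIN = {"strong": 1.0, "medium": 0.6, "weak": 0.3, "off": 0.0}

def scale_noise_reduction(noise_reduction_signal, mode: str = "medium"):
    # Scaling the noise reduction signal's amplitude changes the sound
    # pressure of the target signal emitted by the speaker 130.
    return MODE_GAIN[mode] * noise_reduction_signal
```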
It should be noted that the above description of fig. 1 is provided for illustrative purposes only and is not intended to limit the scope of the present application. Many variations and modifications will be apparent to those of ordinary skill in the art in light of the teachings herein. In some embodiments, one or more components of the acoustic device 100 (e.g., the one or more sensors 140, the signal transceiver 150, the securing structure 170, the interaction module, etc.) may be omitted. In some embodiments, one or more components of the acoustic device 100 may be replaced with other elements that perform similar functions. For example, the acoustic device 100 may not include the securing structure 170, and the housing structure 160 or a portion thereof may have a shape that fits the human ear (e.g., circular, oval, regular or irregular polygonal, U-shaped, V-shaped, semi-circular) so that it can hang near the user's ear. In some embodiments, one component of the acoustic device 100 may be split into multiple sub-components, or multiple components may be combined into a single component. Such changes and modifications do not depart from the scope of the present application.
Fig. 2 is a block diagram of an exemplary processor 120 according to some embodiments of the present application. As shown in fig. 2, the processor 120 may include an analog-to-digital conversion unit 210, a noise estimation unit 220, an amplitude and phase compensation unit 230, and a digital-to-analog conversion unit 240.
In some embodiments, the analog-to-digital conversion unit 210 may be configured to convert a signal input by the microphone array 110 into a digital signal. Specifically, the microphone array 110 picks up the ambient noise and converts the picked-up ambient noise into an electrical signal to be transmitted to the processor 120. Upon receiving the electrical signal of the ambient noise transmitted by the microphone array 110, the analog-to-digital conversion unit 210 may convert the electrical signal into a digital signal. In some embodiments, the analog-to-digital conversion unit 210 may be electrically connected with the microphone array 110 and further electrically connected with other components of the processor 120 (e.g., the noise estimation unit 220). Further, the analog-to-digital conversion unit 210 may transfer the converted digital signal of the ambient noise to the noise estimation unit 220.
In some embodiments, the noise estimation unit 220 may be configured to estimate the ambient noise from the received digital signal of the ambient noise. For example, the noise estimation unit 220 may estimate parameters related to the ambient noise at the target spatial position from the received digital signal. By way of example only, the parameters may include the noise source (e.g., its location and orientation), the propagation direction, the amplitude, the phase, etc., or any combination thereof, of the noise at the target spatial location. In some embodiments, the noise estimation unit 220 may also be configured to estimate the sound field at the target spatial location using the microphone array 110. For more discussion on estimating the sound field at the target spatial location, reference may be made elsewhere in this application, e.g., to fig. 4 and its corresponding description. In some embodiments, the noise estimation unit 220 may be electrically connected with other components of the processor 120 (e.g., the amplitude and phase compensation unit 230). Further, the noise estimation unit 220 may transfer the estimated ambient-noise-related parameters and the sound field of the target spatial position to the amplitude and phase compensation unit 230.
In some embodiments, the amplitude and phase compensation unit 230 may be configured to compensate the estimated ambient-noise-related parameters according to the sound field at the target spatial position. For example, the amplitude and phase compensation unit 230 may compensate the amplitude and phase of the ambient noise according to the sound field at the target spatial position to obtain a digital noise reduction signal. In some embodiments, the amplitude and phase compensation unit 230 may adjust the amplitude of the ambient noise and inversely compensate its phase to obtain the digital noise reduction signal. The amplitude of the digital noise reduction signal may be approximately equal to that of the digital signal corresponding to the ambient noise, and its phase approximately opposite. In some embodiments, the amplitude and phase compensation unit 230 may be electrically connected with other components of the processor 120 (e.g., the digital-to-analog conversion unit 240). Further, the amplitude and phase compensation unit 230 may transfer the digital noise reduction signal to the digital-to-analog conversion unit 240.
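A hedged sketch of such amplitude-and-phase compensation in the frequency domain follows; the per-bin gain derived from the sound field estimate is an assumed input, not the unit's specified interface:

```python
import numpy as np

def compensate(noise_digital: np.ndarray, band_gain: np.ndarray) -> np.ndarray:
    """noise_digital: one frame of the digital ambient noise signal.
    band_gain: real per-bin amplitude adjustment of length len(frame)//2 + 1,
    assumed to come from the sound field estimate at the target location."""
    spec = np.fft.rfft(noise_digital)
    mag, phase = np.abs(spec), np.angle(spec)
    comp = band_gain * mag * np.exp(1j * (phase + np.pi))  # inverse phase
    return np.fft.irfft(comp, n=len(noise_digital))        # digital noise reduction signal
```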
In some embodiments, the digital-to-analog conversion unit 240 may be configured to convert the digital noise reduction signal into an analog signal to obtain the noise reduction signal (e.g., an electrical signal). By way of example only, the digital-to-analog conversion unit 240 may use pulse width modulation (PWM). In some embodiments, the digital-to-analog conversion unit 240 may be electrically connected with other components of the acoustic device 100 (e.g., the speaker 130). Further, the digital-to-analog conversion unit 240 may transfer the noise reduction signal to the speaker 130.
In some embodiments, the processor 120 may include a signal amplification unit 250. The signal amplifying unit 250 may be configured to amplify an input signal. For example, the signal amplifying unit 250 may amplify a signal input by the microphone array 110. For example only, the signal amplification unit 250 may be used to amplify the sound of the user speaking input by the microphone array 110 when the acoustic device 100 is in a talk state. For another example, the signal amplification unit 250 may amplify the magnitude of the ambient noise according to the sound field of the target spatial position. In some embodiments, the signal amplification unit 250 may be electrically connected with other components of the processor 120 (e.g., the microphone array 110, the noise estimation unit 220, the amplitude and phase compensation unit 230).
It should be noted that the above description with respect to fig. 2 is provided for illustrative purposes only, and is not intended to limit the scope of the present application. Many variations and modifications will be apparent to those of ordinary skill in the art in light of the teachings herein. In some embodiments, one or more components (e.g., signal amplification unit 250) in processor 120 may be omitted. In some embodiments, one component in processor 120 may be split into multiple subcomponents or multiple components may be combined into a single component. For example, the noise estimation unit 220 and the amplitude and phase compensation unit 230 may be integrated into one component for implementing the functions of the noise estimation unit 220 and the amplitude and phase compensation unit 230. Such changes and modifications may be made without departing from the scope of the present application.
Fig. 3 is an exemplary noise reduction flow diagram of an acoustic device shown in accordance with some embodiments of the present application. In some embodiments, the process 300 may be performed by the acoustic device 100. As shown in fig. 3, the process 300 may include:
in step 310, ambient noise is picked up. In some embodiments, this step may be performed by the microphone array 110.
According to the related description in fig. 1, the environmental noise may refer to a combination of various external sounds (e.g., traffic noise, industrial noise, construction noise, social noise) in the environment where the user is located. In some embodiments, the microphone array 110 may be located in close proximity to the ear canal of the user for picking up the ambient noise delivered to the ear canal of the user. Further, the microphone array 110 may convert the picked-up ambient noise signals into electrical signals and pass them to the processor 120 for processing.
In step 320, the noise at the target spatial location is estimated based on the picked-up ambient noise. In some embodiments, this step may be performed by processor 120.
In some embodiments, the processor 120 may perform signal separation on the picked-up ambient noise. The ambient noise picked up by the microphone array 110 may include various sounds, and the processor 120 may perform signal analysis on it to separate those sounds. Specifically, the processor 120 may adaptively adjust the parameters of a filter according to the statistical distribution characteristics and structural features of the various sounds in different dimensions such as the spatial domain, the time domain, and the frequency domain, estimate the parameter information of each sound signal in the environmental noise, and complete the signal separation according to that parameter information. In some embodiments, the statistical distribution characteristics of the noise may include the probability distribution density, power spectral density, autocorrelation function, probability density function, variance, mathematical expectation, and the like. In some embodiments, the structural features of the noise may include the noise distribution, the noise intensity, the global noise intensity, the noise rate, or the like, or any combination thereof. The global noise intensity may refer to an average noise intensity or a weighted average noise intensity. The noise rate may refer to the degree of dispersion of the noise distribution. For example only, the ambient noise picked up by the microphone array 110 may include a first signal, a second signal, and a third signal. The processor 120 may obtain the differences among the first, second, and third signals in space (e.g., the locations of the signal sources), the time domain (e.g., delay), and the frequency domain (e.g., amplitude and phase), and separate the three signals according to the differences in these three dimensions to obtain relatively pure versions of the first, second, and third signals. Further, the processor 120 may update the ambient noise according to the parameter information (e.g., frequency information, phase information, amplitude information) of the separated signals. For example, the processor 120 may determine from its parameter information that the first signal is the user's own speech during a call, and remove the first signal from the ambient noise to update the ambient noise. In some embodiments, the removed first signal may be transmitted to the far end of the call. For example, when the user wears the acoustic device 100 for a voice call, the first signal may be transmitted to the far end of the call.
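As an illustration of such separation, the sketch below uses scikit-learn's FastICA as a stand-in for the adaptive filtering described above. It assumes an instantaneous (non-convolutive) two-microphone mixture and synthetic source signals, both simplifications relative to real acoustic mixing.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Synthetic two-microphone mixture of two sources (placeholder data;
# in the device, X would be frames captured by the microphone array).
t = np.linspace(0.0, 1.0, 8000)
s1 = np.sin(2 * np.pi * 220 * t)              # tonal source
s2 = np.sign(np.sin(2 * np.pi * 97 * t))      # square-wave source
X = np.c_[s1 + 0.5 * s2, 0.4 * s1 + s2]       # instantaneous mixing

ica = FastICA(n_components=2, random_state=0)
separated = ica.fit_transform(X)              # columns ~ recovered sources
```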
The target spatial location is a location at or near the user's ear canal that is determined based on the microphone array 110. According to the relevant description in fig. 1, the target spatial position may refer to a spatial position within a certain distance (e.g., 0.5 cm, 1 cm, 2 cm, 3 cm) of the user's ear canal (e.g., the ear hole). In some embodiments, the target spatial location is closer to the user's ear canal than any of the microphones of the microphone array 110. As described in relation to fig. 1, the target spatial position is related to the number of microphones in the microphone array 110 and their distribution relative to the user's ear canal, and the target spatial position can be adjusted by adjusting the number of microphones in the microphone array 110 and/or their distribution relative to the user's ear canal. In some embodiments, estimating the noise at the target spatial location based on the picked-up ambient noise (or the updated ambient noise) may further include determining one or more spatial noise sources related to the picked-up ambient noise and estimating the noise at the target spatial location based on the spatial noise sources. The ambient noise picked up by the microphone array 110 may come from spatial noise sources of different kinds and in different azimuths, and the parameter information (e.g., frequency information, phase information, amplitude information) of each spatial noise source differs. In some embodiments, the processor 120 may perform signal separation and extraction on the noise at the target spatial position according to the statistical distribution and structural features of different types of noise in different dimensions (e.g., the spatial domain, time domain, frequency domain, etc.) to obtain the different types of noise (e.g., with different frequencies, different phases, etc.), and estimate the parameter information (e.g., amplitude information, phase information, etc.) corresponding to each type of noise. In some embodiments, the processor 120 may further determine the overall parameter information of the noise at the target spatial location according to the parameter information corresponding to the different types of noise at the target spatial location. Reference may be made elsewhere in this specification, for example, to fig. 7-8 and their corresponding descriptions, for more on estimating the noise at a target spatial location based on one or more spatial noise sources.
In some embodiments, estimating the noise of the target spatial location based on the picked-up ambient noise (or the updated ambient noise) may further include constructing a virtual microphone based on the microphone array 110 and estimating the noise of the target spatial location based on the virtual microphone. For more on estimating the noise of the target spatial location based on the virtual microphones, reference may be made elsewhere in this specification, such as fig. 9-10 and their corresponding descriptions.
In step 330, a noise reduction signal is generated based on the noise at the target spatial location. In some embodiments, this step may be performed by processor 120.
In some embodiments, processor 120 may generate a noise reduction signal based on the parameter information (e.g., amplitude information, phase information, etc.) of the noise at the target spatial location obtained in step 320. In some embodiments, the phase difference between the phase of the noise reduction signal and the phase of the noise at the target spatial location may be less than or equal to a preset phase threshold. The preset phase threshold may be in the range of 90-180 degrees and may be adjusted within this range according to the needs of the user. For example, when the user does not wish to be disturbed by the sound of the surrounding environment, the preset phase threshold may be a large value, e.g., 180 degrees, i.e., the phase of the noise reduction signal is opposite to the phase of the noise at the target spatial position. For another example, when the user wishes to remain sensitive to the surrounding environment, the preset phase threshold may be a small value, e.g., 90 degrees. It should be noted that the more ambient sound the user wishes to receive, the closer to 90 degrees the preset phase threshold may be, and the less ambient sound the user wishes to receive, the closer to 180 degrees the preset phase threshold may be. In some embodiments, when the phase of the noise reduction signal is fixed (e.g., opposite to the phase of the noise at the target spatial location), the difference between the amplitude of the noise at the target spatial location and the amplitude of the noise reduction signal may be less than or equal to a preset amplitude threshold. For example, when the user does not wish to be disturbed by the sound of the surrounding environment, the preset amplitude threshold may be a small value, e.g., 0 dB, i.e., the amplitude of the noise reduction signal is equal to the amplitude of the noise at the target spatial position. As another example, when the user wishes to remain sensitive to the surrounding environment, the preset amplitude threshold may be a large value, e.g., close to the amplitude of the noise at the target spatial location itself. It is noted that the more ambient sound the user wishes to receive, the closer the preset amplitude threshold may be to the amplitude of the noise at the target spatial location, and the less ambient sound the user wishes to receive, the closer the preset amplitude threshold may be to 0 dB.
In some embodiments, speaker 130 may output the target signal based on the noise reduction signal generated by processor 120. For example, speaker 130 may use its vibration components to convert the noise reduction signal (e.g., an electrical signal) into a target signal (i.e., a vibration signal) that can cancel the ambient noise. In some embodiments, when the noise at the target spatial location comes from a plurality of spatial noise sources, speaker 130 may output target signals corresponding to the plurality of spatial noise sources based on the noise reduction signal. For example, if the plurality of spatial noise sources includes a first spatial noise source and a second spatial noise source, the speaker 130 may output a first target signal approximately opposite in phase and equal in amplitude to the noise of the first spatial noise source to cancel that noise, and a second target signal approximately opposite in phase and equal in amplitude to the noise of the second spatial noise source to cancel that noise. In some embodiments, when speaker 130 is an air conduction speaker, the location at which the target signal cancels the ambient noise may be the target spatial location. Because the distance between the target spatial position and the user's ear canal is small, the noise at the target spatial position can be regarded approximately as the noise at the user's ear canal; the noise reduction signal and the noise at the target spatial position therefore cancel each other, and the ambient noise transmitted to the user's ear canal is approximately eliminated, thereby realizing the active noise reduction of the acoustic device 100. In some embodiments, when speaker 130 is a bone conduction speaker, the location where the target signal and the ambient noise cancel may be the basilar membrane. The target signal and the ambient noise cancel each other at the user's basilar membrane, thereby achieving active noise reduction of the acoustic device 100.
It should be noted that the above description related to the flow 300 is only for illustration and explanation, and does not limit the applicable scope of the present application. Various modifications and changes to flow 300 will be apparent to those skilled in the art in light of this disclosure. For example, steps in flow 300 may also be added, omitted, or combined. For another example, the environmental noise may be subjected to signal processing (e.g., filtering processing). Such modifications and variations are intended to be within the scope of the present application.
Fig. 4 is an exemplary noise reduction flow diagram of an acoustic device shown in accordance with some embodiments of the present application. In some embodiments, the flow 400 may be performed by the acoustic device 100. As shown in fig. 4, the process 400 may include:
in step 410, ambient noise is picked up. In some embodiments, this step may be performed by the microphone array 110. In some embodiments, step 410 may be performed in a similar manner as step 310, and the relevant description is not repeated here.
In step 420, the noise at the target spatial location is estimated based on the picked-up ambient noise. In some embodiments, this step may be performed by processor 120. In some embodiments, step 420 may be performed in a similar manner as step 320, and the relevant description is not repeated here.
In step 430, the sound field at the target spatial location is estimated. In some embodiments, this step may be performed by processor 120.
In some embodiments, the processor 120 may estimate the sound field at the target spatial location using the microphone array 110. In particular, the processor 120 may construct a virtual microphone based on the microphone array 110 and estimate a sound field at the target spatial location based on the virtual microphone. For more on the estimation of the sound field based on the virtual microphones for the target spatial location, reference may be made elsewhere in this specification, e.g., fig. 9-10 and their corresponding descriptions.
In step 440, a noise reduction signal is generated based on the noise at the target spatial location and the sound field estimate at the target spatial location. In some embodiments, step 440 may be performed by processor 120.
In some embodiments, the processor 120 may adjust the parameter information (e.g., frequency information, amplitude information, and phase information) of the noise at the target spatial location according to the sound-field-related physical quantities (e.g., sound pressure, sound frequency, sound amplitude, sound phase, sound source vibration velocity, or medium (e.g., air) density, etc.) at the target spatial location obtained in step 430, so as to generate the noise reduction signal. For example, the processor 120 may determine whether the physical quantities associated with the sound field (e.g., sound frequency, sound amplitude, sound phase) are the same as the parameter information of the noise at the target spatial location. If they are the same, the processor 120 may leave the parameter information of the noise at the target spatial position unadjusted. If they are not the same, the processor 120 may determine the difference between the physical quantities associated with the sound field and the parameter information of the noise at the target spatial position, and adjust the parameter information based on that difference. For example only, when the difference exceeds a certain range, the processor 120 may take the average of the sound-field-related physical quantity and the parameter information of the noise at the target spatial position as the adjusted parameter information, and generate the noise reduction signal based on the adjusted parameter information. For another example, since the noise in the environment is constantly changing, the noise at the target spatial position in the actual environment may already have changed slightly by the time the processor 120 generates the noise reduction signal. The processor 120 may therefore estimate the amount of change in the parameter information of the environmental noise at the target spatial position from the time information of the ambient noise picked up by the microphone array, the current time information, and the sound-field-related physical quantities (e.g., sound source vibration velocity, medium (e.g., air) density) at the target spatial position, and adjust the parameter information of the noise at the target spatial position based on that amount of change. Through such adjustment, the amplitude and frequency information of the noise reduction signal more closely match those of the environmental noise at the current target spatial position, and the phase information of the noise reduction signal more closely matches the anti-phase of that environmental noise, so that the noise reduction signal can cancel the environmental noise more accurately, improving the noise reduction effect and the user's hearing experience.
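The comparison-and-averaging rule described above could look like the following sketch for a single scalar parameter (e.g., amplitude). The function name and the tolerance value are assumptions for illustration only.

```python
def adjust_noise_param(noise_param: float, field_value: float,
                       tol: float = 0.05) -> float:
    """Reconcile one noise parameter with the corresponding sound-field
    quantity at the target spatial location, per the averaging rule above."""
    diff = abs(field_value - noise_param)
    if diff <= tol:                 # estimates agree: keep the parameter
        return noise_param
    return 0.5 * (field_value + noise_param)   # disagree: take the average
```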
In some embodiments, when the position of the acoustic device 100 changes, for example, when the head of the user wearing the acoustic device 100 rotates, the environmental noise (e.g., its direction, amplitude, and phase) changes. If the speed at which the acoustic device 100 performs noise reduction cannot keep up with the speed at which the environmental noise changes, the active noise reduction function may fail or even add noise. To this end, the processor 120 may update the noise at the target spatial location and the sound field estimate at the target spatial location based on motion information (e.g., motion trajectory, motion direction, motion velocity, motion acceleration, motion angular velocity, motion-related time information) of the acoustic device 100 acquired by the one or more sensors 140 of the acoustic device 100. Further, based on the updated noise at the target spatial location and the updated sound field estimate at the target spatial location, the processor 120 may generate a noise reduction signal. Because the one or more sensors 140 record the motion information of the acoustic device 100, the processor 120 can update the noise reduction signal quickly, which improves the noise tracking performance of the acoustic device 100, allows the noise reduction signal to cancel the environmental noise more accurately, and further improves the noise reduction effect and the user's hearing experience.
In some embodiments, the processor 120 may divide the picked-up ambient noise into a plurality of frequency bands corresponding to different frequency ranges. For example, the processor 120 may divide the picked-up ambient noise into four frequency bands of 100-300 Hz, 300-500 Hz, 500-800 Hz, and 800-1500 Hz. In some embodiments, each frequency band includes the parameter information (e.g., frequency information, amplitude information, phase information) of the ambient noise in the corresponding frequency range. For at least one of the plurality of frequency bands, processor 120 may perform steps 420-440 to generate a noise reduction signal corresponding to each of the at least one frequency band. For example, processor 120 may perform steps 420-440 for the 300-500 Hz and 500-800 Hz bands to generate noise reduction signals corresponding to those two bands, respectively. Further, in some embodiments, speaker 130 may output target signals corresponding to the respective frequency bands based on the noise reduction signals corresponding to the respective frequency bands. For example, speaker 130 may output one target signal approximately opposite in phase and equal in amplitude to the noise in the 300-500 Hz band to cancel the noise in that band, and another target signal approximately opposite in phase and equal in amplitude to the noise in the 500-800 Hz band to cancel the noise in that band.
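Such a band split can be sketched with a standard filterbank; the sketch below uses fourth-order Butterworth bandpass filters from scipy as one plausible choice. The sample rate, filter order, and filter type are assumptions, not details disclosed in this application.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def split_bands(noise: np.ndarray, fs: int = 16000,
                bands=((100, 300), (300, 500), (500, 800), (800, 1500))):
    """Split picked-up noise into the sub-bands used for per-band
    noise reduction signals. Returns {(lo, hi): filtered_signal}."""
    out = {}
    for lo, hi in bands:
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        out[(lo, hi)] = sosfilt(sos, noise)
    return out
```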
In some embodiments, the processor 120 may also update the noise reduction signal according to manual input by the user. For example, when the user wears the acoustic device 100 to play music in a noisy external environment and the hearing experience is not ideal, the user may manually adjust the parameter information (e.g., frequency information, phase information, and amplitude information) of the noise reduction signal according to his or her own hearing. For another example, when a special user (e.g., a hearing-impaired user or an elderly user) uses the acoustic device 100, the hearing ability of that user differs from that of a general user, and the noise reduction signal generated by the acoustic device 100 may not meet the user's needs, resulting in a poor hearing experience. In this case, a number of adjustments to the parameter information of the noise reduction signal may be preset, and such users may adjust the noise reduction signal according to their own hearing within the preset number of adjustments, so that the noise reduction signal is updated to improve their hearing experience. In some embodiments, the user may manually adjust the noise reduction signal through a key on the acoustic device 100. In other embodiments, the user may adjust the noise reduction signal via a terminal device. Specifically, the acoustic device 100 or an external device (e.g., a mobile phone, a tablet computer, a computer) communicatively connected to the acoustic device 100 may display the parameter information of the noise reduction signal suggested to the user, and the user may fine-tune that parameter information according to his or her hearing experience.
It should be noted that the above description related to the flow 400 is only for illustration and explanation, and does not limit the applicable scope of the present application. Various modifications and changes to flow 400 may occur to those skilled in the art in light of the teachings herein. For example, steps in flow 400 may also be added, omitted, or combined. Such modifications and variations are intended to be within the scope of the present application.
Fig. 5A-D are schematic diagrams of exemplary arrangements of microphone arrays, such as microphone array 110, according to some embodiments of the present application. In some embodiments, the microphone array arrangement may be a regular geometric shape. As shown in fig. 5A, the microphone array may be a linear array. In some embodiments, the microphone array arrangement may have other shapes. For example, as shown in fig. 5B, the microphone array may be a cross-shaped array. For another example, as shown in fig. 5C, the microphone array may be a circular array. In some embodiments, the microphone array arrangement may also be an irregular geometric shape. For example, as shown in fig. 5D, the microphone array may be an irregular array. It should be noted that the arrangement of the microphone array is not limited to the linear array, the cross-shaped array, the circular array, and the irregular array shown in fig. 5A-D; it may also be an array of another shape, such as a triangular array, a spiral array, a planar array, a three-dimensional array, a radial array, and the like, which is not limited in this application.
In some embodiments, each of the short solid lines in fig. 5A-D may be considered a microphone or a group of microphones. When each short solid line is considered as a group of microphones, the number of microphones of each group may be the same or different, the kind of microphones of each group may be the same or different, and the orientation of microphones of each group may be the same or different. The type, number and orientation of the microphones may be adaptively adjusted according to actual application conditions, which is not limited in the present application.
In some embodiments, the microphones in the microphone array may be uniformly distributed. Uniform distribution here may mean that the distance between any two adjacent microphones in the microphone array is the same. In some embodiments, the microphones in the microphone array may also be non-uniformly distributed. Non-uniform distribution here may mean that the distances between adjacent microphones in the microphone array differ. The distance between the microphones in the microphone array can be adaptively adjusted according to the actual situation, which is not limited in the present application.
Fig. 6A-B are schematic diagrams of exemplary arrangements of microphone arrays, such as microphone array 110, according to some embodiments of the present application. When the user wears the acoustic device having the microphone array, the microphone array may be disposed in a semicircular arrangement at or around the user's ear, as shown in fig. 6A, or in a linear arrangement at the user's ear, as shown in fig. 6B. It should be noted that the arrangement of the microphone array is not limited to the semicircular and linear shapes shown in fig. 6A and 6B, and the placement of the microphone array is not limited to the positions shown in fig. 6A and 6B; these shapes and positions are provided for illustrative purposes only.
FIG. 7 is an exemplary flow chart illustrating estimating noise at a target spatial location according to some embodiments of the present application. As shown in fig. 7, flow 700 may include:
in step 710, one or more sources of spatial noise related to the ambient noise picked up by the microphone array are determined. In some embodiments, this step may be performed by processor 120. As described herein, determining a spatial noise source refers to determining spatial noise source-related information, such as the location of the spatial noise source (including its orientation, its distance from a target spatial location, etc.), its phase, and its magnitude, etc.
In some embodiments, a spatial noise source related to ambient noise refers to a noise source whose sound waves may pass at or near the user's ear canal (e.g., the target spatial location). In some embodiments, the spatial noise sources may be noise sources in different directions (e.g., front, back, etc.) relative to the user's body. For example, if there is crowd noise in front of the user's body and vehicle whistle noise to the left of the user's body, the spatial noise sources include a crowd noise source in front of the user's body and a vehicle whistle noise source to the left of the user's body. In some embodiments, a microphone array (e.g., the microphone array 110) may pick up spatial noise from each direction around the user's body, convert the spatial noise into electrical signals, and transmit them to the processor 120; the processor 120 may analyze the electrical signals corresponding to the spatial noise to obtain the parameter information (e.g., frequency information, amplitude information, phase information, etc.) of the picked-up spatial noise in each direction. The processor 120 then determines the information of the spatial noise sources in the respective directions, for example, the orientations of the spatial noise sources, their distances, their phases, their magnitudes, and the like, according to the parameter information of the spatial noise in the respective directions. In some embodiments, the processor 120 may determine the spatial noise sources through a noise localization algorithm based on the spatial noise picked up by the microphone array (e.g., the microphone array 110). The noise localization algorithm may include one or more of a beamforming algorithm, a super-resolution spatial spectrum estimation algorithm, a time difference of arrival algorithm (which may also be referred to as a delay estimation algorithm), and the like. A beamforming algorithm is a sound source localization method based on steered beamforming with maximum output power. For example only, the beamforming algorithm may include a Steered Response Power-Phase Transform (SRP-PHAT) algorithm, a delay-and-sum beamforming algorithm, a differential microphone algorithm, a Generalized Sidelobe Canceller (GSC) algorithm, a Minimum Variance Distortionless Response (MVDR) algorithm, and so on. The super-resolution spatial spectrum estimation algorithms may include the autoregressive (AR) model, minimum variance (MV) spectrum estimation, and eigenvalue decomposition methods (e.g., the Multiple Signal Classification (MUSIC) algorithm), which compute a correlation matrix of the spatial spectrum from the sound signals (e.g., spatial noise) picked up by the microphone array and can effectively estimate the direction of a spatial noise source. A time difference of arrival algorithm may first estimate the time difference of arrival (TDOA) of the sound between microphones in the microphone array, and then locate the position of the spatial noise source using the estimated TDOA in combination with the known spatial positions of the microphones in the array.
For example, the delay estimation algorithm may determine the location of a noise source through geometric relationships by calculating the time differences with which the ambient noise signal arrives at different microphones in the microphone array. As another example, the SRP-PHAT algorithm may steer a beam toward each candidate direction, and the direction in which the beam energy is strongest may be taken approximately as the direction of the noise source. For another example, the MUSIC algorithm may obtain the signal subspace by performing an eigenvalue decomposition of the covariance matrix of the ambient noise signals picked up by the microphone array, thereby separating out the direction of the ambient noise. For more on determining the noise source, reference may be made elsewhere in this specification, for example, fig. 8 and its corresponding description.
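As an illustration of the delay-estimation family, the following is a minimal numpy sketch of GCC-PHAT, a widely used TDOA estimator between one microphone pair. This is a generic textbook implementation offered for illustration, not a method disclosed by this application; the function name and the small regularizer are assumptions.

```python
import numpy as np

def gcc_phat(sig: np.ndarray, ref: np.ndarray, fs: int) -> float:
    """Estimate the time difference of arrival (seconds) between two
    microphone channels using the phase transform (PHAT) weighting."""
    n = len(sig) + len(ref)
    S = np.fft.rfft(sig, n) * np.conj(np.fft.rfft(ref, n))
    S /= np.abs(S) + 1e-12            # PHAT weighting: keep phase only
    cc = np.fft.irfft(S, n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs
```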
In some embodiments, a spatial super-resolution image of the environmental noise may be formed by a synthetic aperture method, a sparse recovery method, a coprime array method, and the like. The spatial super-resolution image may be used to map the environmental noise signals, further improving the localization accuracy of the spatial noise sources.
In some embodiments, the processor 120 may divide the picked-up ambient noise into a plurality of frequency bands according to a specific frequency bandwidth (e.g., one frequency band per 500 Hz), each frequency band corresponding to a different frequency range, and determine the spatial noise source corresponding to at least one of the frequency bands. For example, the processor 120 may perform signal analysis on the frequency bands into which the environmental noise is divided, obtain the parameter information of the environmental noise corresponding to each frequency band, and determine the spatial noise source corresponding to each frequency band according to that parameter information. As another example, processor 120 may determine the spatial noise source corresponding to each frequency band by a noise localization algorithm.
In step 720, noise at the target spatial location is estimated based on the spatial noise sources. In some embodiments, this step may be performed by processor 120. As described herein, estimating noise at a target spatial location refers to estimating parametric information of the noise at the target spatial location, e.g., frequency information, amplitude information, phase information, and the like.
In some embodiments, the processor 120 may estimate the parameter information of the noise delivered by each spatial noise source to the target spatial position based on the parameter information (e.g., frequency information, amplitude information, phase information, etc.) of the spatial noise sources in the various directions relative to the user's body obtained in step 710, so as to estimate the noise at the target spatial position. For example, if there are spatial noise sources in a first orientation (e.g., front) and a second orientation (e.g., back) relative to the user's body, the processor 120 may estimate, based on the position information, frequency information, phase information, or amplitude information of the spatial noise source in the first orientation, the frequency information, phase information, or amplitude information of its noise when that noise arrives at the target spatial location, and likewise for the spatial noise source in the second orientation. Further, the processor 120 may estimate the noise at the target spatial position based on the frequency information, phase information, or amplitude information of the first and second spatial noise sources. For example only, the processor 120 may estimate the noise information at the target spatial location using virtual microphone techniques or other methods. In some embodiments, the processor 120 may extract the parameter information of the noise of a spatial noise source from the frequency response curve of that spatial noise source picked up by the microphone array using a feature extraction method. In some embodiments, the methods for extracting the parameter information of the noise of a spatial noise source may include, but are not limited to, Principal Component Analysis (PCA), Independent Component Analysis (ICA), Linear Discriminant Analysis (LDA), Singular Value Decomposition (SVD), and the like.
It should be noted that the above description related to the flow 700 is only for illustration and explanation, and does not limit the applicable scope of the present application. Various modifications and changes to flow 700 may occur to those skilled in the art upon review of the present application. For example, the process 700 may further include the steps of locating the spatial noise source, extracting the parameter information of the noise of the spatial noise source, and so on. Also for example, step 710 and step 720 may be combined into one step. Such modifications and variations are intended to be within the scope of the present application.
FIG. 8 is a schematic illustration of estimating the noise at a target spatial location according to some embodiments of the present application. The following takes the time difference of arrival algorithm as an example to illustrate how the localization of a spatial noise source is achieved. As shown in fig. 8, a processor (e.g., processor 120) may calculate the time differences with which the noise signals generated by the noise sources (e.g., 811, 812, 813) arrive at different microphones (e.g., microphone 821, microphone 822, etc.) in the microphone array 820, and then determine the locations of the noise sources from the positional relationship (e.g., distance, relative orientation) between the microphone array 820 and the noise sources, in combination with the known spatial position of the microphone array 820.
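Under a far-field assumption, a single microphone pair converts such a delay into a bearing; the relation below is a standard geometric result, not specific to this application (mic_spacing is the distance between the two microphones, c the speed of sound, tau the measured delay).

```python
import numpy as np

def doa_from_tdoa(tau: float, mic_spacing: float, c: float = 343.0) -> float:
    """Far-field direction of arrival (radians from broadside) for one
    microphone pair, from the measured time difference of arrival tau."""
    return float(np.arcsin(np.clip(c * tau / mic_spacing, -1.0, 1.0)))
```

Bearings from several microphone pairs then intersect to give the source positions shown in fig. 8.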
Having obtained the locations of the noise sources (e.g., 811, 812, 813), the processor can estimate the phase delay and amplitude variation of the noise signals emitted by the noise sources as they pass from the noise sources to the target spatial location 830 based on the locations of the noise sources. Based on the phase delay, amplitude variation, and parameter information (e.g., frequency information, amplitude information, phase information, etc.) of the noise signal emitted by the spatial noise source, the processor may obtain the parameter information (e.g., frequency information, amplitude information, phase information, etc.) when the environmental noise is delivered to the target spatial location 830, thereby estimating the noise at the target spatial location.
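The phase delay and amplitude variation described above can be sketched with a free-field point-source model. The 1/r amplitude decay and the travel-time phase term below are textbook approximations and are assumptions for illustration; real propagation (reflections, the wearer's body) would differ.

```python
import numpy as np

def noise_at_target(src_amp: float, src_phase: float, freq: float,
                    r_source_to_target: float, c: float = 343.0):
    """Propagate a localized point noise source to the target spatial
    location: spherical 1/r amplitude decay plus a travel-time phase delay."""
    amp = src_amp / max(r_source_to_target, 1e-6)
    phase = src_phase - 2 * np.pi * freq * r_source_to_target / c
    return amp, phase
```

Summing the complex contributions amp * exp(j * phase) of all localized sources then gives an estimate of the noise at the target spatial location.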
It should be noted that the noise sources 811, 812, and 813, the microphone array 820, and the microphones 821 and 822 and the target spatial location 830 in the microphone array 820 described in fig. 8 are only for illustration and explanation, and do not limit the application scope of the present application. Various modifications and alterations will occur to those skilled in the art in light of the present application. For example, the microphones in the microphone array 820 are not limited to the microphones 821 and 822, and the microphone array 820 may include more microphones, and the like. Such modifications and variations are intended to be within the scope of the present application.
FIG. 9 is an exemplary flow diagram illustrating estimating noise and sound field at a target spatial location according to some embodiments of the present application. As shown in fig. 9, the process 900 may include:
in step 910, a virtual microphone is constructed based on the microphone array (e.g., microphone array 110, microphone array 820). In some embodiments, this step may be performed by processor 120.
In some embodiments, a virtual microphone may be used to represent or simulate the audio data that a microphone would capture if it were positioned at the target spatial location. That is, the audio data obtained by the virtual microphone may be approximately equal or equivalent to the audio data that a physical microphone would collect if it were placed at the target spatial location.
In some embodiments, the virtual microphone may include a mathematical model. The mathematical model may embody the relationship between the noise or sound field estimate at the target spatial location, on the one hand, and the parameter information (e.g., frequency information, amplitude information, phase information, etc.) of the ambient noise picked up by the microphone array together with the parameters of the microphone array, on the other. The parameters of the microphone array may include one or more of the arrangement of the microphone array, the distance between the microphones, the number and positions of the microphones in the microphone array, and the like. The mathematical model may be derived computationally from an initial mathematical model, the parameters of the microphone array, and the parameter information (e.g., frequency information, amplitude information, phase information, etc.) of the sound (e.g., ambient noise) picked up by the microphone array. For example, the initial mathematical model may include terms corresponding to the parameters of the microphone array and the parameter information of the ambient noise picked up by the microphone array, as well as model parameters. Substituting the parameters of the microphone array, the parameter information of the sound picked up by the microphone array, and the initial values of the model parameters into the initial mathematical model yields the predicted noise or sound field at the target spatial position. This predicted noise or sound field is then compared with data (noise and sound field estimates) obtained by a physical microphone placed at the target spatial location, so as to adjust the model parameters of the mathematical model. Based on this adjustment method, the mathematical model is obtained by adjusting the model parameters many times over a large amount of data (for example, parameters of the microphone array and parameter information of the ambient noise picked up by the microphone array).
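One concrete (and assumed) instance of such a mathematical model is a linear combination of the array channels, with complex weights calibrated per frequency bin against a temporary physical microphone at the target location; the sketch below fits those weights by least squares. The linear form and all names are illustrative choices, not details taken from this application.

```python
import numpy as np

def fit_virtual_mic(array_spectra: np.ndarray,
                    target_spectrum: np.ndarray) -> np.ndarray:
    """array_spectra: (n_frames, n_mics) complex STFT values for one bin;
    target_spectrum: (n_frames,) values from a calibration microphone at
    the target spatial location. Returns per-microphone complex weights."""
    w, *_ = np.linalg.lstsq(array_spectra, target_spectrum, rcond=None)
    return w

def virtual_mic(array_spectra: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Predict the sound at the target location once calibration is done."""
    return array_spectra @ w
```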
In some embodiments, the virtual microphone may include a machine learning model. The machine learning model may be obtained by training based on the parameters of the microphone array and the parameter information (e.g., frequency information, amplitude information, phase information, etc.) of the sounds (e.g., ambient noise) picked up by the microphone array. For example, an initial machine learning model (e.g., a neural network model) may be trained using the parameters of the microphone array and the parameter information of the sounds picked up by the microphone array as training samples. Specifically, the parameters of the microphone array and the parameter information of the sounds picked up by the microphone array may be input to the initial machine learning model to obtain prediction results (e.g., the noise and sound field estimates at the target spatial location). These predictions are then compared with the data (noise and sound field estimates) obtained by a physical microphone set at the target spatial location, so as to adjust the parameters of the initial machine learning model. Based on this adjustment method, over a large amount of data (for example, parameters of the microphone array and parameter information of the ambient noise picked up by the microphone array) and many iterations, the parameters of the initial machine learning model are optimized until its predictions are the same as or approximately the same as the data obtained by the physical microphone disposed at the target spatial position, yielding the machine learning model.
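For the machine learning variant, any regressor trained on pairs of (array features, calibration-microphone measurement) fits the description above; the sketch below uses scikit-learn's MLPRegressor with synthetic placeholder data purely for illustration. The feature layout, network size, and data are all assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Placeholder training data: each row holds array features for one frame
# (e.g., per-microphone amplitude and phase for 3 microphones), and y is
# the calibration microphone's measurement at the target location.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(2000, 6))      # 3 mics x (amplitude, phase)
y_train = X_train @ rng.normal(size=6)    # stand-in ground truth

model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=500,
                     random_state=0).fit(X_train, y_train)
predicted_target = model.predict(X_train[:1])  # virtual-mic estimate
```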
Virtual microphone technology removes the need to place a physical microphone at locations where doing so is difficult (e.g., the target spatial location). For example, to keep the user's ears open and the ear canal unblocked, a physical microphone cannot be placed at the user's ear canal (e.g., the target spatial location). In this case, using virtual microphone technology, the microphone array may be disposed near the user's ear without blocking the ear canal, for example, at the user's pinna, and a virtual microphone at the position of the user's ear canal is then constructed based on the microphone array. The virtual microphone may utilize physical microphones (i.e., the microphone array) at a first location to predict the sound data (e.g., amplitude, phase, sound pressure, sound field, etc.) at a second location (e.g., the target spatial location). In some embodiments, the accuracy of the sound data at the second location (which may also be referred to as a particular location, e.g., the target spatial location) predicted by the virtual microphone may depend on the distance between the virtual microphone and the physical microphones (i.e., the microphone array), the type of the virtual microphone (e.g., a mathematical-model virtual microphone, a machine-learning virtual microphone), and so on. For example, the closer the virtual microphone is to the physical microphones (i.e., the microphone array), the more accurate the sound data it predicts at the second location. For another example, in some specific application scenarios, a machine-learning virtual microphone predicts the sound data at the second location more accurately than a mathematical-model virtual microphone. In some embodiments, the location corresponding to the virtual microphone (i.e., the second location, e.g., the target spatial location) may be in the vicinity of the microphone array or may be remote from the microphone array.
In step 920, noise and sound field at the target spatial location are estimated based on the virtual microphones. In some embodiments, this step may be performed by processor 120.
In some embodiments, when the virtual microphone is a mathematical model, the processor 120 may input parameter information (e.g., frequency information, amplitude information, phase information, etc.) of ambient noise picked up by the microphone array and parameters of the microphone array (e.g., arrangement of the microphone array, spacing between microphones, number of microphones in the microphone array) as parameters of the mathematical model into the mathematical model in real time to estimate noise and sound field at the target spatial location.
In some embodiments, when the virtual microphone is a machine learning model, the processor 120 may input parameter information (e.g., frequency information, amplitude information, phase information, etc.) of ambient noise picked up by the microphone array and parameters of the microphone array (e.g., arrangement of the microphone array, spacing between each microphone, number of microphones in the microphone array) into the machine learning model in real time and estimate noise and sound field at the target spatial location based on the output of the machine learning model.
It should be noted that the above description related to the flow 900 is only for illustration and explanation, and does not limit the applicable scope of the present application. Various modifications and changes to flow 900 may occur to those skilled in the art upon review of the present application. For example, step 920 may be divided into two steps to estimate the noise and sound field of the target spatial location, respectively. Such modifications and variations are intended to be within the scope of the present application.
Fig. 10 is a schematic diagram of constructing a virtual microphone according to some embodiments of the present application. As shown in fig. 10, the target spatial location 1010 may be located near the ear canal of the user. To keep the user's ears open and the ear canal unblocked, a physical microphone cannot be placed at the target spatial position 1010, so the noise and sound field at the target spatial position 1010 cannot be estimated directly by a physical microphone.
To estimate the noise and sound field at the target spatial location 1010, a microphone array 1020 may be positioned in the vicinity of the target spatial location 1010. For example only, as shown in fig. 10, the microphone array 1020 may include a first microphone 1021, a second microphone 1022, and a third microphone 1023. Each microphone (e.g., the first microphone 1021, the second microphone 1022, and the third microphone 1023) of the microphone array 1020 may pick up ambient noise in the space where the user is located. The processor 120 may construct the virtual microphone based on parameter information (e.g., frequency information, amplitude information, phase information, etc.) of the ambient noise picked up by each of the microphones of the microphone array 1020 and parameters of the microphone array 1020 (e.g., the arrangement of the microphone array 1020, the spacing between each of the microphones, the number of microphones of the microphone array 1020). Further, based on the virtual microphone, processor 120 may estimate the noise and sound field at target spatial location 1010.
It should be noted that the target spatial location 1010 and the microphone array 1020 as well as the first microphone 1021, the second microphone 1022, and the third microphone 1023 in the microphone array 1020 are depicted in fig. 10 for illustration and explanation, and do not limit the application scope of the present application. Various modifications and alterations will occur to those skilled in the art in light of the present application. For example, the microphones in the microphone array 1020 are not limited to the first microphone 1021, the second microphone 1022, and the third microphone 1023, and the microphone array 1020 may further include more microphones, and the like. Such modifications and variations are intended to be within the scope of the present application.
In some embodiments, the microphone arrays (e.g., microphone array 110, microphone array 820, microphone array 1020) may pick up interfering signals (e.g., target signals and other sound signals) emitted by the speaker at the same time as the ambient noise. To prevent the microphone array from picking up interfering signals emanating from the speaker, the microphone array may be located at a distance from the speaker. However, when placed far from the speaker, the microphone array may be too far from the target spatial location to accurately estimate the sound field and/or noise there. To solve this problem, the microphone array may be disposed in a target area that minimizes the interference signals delivered from the speaker to the microphone array.
In some embodiments, the target area may be the region of minimum sound pressure level of the speaker, i.e., a region in which the sound radiated by the speaker is small. In some embodiments, the speaker may form at least one set of acoustic dipoles. For example, a pair of sound signals with approximately opposite phases and approximately the same amplitude, output by the front and back surfaces of the loudspeaker diaphragm, can be regarded as two point sound sources. The two point sources may constitute an acoustic dipole or a similar acoustic dipole, whose outward sound radiation has a pronounced directivity. Ideally, the sound radiated by the speaker is relatively large along the straight line connecting the two point sound sources, is significantly reduced in other directions, and is minimal at and near the perpendicular bisector of the line connecting the two point sound sources.
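The dipole directivity can be checked numerically with an idealized free-field monopole-pair model, a textbook approximation offered purely for illustration: two equal-amplitude, opposite-phase point sources, with pressures summed at a field point. The source spacing and frequency below are arbitrary assumed values.

```python
import numpy as np

def dipole_pressure(field_pt, src1, src2, freq: float, c: float = 343.0):
    """Complex pressure of two equal-amplitude, opposite-phase point
    sources (an acoustic dipole) at a field point, free-field 1/r model."""
    k = 2 * np.pi * freq / c
    def mono(src, sign):
        r = np.linalg.norm(np.asarray(field_pt) - np.asarray(src))
        return sign * np.exp(-1j * k * r) / max(r, 1e-6)
    return mono(src1, +1.0) + mono(src2, -1.0)

# On the perpendicular bisector both distances are equal, so the two
# contributions cancel exactly in this idealized model:
p = dipole_pressure((0.0, 1.0), (-0.01, 0.0), (0.01, 0.0), freq=1000.0)
print(abs(p))  # ~0
```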
In some embodiments, the speaker (e.g., speaker 130) in an acoustic device (e.g., acoustic device 100) may be a bone conduction speaker. When the speaker is a bone conduction speaker and the interfering signal is the leakage sound signal of the bone conduction speaker, the target region may be the region of minimum sound pressure level of the leakage sound signal of the bone conduction speaker, i.e., the region where the leakage sound signal radiated by the bone conduction speaker is smallest. Arranging the microphone array in this region reduces the interference signals of the bone conduction speaker picked up by the microphone array, and effectively avoids the problem that the sound field at the target spatial position cannot be accurately estimated because the microphone array is too far from the target spatial position.
Fig. 11 is a schematic diagram of the three-dimensional sound field leakage signal distribution of a bone conduction speaker at 1000 Hz according to some embodiments of the present application. Fig. 12 is a schematic diagram of the two-dimensional sound field leakage signal distribution of a bone conduction speaker at 1000 Hz according to some embodiments of the present application. As shown in fig. 11-12, the acoustic device 1100 may include a contact surface 1110. The contact surface 1110 may be configured to contact the body (e.g., face, ears) of the user when the acoustic device 1100 is worn by the user. The bone conduction speaker may be disposed inside the acoustic device 1100. As shown in fig. 11, the colors on the acoustic device 1100 may represent the leakage sound signal of the bone conduction speaker, with different color depths representing different magnitudes of the leakage sound signal: the lighter the color, the larger the leakage sound signal; the darker the color, the smaller the leakage sound signal. As shown in fig. 11, the region 1120 where the dotted line is located is darker in color, and its leakage sound signal is smaller than in other regions, so the region 1120 may be the region of minimum sound pressure level of the leakage sound signal of the bone conduction speaker. For example only, the microphone array may be disposed in the region 1120 (e.g., position 1) so that the leakage sound signal it receives from the bone conduction speaker is small.
In some embodiments, the sound pressure of the region of minimum sound pressure level of the bone conduction speaker may be reduced by 5-30dB from the maximum output sound pressure of the bone conduction speaker. In some embodiments, the sound pressure of the bone conduction speaker in the region of minimum sound pressure level may be reduced by 7-28dB from the maximum output sound pressure of the bone conduction speaker. In some embodiments, the sound pressure of the region of minimum sound pressure level of the bone conduction speaker may be reduced by 9-26dB from the maximum output sound pressure of the bone conduction speaker. In some embodiments, the sound pressure of the region of minimum sound pressure level of the bone conduction speaker may be reduced by 11-24dB from the maximum output sound pressure of the bone conduction speaker. In some embodiments, the sound pressure of the bone conduction speaker in the region of minimum sound pressure level may be reduced by 13-22dB from the maximum output sound pressure of the bone conduction speaker. In some embodiments, the sound pressure of the region of minimum sound pressure level of the bone conduction speaker may be reduced by 15-20dB from the maximum output sound pressure of the bone conduction speaker. In some embodiments, the sound pressure of the bone conduction speaker in the region of minimum sound pressure level may be reduced by 17-18dB from the maximum output sound pressure of the bone conduction speaker. In some embodiments, the sound pressure of the bone conduction speaker in the region of minimum sound pressure level may be reduced by 15dB from the maximum output sound pressure of the bone conduction speaker.
The two-dimensional sound field distribution shown in fig. 12 is a two-dimensional cross-sectional view of the three-dimensional sound field leakage signal distribution of fig. 11. As shown in fig. 12, the colors on the cross section may represent the leakage sound signal of the bone conduction speaker, with different color depths representing different magnitudes of the leakage sound signal: the lighter the color, the larger the leakage sound signal; the darker the color, the smaller the leakage sound signal. As shown in fig. 12, the areas 1210 and 1220 where the dotted lines are located are darker in color and have smaller leakage sound signals than other areas. Thus, the areas 1210 and 1220 may be the regions of minimum sound pressure level of the leakage sound signal of the bone conduction speaker. For example only, the microphone array may be placed in the areas 1210 and 1220 (e.g., location A and location B) so that the leakage sound signal it receives from the bone conduction speaker is small.
In some embodiments, the bone conduction speaker produces a relatively strong vibration signal when operating, so that not only the leakage sound signal of the bone conduction speaker but also its vibration signal may interfere with the microphone array. The vibration signal of the bone conduction speaker refers to the vibration of other parts of the acoustic device (e.g., the housing, the microphone array) driven by the vibration of the vibration component of the bone conduction speaker. In this case, the interference signal of the bone conduction speaker may include both the leakage sound signal and the vibration signal of the bone conduction speaker. To prevent the microphone array from picking up the interference signals of the bone conduction speaker, the target area in which the microphone array is arranged may be the area where the total energy of the leakage sound signal and the vibration signal delivered from the bone conduction speaker to the microphone array is minimal. The leakage sound signal and the vibration signal of the bone conduction speaker are relatively independent signals, and the region of minimum sound pressure level of the leakage sound signal alone does not necessarily coincide with the region where the total energy of the leakage sound signal and the vibration signal is minimal. Therefore, determining the target area requires analyzing the total signal formed by the vibration signal and the leakage sound signal of the bone conduction speaker.
Fig. 13 is a frequency response diagram of the sum of the vibration signal and the leakage sound signal of a bone conduction speaker according to some embodiments of the present application. Fig. 13 shows frequency response curves of the total signal of the vibration signal and the leakage sound signal of the bone conduction speaker at position 1, position 2, position 3, and position 4 on the acoustic device 1100 in fig. 11. As shown in fig. 13, the abscissa may represent frequency, and the ordinate may represent the sound pressure of the total signal of the vibration signal and the leakage sound signal. As described with respect to fig. 11, when only the leakage sound signal of the bone conduction speaker is considered, the sound pressure level minimum area where position 1 is located on the speaker 130 can be used as the target area for setting the microphone array (e.g., the microphone array 110, 820, 1020). When the vibration signal and the leakage signal of the bone conduction speaker are considered together, the target region for the microphone array (i.e., the region where the sound pressure of the total signal is minimal) is not necessarily position 1. Referring to fig. 13, the sound pressure of the total signal corresponding to position 2 is smaller than at the other positions; therefore, position 2 can be used as the target area for setting the microphone array, as sketched below.
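By way of illustration only, the selection logic described above can be sketched as follows. The position names echo fig. 13, but the numeric sound pressure values are hypothetical placeholders, not measurements from the figure:

```python
import numpy as np

# Hypothetical per-band sound pressures (dB) of the total signal
# (vibration + leakage) at each candidate position on the device.
candidate_responses = {
    "position 1": np.array([62.0, 58.0, 55.0, 53.0]),
    "position 2": np.array([48.0, 46.0, 45.0, 44.0]),
    "position 3": np.array([60.0, 57.0, 56.0, 54.0]),
    "position 4": np.array([59.0, 55.0, 52.0, 51.0]),
}

def total_energy(levels_db: np.ndarray) -> float:
    # Convert dB levels to linear power and sum across frequency bands.
    return float(np.sum(10 ** (levels_db / 10)))

# The target area is the position with the minimum total interference energy.
target_area = min(candidate_responses, key=lambda p: total_energy(candidate_responses[p]))
print(target_area)  # -> "position 2", matching the discussion of fig. 13
```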
In some embodiments, the position of the target area may be related to the orientation of the diaphragms of the microphones in the microphone array. The orientation of a microphone's diaphragm may affect the magnitude of the vibration signal of the bone conduction speaker received by that microphone. For example, when the diaphragm of the microphone is perpendicular to the vibrating part of the bone conduction speaker, the microphone may pick up a smaller vibration signal. As another example, when the diaphragm of the microphone is parallel to the vibrating part of the bone conduction speaker, the microphone may pick up a larger vibration signal. In some embodiments, the orientation of the diaphragm may therefore be set so as to reduce the vibration signal of the bone conduction speaker received by the microphone. For example, when the diaphragm of the microphone is perpendicular to the vibrating part of the bone conduction speaker, the vibration signal of the bone conduction speaker may be ignored in determining the target position of the microphone array, and only the leakage signal of the bone conduction speaker is considered, i.e., the target position is determined according to the description of figs. 11 and 12. As another example, when the diaphragm of the microphone is parallel to the vibrating part of the bone conduction speaker, the vibration signal and the leakage signal of the bone conduction speaker may be considered together, i.e., the target position is determined according to the description of fig. 13.
In some embodiments, the phase of the vibration signal of the bone conduction speaker received by the microphone may be adjusted by adjusting the orientation of the diaphragm of the microphone. The goal is to make the phase of the received vibration signal approximately opposite to the phase of the received sound leakage signal while their amplitudes are approximately equal, so that the vibration signal and the sound leakage signal at least partially cancel each other, thereby reducing the interference signal of the bone conduction speaker received by the microphone array, as sketched below. In some embodiments, the vibration signal received by the microphone from the bone conduction speaker may reduce the sound leakage signal received by the microphone from the bone conduction speaker by 5-6 dB.
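A minimal sketch of this partial cancellation, assuming a pure test tone. The 0.8 amplitude ratio and 150-degree phase offset are illustrative values chosen to land in the 5-6 dB range mentioned above, not parameters from the disclosure:

```python
import numpy as np

fs = 16000
t = np.arange(fs) / fs
f = 1000.0  # test tone, Hz

# Leakage sound signal and a vibration signal with near-opposite phase
# and near-equal amplitude, as arranged by the diaphragm orientation.
leakage = np.sin(2 * np.pi * f * t)
vibration = 0.8 * np.sin(2 * np.pi * f * t + np.radians(150))

residual = leakage + vibration
reduction_db = 20 * np.log10(np.std(leakage) / np.std(residual))
print(f"interference reduced by {reduction_db:.1f} dB")  # ~5.9 dB, in the 5-6 dB range
```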
In some embodiments, the speaker (e.g., speaker 130) in an acoustic device (e.g., acoustic device 100) may be an air conduction speaker. When the speaker is an air conduction speaker and the interference signal is the sound signal emitted by the air conduction speaker (i.e., its radiated sound field), the target region may be the region of minimum sound pressure level of the radiated sound field of the air conduction speaker. Placing the microphone array in this region reduces the interference signal of the air conduction speaker picked up by the microphone array, and also effectively mitigates the problem that the sound field at the target spatial location cannot be accurately estimated because the microphone array is too far from the target spatial location.
Figs. 14A-B are schematic diagrams of sound field distributions for air conduction speakers according to some embodiments of the present application. As shown in figs. 14A-B, an air conduction speaker may be disposed within the open acoustic device 1400 and radiate sound outward from two sound conduction holes (e.g., 1401 and 1402 in figs. 14A-B) of the open acoustic device 1400, and the emitted sound may form a dipole (denoted by the "+" and "-" signs in figs. 14A-B).
As shown in fig. 14A, the open acoustic device 1400 is arranged such that the line connecting the dipole sources is approximately perpendicular to the user's face region. In this case, the sound radiated by the dipole may form three stronger sound field regions (1421, 1422, and 1423). A sound pressure level minimum region (which may also be referred to as a low-sound-pressure region) of the radiated sound field of the air conduction speaker, for example, the dotted line and its vicinity in fig. 14A, may be formed between the sound field regions 1421 and 1423 and between the sound field regions 1422 and 1423. The sound pressure level minimum region may refer to a region where the intensity of sound output by the open acoustic device 1400 is relatively small. In some embodiments, the microphones 1430 of the microphone array may be positioned in the sound pressure level minimum region. For example, the microphones 1430 may be disposed at the position where the dashed line intersects the housing of the open acoustic device 1400 in fig. 14A, so that the microphones 1430 can collect external environmental noise while receiving as few sound signals emitted by the air conduction speaker as possible, thereby reducing the interference of those sound signals with the active noise reduction function of the open acoustic device 1400.
As shown in fig. 14B, the open acoustic device 1400 is arranged such that the line connecting the dipole sources is approximately parallel to the user's face region. In this case, the sound radiated by the dipole may form two stronger sound field regions (1424 and 1425). Between the sound field regions 1424 and 1425, a sound pressure level minimum region of the radiated sound field of the air conduction speaker, for example, the dotted line and its vicinity in fig. 14B, can be formed. In some embodiments, the microphones 1440 of the microphone array may be placed in the region of minimum sound pressure level. For example, the microphones 1440 may be disposed at the position where the dashed line intersects the housing of the open acoustic device 1400 in fig. 14B, so that the microphones 1440 can collect external environmental noise while receiving as few sound signals emitted by the air conduction speaker as possible, reducing the interference of those sound signals with the active noise reduction function of the open acoustic device 1400. A sketch of the dipole null follows.
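A minimal sketch of the dipole null described above, modeling the two sound conduction holes as opposite-polarity point sources (the "+" and "-" in figs. 14A-B). The separation, frequency, and evaluation points are illustrative assumptions:

```python
import numpy as np

c = 343.0           # speed of sound, m/s
f = 1000.0          # frequency, Hz
k = 2 * np.pi * f / c
d = 0.01            # source separation, m (illustrative)
src_plus = np.array([0.0, +d / 2])
src_minus = np.array([0.0, -d / 2])

def dipole_pressure(point: np.ndarray) -> complex:
    # Free-field monopole contributions with opposite signs.
    r1 = np.linalg.norm(point - src_plus)
    r2 = np.linalg.norm(point - src_minus)
    return np.exp(-1j * k * r1) / r1 - np.exp(-1j * k * r2) / r2

# On the plane equidistant from both sources the contributions cancel;
# this is the sound-pressure-level minimum region for the microphones.
print(abs(dipole_pressure(np.array([0.05, 0.0]))))  # 0.0 on the bisector plane
print(abs(dipole_pressure(np.array([0.0, 0.05]))))  # clearly nonzero off-axis
```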
FIG. 15 is an exemplary flow chart of outputting a target signal based on a transfer function according to some embodiments of the present application. As shown in fig. 15, the process 1500 may include:
in step 1510, the noise reduction signal is processed based on the transfer function. In some embodiments, this step may be performed by the processor 120 (e.g., the magnitude-phase compensation unit 230). More details regarding the noise reduction signal may be found elsewhere in this application, for example, in fig. 3 and its corresponding description. Additionally, a speaker (e.g., speaker 130) may output a target signal based on the noise reduction signal generated by the processor 120, as described with respect to fig. 3.
In some embodiments, the target signal output by the speaker may be transmitted through a first acoustic path to a specific location in the user's ear (also referred to as a noise cancellation location), while the ambient noise is transmitted to that location through a second acoustic path; at the specific location, the target signal and the ambient noise cancel each other, so that the user perceives no ambient noise or only weak ambient noise. In some embodiments, when the speaker is an air conduction speaker, the specific location at which the target signal and the ambient noise cancel each other may be at or near the ear canal of the user, e.g., the target spatial location. The first acoustic path may be the path of the target signal from the air conduction speaker through the air to the target spatial location, and the second acoustic path may be the path of the ambient noise from the noise source to the target spatial location. In some embodiments, when the speaker is a bone conduction speaker, the specific location where the target signal and the ambient noise cancel each other may be at the basilar membrane of the user. The first acoustic path may be the path of the target signal from the bone conduction speaker, through the user's bones or tissues, to the user's basilar membrane, and the second acoustic path may be the path of the ambient noise from the noise source, through the user's ear canal and eardrum, to the user's basilar membrane.
In some embodiments, a speaker (e.g., speaker 130) may be positioned near, but not blocking, the ear canal of the user, such that the speaker is at a distance from the noise cancellation location (e.g., the target spatial location or the basilar membrane). Consequently, the phase information and the amplitude information of the target signal may change as the signal travels from the speaker to the noise cancellation location. As a result, the target signal output by the speaker may fail to reduce the ambient noise, and may even enhance it, causing the active noise reduction function of the acoustic device (e.g., the acoustic output device 100) to fail.
Based on the above, the processor 120 may obtain a transfer function describing the transmission of the target signal from the speaker to the noise cancellation location. The transfer function may include a first transfer function and a second transfer function. The first transfer function may represent the change in a parameter of the target signal (e.g., a change in amplitude, a change in phase) along the acoustic path (i.e., the first acoustic path) from the speaker to the noise cancellation location. In some embodiments, when the speaker is a bone conduction speaker, the target signal emitted by the bone conduction speaker is a bone conduction signal, and the location where the target signal and the ambient noise cancel is the basilar membrane of the user. In this case, the first transfer function may represent the change in a parameter (e.g., phase, amplitude) of the target signal as it travels from the bone conduction speaker to the user's basilar membrane. In some embodiments, when the speaker is a bone conduction speaker, the first transfer function may be obtained experimentally. For example, the bone conduction speaker outputs a target signal while an air conduction sound signal of the same frequency is played near the user's ear canal, and the cancellation effect of the target signal and the air conduction sound signal is observed. When the target signal and the air conduction sound signal cancel each other out, the first transfer function of the bone conduction speaker may be derived from the air conduction sound signal and the target signal output by the bone conduction speaker. In some embodiments, when the speaker is an air conduction speaker, the target signal emitted by the air conduction speaker is an air conduction sound signal, and the first transfer function can be obtained by acoustic diffusion field simulation and calculation. For example, the acoustic diffusion field may be used to simulate the sound field of the target signal emitted by the air conduction speaker, and the first transfer function may be calculated based on that sound field. The second transfer function may represent the change in a parameter of the ambient noise (e.g., a change in amplitude, a change in phase) from the target spatial location to the location where the target signal and the ambient noise cancel. For example only, when the speaker is a bone conduction speaker, the second transfer function may represent the change in a parameter of the ambient noise from the target spatial location to the basilar membrane of the user. In some embodiments, the second transfer function may also be obtained by acoustic diffusion field simulation and calculation. For example, the acoustic diffusion field may be used to simulate the sound field of the ambient noise, and the second transfer function may be calculated based on that sound field.
In some embodiments, the target signal may undergo not only a phase change but also an energy loss during transmission. The transfer function may thus comprise a phase transfer function and an amplitude transfer function. In some embodiments, both the phase transfer function and the amplitude transfer function may be obtained by the methods described above.
Further, the processor 120 may process the noise reduction signal based on the obtained transfer function. In some embodiments, the processor 120 may adjust the amplitude and phase of the noise reduction signal based on the obtained transfer function; in particular, it may adjust the phase of the noise reduction signal based on the phase transfer function and the amplitude of the noise reduction signal based on the amplitude transfer function, as sketched below.
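A frequency-domain sketch of this compensation, assuming the first transfer function H1 (speaker to cancellation location) and the second transfer function H2 (target spatial location to cancellation location) are already known. All numeric values are illustrative, not taken from the disclosure:

```python
import numpy as np

freqs = np.array([250.0, 500.0, 1000.0, 2000.0])  # Hz, illustrative bands
# Estimated ambient noise spectrum at the target spatial location.
noise_at_target = np.array([1.0, 0.8, 0.6, 0.4]) * np.exp(1j * 0.3)
# Illustrative transfer functions (gain and pure delay).
H1 = 0.7 * np.exp(-1j * 2 * np.pi * freqs * 0.0005)  # speaker -> cancellation point
H2 = 0.9 * np.exp(-1j * 2 * np.pi * freqs * 0.0002)  # target location -> cancellation point

# Drive the speaker so that, after the first acoustic path, the target signal
# arrives anti-phase and amplitude-matched to the noise arriving via the
# second path:  D(f) * H1(f) = -N(f) * H2(f)
drive_spectrum = -noise_at_target * H2 / H1

residual = drive_spectrum * H1 + noise_at_target * H2
print(np.max(np.abs(residual)))  # ~0: noise cancelled at the specific location
```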
In step 1520, a target signal is output based on the processed noise reduction signal. In some embodiments, this step may be performed by speaker 130.
In some embodiments, speaker 130 may output the target signal based on the noise reduction signal processed in step 1510, such that when the target signal is delivered to the location where it cancels the ambient noise, the phase and amplitude of the target signal relative to the ambient noise satisfy certain conditions. In some embodiments, the difference between the phase of the target signal and the phase of the ambient noise may be less than or equal to a certain phase threshold. The phase threshold may be in the range of 90-180 degrees and may be adjusted within this range according to the needs of the user. For example, when the user does not wish to be disturbed by the sound of the surrounding environment, the phase threshold may be a large value, e.g., 180 degrees, i.e., the phase of the target signal is opposite to the phase of the ambient noise. As another example, the phase threshold may be a small value, such as 90 degrees, when the user wishes to remain aware of the surrounding environment. In general, the more ambient sound the user wishes to receive, the closer the phase threshold may be to 90 degrees; the less ambient sound the user wishes to receive, the closer the phase threshold may be to 180 degrees. In some embodiments, where the phase of the target signal is approximately opposite to the phase of the ambient noise, the difference between the amplitude of the ambient noise and the amplitude of the target signal may be less than or equal to an amplitude threshold. For example, when the user does not wish to be disturbed by the sound of the surrounding environment, the amplitude threshold may be a small value, e.g., 0 dB, i.e., the amplitude of the target signal equals the amplitude of the ambient noise. As another example, when the user wishes to remain aware of the surrounding environment, the amplitude threshold may be a large value, e.g., approximately equal to the amplitude of the ambient noise. In general, the more ambient sound the user wishes to receive, the closer the amplitude threshold may be to the amplitude of the ambient noise; the less ambient sound the user wishes to receive, the closer the amplitude threshold may be to 0 dB. In this way, the ambient noise is reduced, the active noise reduction function of the acoustic device (e.g., the acoustic output device 100) is realized, and the user's listening experience is improved. The effect of the phase relationship on the residual heard by the user is sketched below.
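The trade-off between the phase threshold and the residual sound heard by the user can be illustrated with the standard two-phasor sum; the amplitudes and phase offsets below are illustrative:

```python
import numpy as np

def residual_amplitude(a_noise: float, a_target: float, dphi_deg: float) -> float:
    """Residual amplitude when a target signal is added to ambient noise
    with a given phase difference (law-of-cosines phasor sum)."""
    dphi = np.radians(dphi_deg)
    return float(np.sqrt(a_noise**2 + a_target**2 + 2 * a_noise * a_target * np.cos(dphi)))

# Unit-amplitude noise, amplitude-matched target signal:
print(residual_amplitude(1.0, 1.0, 180.0))  # 0.0  -> full cancellation
print(residual_amplitude(1.0, 1.0, 90.0))   # ~1.41 -> surroundings remain audible
```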
It should be noted that the above description of the process 1500 is for illustration and description only and is not intended to limit the scope of the present disclosure. Various modifications and changes to flow 1500 will be apparent to those skilled in the art in light of this description. For example, the flow 1500 may also include the step of obtaining a transfer function. For another example, step 1510 and step 1520 may be combined into one step. Such modifications and variations are intended to be within the scope of the present application.
FIG. 16 is an exemplary flow chart of estimating noise at a target spatial location provided in accordance with some embodiments described herein. As shown in fig. 16, flow 1600 may include:
in step 1610, components associated with the signal picked up by the bone conduction microphone are removed from the picked-up ambient noise to update the ambient noise.
In some embodiments, this step may be performed by the processor 120. In some embodiments, when the microphone array (e.g., microphone array 110) picks up ambient noise, the user's own speech is also picked up by the microphone array, i.e., the user's own speech is treated as part of the ambient noise. In that case, the target signal output by the speaker (e.g., speaker 130) would also cancel the user's own speech. In some scenarios, however, the user's own speech needs to be preserved, for example, when the user makes a voice call or sends a voice message. In some embodiments, the acoustic device (e.g., acoustic device 100) may include a bone conduction microphone; when the user wears the acoustic device to make a voice call or record voice information, the bone conduction microphone may pick up the user's speech via the vibration signals generated by facial bones or muscles when the user speaks, and transmit the resulting sound signal to the processor 120. The processor 120 obtains parameter information from the sound signal picked up by the bone conduction microphone, removes the sound signal components associated with it from the ambient noise picked up by the microphone array (e.g., microphone array 110), and updates the ambient noise according to the parameter information of the remaining ambient noise. The updated ambient noise no longer contains the user's own speech, i.e., the user's own voice is preserved rather than cancelled during a voice call. One possible implementation of this removal is sketched below.
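The application does not specify the removal algorithm; the sketch below assumes a normalized LMS (NLMS) adaptive filter, with the bone conduction microphone signal as the reference input and the microphone array signal as the primary input. Function and parameter names are hypothetical:

```python
import numpy as np

def nlms_remove(primary: np.ndarray, reference: np.ndarray,
                taps: int = 32, mu: float = 0.5, eps: float = 1e-8) -> np.ndarray:
    """Remove components of `primary` correlated with `reference` (NLMS)."""
    w = np.zeros(taps)
    out = np.zeros_like(primary)
    for n in range(taps, len(primary)):
        x = reference[n - taps:n][::-1]   # most recent reference samples
        y = w @ x                         # estimate of the own-voice component
        e = primary[n] - y                # updated ambient noise sample
        w += mu * e * x / (x @ x + eps)   # normalized weight update
        out[n] = e
    return out

# Usage sketch: updated_noise = nlms_remove(array_signal, bone_mic_signal)
```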
In step 1620, the noise at the target spatial location is estimated based on the updated ambient noise. In some embodiments, this step may be performed by processor 120. Step 1620 may be performed in a similar manner to step 320, and related description is not repeated here.
It should be noted that the above description of flowchart 1600 is for illustration and description only and is not intended to limit the scope of the application. Various modifications and changes to flow 1600 may occur to those skilled in the art upon review of the present application. For example, the components associated with the signals picked up by the bone conduction microphone may be preprocessed and transmitted as audio signals to the terminal device. Such modifications and variations are intended to be within the scope of the present application.
Having thus described the basic concept, it will be apparent to those skilled in the art that the foregoing detailed disclosure is to be considered merely illustrative and not restrictive of the broad application. Various modifications, improvements and adaptations to the present application may occur to those skilled in the art, although not explicitly described herein. Such modifications, improvements and adaptations are proposed in the present application and thus fall within the spirit and scope of the exemplary embodiments of the present application.
Also, the present application uses specific words to describe embodiments of the application. Reference throughout this specification to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with at least one embodiment of the present application is included in at least one embodiment of the present application. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this application are not necessarily all referring to the same embodiment. Furthermore, certain features, structures, or characteristics may be combined as suitable in one or more embodiments of the application.
Moreover, those skilled in the art will appreciate that aspects of the present application may be illustrated and described in terms of several patentable species or situations, including any new and useful combination of processes, machines, manufacture, or materials, or any new and useful improvement thereof. Accordingly, various aspects of the present application may be embodied entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.), or in a combination of hardware and software. The above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the present application may be represented as a computer product, including computer readable program code, in one or more computer readable media.
The computer storage medium may comprise a propagated data signal with the computer program code embodied therewith, for example, on baseband or as part of a carrier wave. The propagated signal may take any of a variety of forms, including electromagnetic, optical, etc., or any suitable combination. A computer storage medium may be any computer-readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code on a computer storage medium may be propagated over any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or any combination of the preceding.
Additionally, the order in which elements and sequences of the processes described herein are processed, the use of alphanumeric characters, or the use of other designations, is not intended to limit the order of the processes and methods described herein, unless explicitly claimed. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing server or mobile device.
Similarly, it should be noted that in the preceding description of embodiments of the application, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the embodiments. This method of disclosure, however, is not intended to imply that more features are required than are expressly recited in the claims. Indeed, claimed embodiments may have fewer than all of the features of a single embodiment disclosed above.
Some embodiments use numbers to describe quantities of components and attributes; it should be understood that such numbers are in some instances qualified by the modifier "about," "approximately," or "substantially." Unless otherwise indicated, "about," "approximately," or "substantially" indicates that a variation of ±20% is allowed. Accordingly, in some embodiments, the numerical parameters set forth in the specification and claims are approximations that may vary depending upon the desired properties sought by a particular embodiment. In some embodiments, a numerical parameter should be read in light of the specified significant digits and ordinary rounding conventions. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of some embodiments are approximations, in specific examples such numerical values are set forth as precisely as practicable.
Each patent, patent application publication, and other material, such as articles, books, specifications, publications, documents, and the like, cited in this application is hereby incorporated by reference in its entirety, except for application history documents that are inconsistent with or conflict with the contents of this application, and except for documents (currently or later appended to this application) that limit the broadest scope of the claims of this application. It is noted that if the descriptions, definitions, and/or use of terms in the materials accompanying this application are inconsistent or in conflict with the content of this application, the descriptions, definitions, and/or use of terms in this application shall control.
Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of the embodiments of the present application. Other variations are also possible within the scope of the present application. Thus, by way of example, and not limitation, alternative configurations of the embodiments of the present application may be viewed as being consistent with the teachings of the present application. Accordingly, the embodiments of the present application are not limited to only those embodiments explicitly described and depicted herein.

Claims (10)

1. An acoustic device, characterized in that the acoustic device comprises:
a microphone array configured to pick up ambient noise;
a processor configured to:
estimating, with the microphone array, a sound field at a target spatial location, the target spatial location being closer to a user's ear canal than any microphone of the microphone array, and
generating a noise reduction signal based on the picked-up ambient noise and a sound field estimate of the target spatial location; and
at least one speaker configured to output a target signal for reducing the ambient noise based on the noise reduction signal, wherein the microphone array is disposed at a target area to minimize the interference signals of the at least one speaker received by the microphone array.
2. The acoustic device of claim 1, wherein the generating a noise reduction signal based on the picked-up ambient noise and the sound field estimate of the target spatial location comprises:
estimating noise at the target spatial location based on the picked-up ambient noise; and
generating the noise reduction signal based on the noise at the target spatial location and the sound field estimate at the target spatial location.
3. The acoustic device of claim 2,
the acoustic device further comprises one or more sensors for acquiring motion information of the acoustic device, and the processor is further configured to:
updating noise at the target spatial location and a sound field estimate for the target spatial location based on the motion information; and
generating the noise reduction signal based on the noise at the updated target spatial location and the sound field estimate at the updated target spatial location.
4. The acoustic device of claim 2, wherein the estimating the noise at the target spatial location based on the picked-up ambient noise comprises:
determining one or more spatial noise sources related to the picked up ambient noise; and
estimating noise at the target spatial location based on the spatial noise source.
5. The acoustic device of claim 1, wherein the estimating the sound field of the target spatial location with the microphone array comprises:
constructing a virtual microphone based on the microphone array, wherein the virtual microphone comprises a mathematical model or a machine learning model and is used to represent the audio data that a microphone would collect if it were located at the target spatial location; and
estimating the sound field of the target spatial location based on the virtual microphone.
6. The acoustic device of claim 5, wherein the generating a noise reduction signal based on the picked-up ambient noise and the sound field estimate of the target spatial location comprises:
estimating noise at the target spatial location based on the virtual microphone; and
generating the noise reduction signal based on the noise at the target spatial location and the sound field estimate at the target spatial location.
7. The acoustic device of claim 1,
the at least one speaker is a bone conduction speaker,
the interference signal comprises a sound leakage signal and a vibration signal of the bone conduction speaker, an
The target region is a region where the total energy of the sound leakage signal and the vibration signal delivered by the bone conduction speaker to the microphone array is minimal.
8. The acoustic device of claim 1,
the at least one loudspeaker is an air conduction loudspeaker, and
the target region is a region of minimum sound pressure level of a radiated sound field of the air conduction speaker.
9. The acoustic device of claim 1,
the processor is further configured to process the noise reduction signal based on a transfer function, the transfer function including a first transfer function representing a change in a parameter of the target signal emanating from the at least one speaker to a location where the target signal and the ambient noise cancel, and a second transfer function representing a change in a parameter of the ambient noise from the target spatial location to a location where the target signal and the ambient noise cancel; and
the at least one speaker is further configured to output the target signal in accordance with the processed noise reduction signal.
10. A method of noise reduction, the method comprising:
picking up ambient noise by a microphone array;
estimating, by a processor and with the microphone array, a sound field at a target spatial location, the target spatial location being closer to a user's ear canal than any microphone of the microphone array;
generating, by the processor, a noise reduction signal based on the picked-up ambient noise and the sound field estimate of the target spatial location; and
outputting, by at least one speaker, a target signal based on the noise reduction signal, the target signal for reducing the ambient noise, wherein the microphone array is disposed at a target area to minimize the interference signals of the at least one speaker received by the microphone array.
CN202110486203.6A 2021-04-25 2021-04-30 Acoustic device Pending CN115240697A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW111115388A TW202242855A (en) 2021-04-25 2022-04-22 Acoustic device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
PCT/CN2021/089670 WO2022226696A1 (en) 2021-04-25 2021-04-25 Open earphone
CNPCT/CN2021/089670 2021-04-25

Publications (1)

Publication Number Publication Date
CN115240697A true CN115240697A (en) 2022-10-25

Family

ID=83665731

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202180099448.1A Pending CN117501710A (en) 2021-04-25 2021-04-25 Open earphone
CN202110486203.6A Pending CN115240697A (en) 2021-04-25 2021-04-30 Acoustic device


Country Status (3)

Country Link
CN (2) CN117501710A (en)
TW (1) TW202242856A (en)
WO (2) WO2022226696A1 (en)


Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8649526B2 (en) * 2010-09-03 2014-02-11 Nxp B.V. Noise reduction circuit and method therefor
CN102348151B (en) * 2011-09-10 2015-07-29 歌尔声学股份有限公司 Noise canceling system and method, intelligent control method and device, communication equipment
CN108668188A (en) * 2017-03-30 2018-10-16 天津三星通信技术研究有限公司 The method and its electric terminal of the active noise reduction of the earphone executed in electric terminal
CN107346664A (en) * 2017-06-22 2017-11-14 河海大学常州校区 A kind of ears speech separating method based on critical band
CN107452375A (en) * 2017-07-17 2017-12-08 湖南海翼电子商务股份有限公司 Bluetooth earphone
US10706868B2 (en) * 2017-09-06 2020-07-07 Realwear, Inc. Multi-mode noise cancellation for voice detection
JP6972814B2 (en) * 2017-09-13 2021-11-24 ソニーグループ株式会社 Earphone device, headphone device and method
KR102406572B1 (en) * 2018-07-17 2022-06-08 삼성전자주식회사 Method and apparatus for processing audio signal
CN111935589B (en) * 2020-09-28 2021-02-12 深圳市汇顶科技股份有限公司 Active noise reduction method and device, electronic equipment and chip

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115615624A (en) * 2022-12-13 2023-01-17 杭州兆华电子股份有限公司 Equipment leakage detection method and system based on unmanned inspection device
CN115615624B (en) * 2022-12-13 2023-03-31 杭州兆华电子股份有限公司 Equipment leakage detection method and system based on unmanned inspection device

Also Published As

Publication number Publication date
WO2022226696A1 (en) 2022-11-03
WO2022227056A1 (en) 2022-11-03
TW202242856A (en) 2022-11-01
CN117501710A (en) 2024-02-02

Similar Documents

Publication Publication Date Title
US11304014B2 (en) Hearing aid device for hands free communication
US10321241B2 (en) Direction of arrival estimation in miniature devices using a sound sensor array
CN108600907B (en) Method for positioning sound source, hearing device and hearing system
US11715451B2 (en) Acoustic devices
EP2928214B1 (en) A binaural hearing assistance system comprising binaural noise reduction
CN107690119B (en) Binaural hearing system configured to localize sound source
EP3236672B1 (en) A hearing device comprising a beamformer filtering unit
EP3499915B1 (en) A hearing device and a binaural hearing system comprising a binaural noise reduction system
US10587962B2 (en) Hearing aid comprising a directional microphone system
EP3883266A1 (en) A hearing device adapted to provide an estimate of a user's own voice
WO2023087565A1 (en) Open acoustic apparatus
CN115240697A (en) Acoustic device
CN116156372A (en) Acoustic device and transfer function determining method thereof
US11689845B2 (en) Open acoustic device
RU2800546C1 (en) Open acoustic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40077980

Country of ref document: HK