CN113129916B - Audio acquisition method, system and related device

Info

Publication number: CN113129916B
Application number: CN201911404753.8A
Authority: CN
Inventors: 王昆; 王宇峰; 余珞
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2019-12-30
Filing date: 2019-12-30
Publication date: 2024-04-12
Anticipated expiration: 2039-12-30
Also published as: CN113129916A

Abstract

The application discloses an audio acquisition method, an audio acquisition system, and a related device. In the method, an audio acquisition device acquires a first voice signal and a first vibration signal of a bone that is vibrated by the human vocal organs; the audio acquisition device performs signal processing on the first vibration signal to obtain a second vibration signal; the audio acquisition device determines a first noise signal in the first voice signal by using the second vibration signal; and the audio acquisition device separates the first noise signal from the first voice signal to obtain a second voice signal and sends the second voice signal to an electronic device. In this way, the interference of noise with the collected user voice signal can be effectively reduced.

Description

Audio acquisition method, system and related device

Technical Field

The present disclosure relates to the field of electronic technologies, and in particular, to an audio acquisition method, system, and related devices.

Background

In recent years, when a user performs voice interaction (voice call, video call, voice assistant enabled) with an electronic device (e.g., a mobile phone), more and more users choose to perform voice interaction with the electronic device using an audio capturing device such as headphones, glasses with a microphone, or the like that can capture a user's voice signal. That is, the audio collection device collects a voice signal of a user and then transmits the voice signal to the electronic device. In this way, the user may not need to hold the electronic device and may free up the user's hands.

However, in the prior art, when the audio collection device collects the voice signal of the user, the audio collection device also collects the noise signal in the environment where the user is located. Noise signals collected by the audio collection device interfere with the voice interaction of the user with the electronic device.

Therefore, how an audio collection device in a noisy environment can effectively reduce the interference of noise with the collected user voice signal is a problem to be solved.

Disclosure of Invention

The application provides an audio acquisition method, an audio acquisition system, and a related device, which can effectively reduce the interference of noise with the acquired user voice signal and thereby improve the user experience during voice interaction between a user and an electronic device.

In a first aspect, the present application provides an audio acquisition system, the system including an audio acquisition device and a first electronic device, the audio acquisition device and the first electronic device establishing a communication connection; wherein,

the audio acquisition device is used for acquiring a first voice signal of a user and a first vibration signal of a bone that is vibrated by the user's vocal organs;

the audio acquisition device is used for performing signal processing on the first vibration signal to obtain a second vibration signal;

the audio acquisition device is used for determining a first noise signal in the first voice signal by using the second vibration signal;

the audio acquisition device is used for separating the first noise signal from the first voice signal to obtain a second voice signal and transmitting the second voice signal to the first electronic device;

the first electronic device is used for receiving the second voice signal.

In the audio collection system provided in the first aspect, the audio collection device processes the collected vibration signal and filters noise out of the voice signal according to the processed vibration signal. The vibration signal is free of interference from environmental noise. The frequency of the vibration signal of the bone vibrated by the vocal organs is within 2 kHz, whereas the frequency of the voice signal is between 20 Hz and 20 kHz. The vibration signal can therefore be used to determine the magnitude of the noise signal in the voice signal at frequencies within 2 kHz. The magnitude of the noise signal contained in the voice signal over the full range of 20 Hz to 20 kHz is then determined by linear prediction, so that the noise signal can be filtered out of the voice signal. Therefore, the influence of noise signals can be effectively reduced, so that the user can interact with the electronic device more effectively, and the user experience is improved.
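
As an illustration of the frequency ranges discussed above, the following minimal sketch marks which bins of a framed spectrum lie within 2 kHz (where the vibration signal can be used directly) and which lie above it (where the noise has to be predicted). The function name `split_bins`, the 16 kHz sample rate, and the 512-sample frame are assumptions chosen for the example and are not taken from the application; covering the full range up to 20 kHz would require a sample rate of at least 40 kHz.

```python
import numpy as np

def split_bins(sample_rate_hz=16000, frame_len=512, vib_cutoff_hz=2000.0):
    """Return (freqs, in_band) for the real-FFT bins of one frame.

    in_band[k] is True when bin k lies at or below vib_cutoff_hz, i.e. in
    the range that the vibration signal is assumed to cover.
    """
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate_hz)
    # Index of the highest bin at or below the cutoff, computed from the
    # bin index rather than by comparing floating-point frequencies.
    k_max = int(vib_cutoff_hz * frame_len // sample_rate_hz)
    in_band = np.arange(freqs.size) <= k_max
    return freqs, in_band

freqs, in_band = split_bins()
print(int(in_band.sum()), "of", freqs.size, "bins lie within the vibration band")
```

With these assumed values, only about a quarter of the bins are covered by the vibration signal, which is why the remaining bins rely on prediction.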

With reference to the first aspect, in one possible implementation manner, the audio acquisition device is specifically configured to: filter out the low-frequency signal in the first vibration signal to obtain a third vibration signal; determine the conjugate coefficient H^* of the channel coefficient H of the acquisition channel of the first vibration signal; and convolve the third vibration signal with the conjugate coefficient H^* to obtain the second vibration signal. Thus, by convolving with the conjugate coefficient, channel interference in the audio acquisition device can be reduced.
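
The three steps of this implementation manner can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the application: the channel coefficient is assumed to be available as a short array `h` of filter taps, the low-frequency content is removed by zeroing FFT bins below a hypothetical cutoff `low_cut_hz`, and the function name `preprocess_vibration` is invented for the example.

```python
import numpy as np

def preprocess_vibration(x1, h, sample_rate_hz, low_cut_hz):
    """First vibration signal -> second vibration signal (illustrative)."""
    x1 = np.asarray(x1, dtype=float)
    # Step 1: filter out low-frequency content by zeroing the FFT bins
    # below low_cut_hz (assumed cutoff), giving the third vibration signal.
    spectrum = np.fft.rfft(x1)
    freqs = np.fft.rfftfreq(len(x1), d=1.0 / sample_rate_hz)
    spectrum[freqs < low_cut_hz] = 0.0
    x3 = np.fft.irfft(spectrum, n=len(x1))
    # Step 2: conjugate coefficient H^* of the channel coefficient H.
    h_conj = np.conj(np.asarray(h))
    # Step 3: convolve the third vibration signal with H^*.
    return np.convolve(x3, h_conj, mode="same")

fs = 8000
t = np.arange(800) / fs
x1 = np.sin(2 * np.pi * 500 * t) + 0.5   # 500 Hz tone plus a DC offset
x2 = preprocess_vibration(x1, h=[1.0], sample_rate_hz=fs, low_cut_hz=50.0)
```

For a real-valued `h` the conjugate leaves the taps unchanged; the conjugation only matters when the channel coefficient is complex.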

With reference to the first aspect, in one possible implementation manner, the audio acquisition device is specifically configured to: determine, according to the second vibration signal, a second noise signal in the first voice signal, the second noise signal occupying the same frequency range as the second vibration signal, where the frequency range of the first voice signal is wider than that of the second vibration signal; and perform linear prediction on the second noise signal to obtain the first noise signal over the complete frequency range of the first voice signal. Thus, the audio acquisition device can determine the noise signal carried in the acquired voice signal.
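
The application does not spell out how the in-band noise is computed or which linear predictor is used, so the following is only one possible reading. It assumes that the in-band noise magnitude is what remains of the voice magnitude after the vibration magnitude is subtracted, and that the prediction is an autoregressive model fitted by least squares over frequency bins. The names `in_band_noise` and `extend_by_linear_prediction` and the model order are hypothetical.

```python
import numpy as np

def in_band_noise(voice_mag, vib_mag):
    """Second noise signal (magnitudes) in the band shared with the vibration signal."""
    return np.maximum(np.asarray(voice_mag) - np.asarray(vib_mag), 0.0)

def extend_by_linear_prediction(known, n_total, order=2):
    """Extend `known` to n_total values with a least-squares linear predictor.

    Fits known[k] ~ sum_i a[i] * known[k - 1 - i] and then applies the
    predictor recursively. Requires len(known) > order.
    """
    known = np.asarray(known, dtype=float)
    rows = np.array([known[k - order:k][::-1] for k in range(order, len(known))])
    targets = known[order:]
    a, *_ = np.linalg.lstsq(rows, targets, rcond=None)
    out = list(known)
    while len(out) < n_total:
        recent = np.array(out[-order:][::-1])
        out.append(max(float(a @ recent), 0.0))  # magnitudes stay non-negative
    return np.array(out)

noise_low = in_band_noise([3.0, 2.5, 2.0, 1.8], [1.0, 1.0, 0.5, 0.6])
noise_full = extend_by_linear_prediction(noise_low, n_total=8, order=1)
```

Here `noise_low` plays the role of the second noise signal and `noise_full` the role of the first noise signal over the complete range.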

With reference to the first aspect, in one possible implementation manner, the audio acquisition device is specifically configured to: before the first voice signal and the first vibration signal are collected, receiving a first user operation of a user; a first vibration signal and a first speech signal are acquired in response to a first user operation.

With reference to the first aspect, in one possible implementation manner, the audio acquisition device is specifically configured to: receive a first instruction sent by the first electronic device, where the first instruction is used to instruct the audio acquisition device to start acquiring the first voice signal and the first vibration signal of the bone vibrated by the human vocal organs.

With reference to the first aspect, in one possible implementation manner, the first electronic device is further configured to: receiving a second user operation of the user; and responding to the second user operation, and sending a first instruction to the audio acquisition equipment.

With reference to the first aspect, in one possible implementation manner, the first electronic device is further configured to: in response to the second user operation, initiate a voice call with a second electronic device or start a voice assistant in the first electronic device.
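
The two ways of starting acquisition described in these implementation manners (a first user operation on the audio acquisition device, or a first instruction sent by the first electronic device after a second user operation) can be modelled as a small message flow. The class names, the `START_CAPTURE` constant, and the action strings below are hypothetical and serve only to illustrate the ordering of events.

```python
from dataclasses import dataclass, field

START_CAPTURE = "START_CAPTURE"  # hypothetical encoding of the first instruction

@dataclass
class AudioAcquisitionDevice:
    capturing: bool = False

    def on_first_user_operation(self):
        # The user acts directly on the audio acquisition device.
        self.capturing = True

    def on_instruction(self, instruction):
        # The first instruction arrives over the communication connection.
        if instruction == START_CAPTURE:
            self.capturing = True

@dataclass
class FirstElectronicDevice:
    peer: AudioAcquisitionDevice
    started: list = field(default_factory=list)

    def on_second_user_operation(self, action):
        # action is e.g. "voice_call" or "voice_assistant" (assumed labels).
        self.started.append(action)
        self.peer.on_instruction(START_CAPTURE)

glasses = AudioAcquisitionDevice()
phone = FirstElectronicDevice(peer=glasses)
phone.on_second_user_operation("voice_call")
print(glasses.capturing)
```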

With reference to the first aspect, in one possible implementation manner, the audio capturing device may be glasses. In this way, when the user needs to wear glasses (that is, when the user is nearsighted or needs sunglasses to block sunlight), the user's voice can be collected without additionally wearing an earphone, which is convenient for the user and improves the user experience.

In a second aspect, the present application provides an audio acquisition method, including: an audio acquisition device acquires a first voice signal and a first vibration signal of a bone that is vibrated by the human vocal organs; the audio acquisition device performs signal processing on the first vibration signal to obtain a second vibration signal; the audio acquisition device determines a first noise signal in the first voice signal by using the second vibration signal; and the audio acquisition device separates the first noise signal from the first voice signal to obtain a second voice signal and sends the second voice signal to an electronic device, where the audio acquisition device is in communication connection with the electronic device.
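
The separation step of the method is not tied to a specific algorithm in the text. As one hedged possibility, the sketch below uses magnitude spectral subtraction: the estimated noise magnitude is subtracted from the magnitude spectrum of the first voice signal, the original phase is kept, and the result is transformed back to obtain the second voice signal. Spectral subtraction is a technique substituted here for illustration, and the function name `separate_noise` is an assumption.

```python
import numpy as np

def separate_noise(first_voice, noise_mag):
    """Return a second voice signal with the estimated noise magnitude removed.

    noise_mag must have len(first_voice) // 2 + 1 entries, one per
    real-FFT bin of the first voice signal.
    """
    spec = np.fft.rfft(first_voice)
    # Subtract the noise magnitude per bin; clamp at zero so that no bin
    # ends up with a negative magnitude.
    clean_mag = np.maximum(np.abs(spec) - noise_mag, 0.0)
    # Reuse the phase of the noisy signal.
    clean_spec = clean_mag * np.exp(1j * np.angle(spec))
    return np.fft.irfft(clean_spec, n=len(first_voice))

rng = np.random.default_rng(0)
voice = rng.standard_normal(64)
second_voice = separate_noise(voice, noise_mag=np.full(33, 0.5))
```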

With reference to the second aspect, in one possible implementation manner, the audio acquisition device acquiring a first voice signal and a first vibration signal of a bone vibrated by the human vocal organs includes: the audio acquisition device receives a first user operation of a user before acquiring the first voice signal and the first vibration signal; and in response to the first user operation, the audio acquisition device acquires the first vibration signal and the first voice signal.

With reference to the second aspect, in one possible implementation manner, before the audio acquisition device acquires the first voice signal and the first vibration signal of the bone vibrated by the human vocal organs, the method includes: the audio acquisition device receives a first instruction sent by the electronic device, where the first instruction is used to instruct the audio acquisition device to start acquiring the first voice signal and the first vibration signal of the bone vibrated by the human vocal organs.

With reference to the second aspect, in one possible implementation manner, the audio acquisition device performing signal processing on the first vibration signal to obtain a second vibration signal includes: the audio acquisition device filters out the low-frequency signal in the first vibration signal to obtain a third vibration signal; the audio acquisition device determines the conjugate coefficient H^* of the channel coefficient H of the acquisition channel of the first vibration signal; and the audio acquisition device convolves the third vibration signal with the conjugate coefficient H^* to obtain the second vibration signal.

With reference to the second aspect, in one possible implementation manner, the audio acquisition device determining a first noise signal in the first voice signal by using the second vibration signal includes: the audio acquisition device determines, according to the second vibration signal, a second noise signal in the first voice signal, the second noise signal occupying the same frequency range as the second vibration signal, where the frequency range of the first voice signal is wider than that of the second vibration signal; and the audio acquisition device performs linear prediction on the second noise signal to obtain the first noise signal over the complete frequency range of the first voice signal.

In a third aspect, the present application provides an audio acquisition device comprising one or more processors and one or more memories. The one or more memories are coupled to the one or more processors, the one or more memories being configured to store computer program code comprising computer instructions that, when executed by the one or more processors, cause the audio acquisition device to perform the audio acquisition method in any of the possible implementations of the above aspect.

In a fourth aspect, embodiments of the present application provide a computer-readable storage medium including computer instructions that, when executed on an electronic device, cause the electronic device to perform the audio acquisition method in any one of the possible implementations of the above aspect.

In a fifth aspect, embodiments of the present application provide a computer program product, which when run on a computer causes the computer to perform the audio acquisition method in any one of the possible implementations of the above aspect.

Drawings

Fig. 1 is a schematic diagram of an audio acquisition system 10 according to an embodiment of the present disclosure;

Fig. 2 is a schematic structural diagram of an audio capturing apparatus 100 according to an embodiment of the present application;

Fig. 3 is a schematic structural diagram of glasses 101 according to an embodiment of the present application;

Fig. 4 is a schematic structural diagram of an electronic device 200 according to an embodiment of the present application;

Fig. 5 is a schematic flow chart of an audio acquisition method according to an embodiment of the present application;

Fig. 6 is a schematic flow chart of signal processing according to an embodiment of the present application;

Fig. 7 is a schematic flow chart of signal processing according to an embodiment of the present application;

Fig. 8 is a schematic diagram of a waveform of a vibration signal according to an embodiment of the present application;

Fig. 9 is a schematic diagram of waveforms of a voice signal according to an embodiment of the present disclosure;

Fig. 10 is an interaction schematic diagram of an audio collection method according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application are described in detail below with reference to the accompanying drawings. In the description of the embodiments of the present application, unless otherwise indicated, "/" means "or"; for example, A/B may represent A or B. The term "and/or" merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate three cases: A exists alone, both A and B exist, and B exists alone. In addition, in the description of the embodiments of the present application, "plural" means two or more.

The terms "first" and "second" are used below for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined by "first" or "second" may explicitly or implicitly include one or more such features. In the description of the embodiments of the present application, unless otherwise indicated, "a plurality of" means two or more.

First, an audio acquisition system according to an embodiment of the present application will be described. Referring to fig. 1, fig. 1 is a schematic diagram of an audio acquisition system according to an embodiment of the present application. The audio acquisition system 10 may include an audio acquisition device 100 and an electronic device 200. The embodiment of the application is described by taking the following scenes as examples: the audio collection device 100 is a pair of glasses that can collect voice signals, and the electronic device 200 is a mobile phone.

The audio acquisition device 100 may establish a communication connection with the electronic device 200 via Bluetooth or a wireless local area network. The audio collection device 100 can collect the user's voice. The audio collection device 100 can also receive audio transmitted by the electronic device 200 and play the audio. The electronic device 200 may receive the user's voice captured by the audio capturing device 100. The electronic device 200 may also send audio signals to the audio acquisition device 100.

The audio collection device 100 can collect the user's voice signal and, while the user speaks, the vibration signal of a bone vibrated by the user's vocal organs. The audio collection device 100 processes the collected vibration signal and filters the noise signal out of the voice signal according to the processed vibration signal, obtaining a noise-filtered voice signal. The audio collection device 100 transmits the noise-filtered voice signal to the electronic device 200.

The audio collecting apparatus 100 may be implemented as any apparatus capable of collecting a voice signal and a vibration signal of a bone vibrated by the human vocal organs, for example, glasses or headphones that have a bone conduction sensor and a microphone. The electronic device 200 may be implemented as any one of the following electronic devices: a cell phone, a personal computer, a portable game machine, a portable media playing device, a vehicle-mounted media playing device, etc.

The following describes an audio capturing apparatus 100 according to an embodiment of the present application. Referring to fig. 2, fig. 2 is a schematic structural diagram of an audio capturing apparatus 100 according to an embodiment of the present application.

As shown in fig. 2, the audio collection device 100 may include: a processor 301, a memory 302, a sensor 303, a wireless communication module 304, at least one electroacoustic transducer (electro-acoustic transducer) 305, a microphone 306 and a power supply 307.

It should be understood that the audio capturing device 100 shown in fig. 2 is only one example, and that the audio capturing device 100 may have more or fewer components than shown in fig. 2, may combine two or more components, or may have a different configuration of components. The various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.

The memory 302 may be used for storing application code, which the processor 301 executes to cause the audio acquisition device 100 to perform the method in the embodiments of the present application. The audio acquisition device 100 may establish a communication connection, in particular a Bluetooth or Wi-Fi connection, with the electronic device 200 via the wireless communication module 304. Through the wireless communication module 304, the audio collection device 100 may transmit a voice signal to the electronic device 200 or receive an audio signal transmitted by the electronic device 200. The application code stored in the memory 302 may also be used to call the electroacoustic transducer 305 to convert an audio signal into sound and play it.

The sensor 303 may comprise a bone conduction sensor that can acquire the vibration signal of a bone vibrated by the vocal organs.

In some embodiments, the sensor 303 may also include an acceleration sensor. The acceleration sensor may detect a tapping operation. Specifically, different numbers of taps cause the acceleration sensor to output different voltage signals, which can be passed to the processor to perform the corresponding control functions. For example, when the acceleration sensor detects N consecutive taps (N is an integer greater than 0) and outputs a corresponding voltage signal, the processor 301 may start the wireless communication module 304 to establish a communication connection, specifically a Wi-Fi connection, with the wireless transmission device 200. For another example, when the acceleration sensor detects N+1 consecutive taps and outputs a corresponding voltage signal, the processor 301 may start the wireless communication module 304 to establish a communication connection with the electronic device 200, which may be a Bluetooth connection or a Wi-Fi connection.
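
The mapping from tap count to control function described above can be sketched as a simple dispatch. The function name `action_for_taps`, the action labels, and the default value of N are assumptions for the example; the text only requires N to be an integer greater than 0.

```python
def action_for_taps(tap_count, n=2):
    """Map a number of consecutive taps to a connection action (illustrative)."""
    if n <= 0:
        raise ValueError("N must be an integer greater than 0")
    if tap_count == n:
        return "connect_wifi"               # N taps: Wi-Fi connection
    if tap_count == n + 1:
        return "connect_bluetooth_or_wifi"  # N+1 taps: Bluetooth or Wi-Fi connection
    return None                             # any other count: no action defined

print(action_for_taps(2), action_for_taps(3), action_for_taps(5))
```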

In other embodiments, the sensor 303 may also include a fingerprint sensor for detecting a user fingerprint, identifying a user identity, or the like.

In other embodiments, the sensor 303 may also include a touch sensor. When the touch sensor detects a touch operation, the processor 301 may start the wireless communication module 304 to receive an audio or voice signal.

In other embodiments, the sensor 303 may further include a pressure sensor for detecting a pressing operation by a user. When the pressure sensor detects a pressing operation, the processor 301 may start the wireless communication module 304 to receive an audio or voice signal.

In other embodiments, the sensor 303 may also comprise a distance sensor or a proximity light sensor. The distance sensor or proximity light sensor may detect whether there is an object in the vicinity of the audio capture device 100, and thereby determine whether the audio capture device 100 is being worn by the user. In other embodiments, the sensor 303 may also include an ambient light sensor, and the processor 301 may adaptively adjust parameters, such as the volume level, based on the brightness of the ambient light sensed by the ambient light sensor.
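
Wear detection from a distance or proximity reading can be sketched as a threshold test over the most recent readings. The threshold, the number of consecutive readings, and the function name `is_worn` are assumptions for the example; the text only states that a nearby object indicates the device is being worn.

```python
def is_worn(distance_readings_mm, threshold_mm=15.0, min_consecutive=3):
    """Return True when the latest readings all show an object within threshold_mm.

    Requiring several consecutive close readings (an assumed debouncing
    choice) avoids reacting to a single spurious measurement.
    """
    if len(distance_readings_mm) < min_consecutive:
        return False
    recent = distance_readings_mm[-min_consecutive:]
    return all(d <= threshold_mm for d in recent)

print(is_worn([80.0, 12.0, 9.0, 8.0]))
```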

The wireless communication module 304 is used to support short-range data interaction between the audio acquisition device 100 and various electronic devices. In some embodiments, the wireless communication module 304 may include a Bluetooth transceiver for receiving Bluetooth audio broadcast signals broadcast by the electronic device 200. The wireless communication module 304 may also include a Wi-Fi module that may receive audio or voice signals transmitted by the wireless transmission device 200 described above.

The electroacoustic transducer 305, which may include a receiver (i.e., an "earpiece") or a speaker, may be used to convert an audio electrical signal into a sound signal and play it. The audio electrical signal may be obtained by decoding audio transmitted by the electronic device 200. For example, the electronic device 200 establishes a Wi-Fi connection with the audio capturing device 100, and the audio capturing device 100 receives an audio file transmitted by the electronic device 200, converts the audio file into a sound signal, and plays the sound signal.

The microphone 306, which may also be referred to as a "mic", is used to convert voice signals into audio electrical signals. For example, as the user speaks, the microphone 306 may collect the user's voice signal and convert it into an audio electrical signal.

A power supply 307 may be used to power the various components contained in the audio acquisition device 100. In some embodiments, the power source 307 may be a battery, such as a rechargeable battery.

It is to be understood that the structure illustrated in the embodiments of the present application does not constitute a specific limitation on the audio capturing apparatus 100. In addition, the wireless communication module 304 may also include a Bluetooth transceiver. The audio collection device 100 may establish a Bluetooth connection with other Bluetooth audio sources through the Bluetooth transceiver to enable short-range data interaction between them; for example, the audio collection device 100 receives an audio signal through the Bluetooth transceiver and then plays it. The audio acquisition device 100 may also contain one earbud or two earbuds. Each earbud comprises the functional modules described above (processor 301, memory 302, sensor 303, wireless communication module 304, electroacoustic transducer 305, microphone 306 and power supply 307) and an earbud housing that encloses these functional modules. When the audio acquisition device 100 includes two earbuds, the two earbuds may be used as a pair of headphones, or a Bluetooth connection may be established through the Bluetooth transceiver to enable data interaction between the two earbuds. The audio acquisition device 100 may have more or fewer components than shown in fig. 2, may combine two or more components, or may have a different configuration of components. For example, the audio capture device 100 may also include an indicator light (which may indicate the status of the earbud, etc.), a dust screen (which may be used with the earbud), etc. The various components shown in fig. 2 may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing or application specific integrated circuits.

In some illustrative examples, the audio capture device 100 to which embodiments of the present application relate may be implemented as a pair of glasses 101 as shown in fig. 3. Referring to fig. 3, fig. 3 is a schematic view of glasses in an embodiment of the present application. As shown in fig. 3, each nose pad 10a of the glasses 101 may include at least one bone conduction sensor for collecting vibration signals of the nasal bone when a user wearing the glasses speaks. The temple 10b includes at least one microphone for capturing voice signals of the user wearing the glasses. The frame 10c includes a processor, memory, power supply, a plurality of transmission elements, and the like. The glasses 101 may be glasses for nearsightedness or sunglasses.

It should be understood that the glasses 101 shown in fig. 3 are only one example, and that the glasses 101 may differ in appearance from those shown in fig. 3. That is, the shapes of the nose pads, the frame, the temples, and the lenses may differ from the shapes of the nose pad 10a, the temple 10b, the frame 10c, and the lenses shown in fig. 3.

The glasses collect the vibration signal of the user's nasal bone and the user's voice signal, so that the user's voice can be collected while the glasses still function as glasses for nearsightedness or as sunglasses. When wearing the glasses, the user does not need to wear another wearable device, such as headphones, for audio acquisition. Therefore, a single device can meet multiple needs of the user, which improves the user experience.

The following describes an electronic device 200 according to an embodiment of the present application. Referring to fig. 4, fig. 4 is a schematic structural diagram of an electronic device 200 according to an embodiment of the present application.

It should be understood that the electronic device 200 shown in fig. 4 is only one example, and that the electronic device 200 may have more or fewer components than shown in fig. 4, may combine two or more components, or may have a different configuration of components. The various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.

The electronic device 200 may include: processor 110, external memory interface 120, internal memory 121, universal serial bus (USB) interface 130, charge management module 140, power management module 141, battery 142, antenna 1, antenna 2, mobile communication module 150, wireless communication module 160, audio module 170, speaker 170A, receiver 170B, microphone 170C, headset interface 170D, sensor module 180, keys 190, motor 191, indicator 192, camera 193, display 194, and subscriber identity module (SIM) card interface 195, etc. The sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.

It should be understood that the structure illustrated in the embodiments of the present invention does not constitute a specific limitation on the electronic device 200. In other embodiments of the present application, electronic device 200 may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.

The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. The different processing units may be separate devices or may be integrated in one or more processors.

The controller may be the nerve center and command center of the electronic device 200. The controller can generate operation control signals according to instruction operation codes and timing signals to control instruction fetching and instruction execution.

A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or uses repeatedly. If the processor 110 needs to use the instructions or data again, it can fetch them directly from this memory. This avoids repeated accesses and reduces the waiting time of the processor 110, thereby improving the efficiency of the system.

In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, among others.

The I2C interface is a bi-directional synchronous serial bus comprising a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor 110 may contain multiple sets of I2C buses. The processor 110 may be coupled to the touch sensor 180K, a charger, a flash, the camera 193, etc., through different I2C bus interfaces. For example, the processor 110 may be coupled to the touch sensor 180K through an I2C interface, such that the processor 110 communicates with the touch sensor 180K through the I2C bus interface to implement the touch function of the electronic device 200.

The I2S interface may be used for audio communication. In some embodiments, the processor 110 may contain multiple sets of I2S buses. The processor 110 may be coupled to the audio module 170 via an I2S bus to enable communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may transmit an audio signal to the wireless communication module 160 through the I2S interface, to implement the function of answering a call through a Bluetooth headset.

The PCM interface may also be used for audio communication, to sample, quantize, and encode analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface. In some embodiments, the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface to implement the function of answering a call through a Bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication.
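
The three PCM operations named above (sampling, quantizing, and encoding) can be illustrated with a short sketch. The 16-bit little-endian sample format, the 8 kHz sample rate, and the function name `pcm_encode` are assumptions chosen for the example and do not describe the PCM interface of the electronic device 200.

```python
import numpy as np

def pcm_encode(analog, sample_rate_hz, duration_s):
    """Sample, quantize, and encode an analog signal as 16-bit PCM bytes.

    `analog` is a function of time in seconds returning values in [-1, 1].
    """
    n = int(round(sample_rate_hz * duration_s))
    t = np.arange(n) / sample_rate_hz                # sampling instants
    samples = np.clip(analog(t), -1.0, 1.0)          # sampled amplitudes
    codes = np.round(samples * 32767).astype("<i2")  # quantize to 16-bit codes
    return codes.tobytes()                           # encode as little-endian bytes

pcm = pcm_encode(lambda t: np.sin(2 * np.pi * 440 * t), 8000, 0.01)
print(len(pcm), "bytes")
```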

The UART interface is a universal serial data bus for asynchronous communications. The bus may be a bi-directional communication bus. It converts the data to be transmitted between serial and parallel forms. In some embodiments, a UART interface is typically used to connect the processor 110 with the wireless communication module 160. For example, the processor 110 communicates with a Bluetooth module in the wireless communication module 160 through a UART interface to implement a Bluetooth function. In some embodiments, the audio module 170 may transmit an audio signal to the wireless communication module 160 through a UART interface, to implement the function of playing music through a Bluetooth headset.
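
The parallel-to-serial conversion performed by a UART can be illustrated by framing one byte as a bit sequence. The sketch assumes the common 8N1 format (one start bit, eight data bits sent least significant bit first, no parity, one stop bit); the text does not state which frame format is used, and the function names are hypothetical.

```python
def uart_frame(byte):
    """Serialize one byte into a 10-bit 8N1 frame (assumed format)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0..255")
    data_bits = [(byte >> i) & 1 for i in range(8)]  # least significant bit first
    return [0] + data_bits + [1]                     # start bit, data, stop bit

def uart_unframe(bits):
    """Recover the byte from a 10-bit 8N1 frame."""
    if len(bits) != 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError("invalid 8N1 frame")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))

frame = uart_frame(0x41)
print(frame, hex(uart_unframe(frame)))
```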

The MIPI interface may be used to connect the processor 110 to peripheral devices such as a display 194, a camera 193, and the like. The MIPI interfaces include camera serial interfaces (camera serial interface, CSI), display serial interfaces (display serial interface, DSI), and the like. In some embodiments, processor 110 and camera 193 communicate through a CSI interface to implement the photographing function of electronic device 200. The processor 110 and the display screen 194 communicate via a DSI interface to implement the display functionality of the electronic device 200.

The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal or as a data signal. In some embodiments, a GPIO interface may be used to connect the processor 110 with the camera 193, the display 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, an MIPI interface, etc.

The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, or the like. The USB interface 130 may be used to connect a charger to charge the electronic device 200, or may be used to transfer data between the electronic device 200 and a peripheral device. It can also be used to connect a headset and play audio through the headset. The interface may also be used to connect other electronic devices, such as AR devices.

It should be understood that the connection relationship between the modules illustrated in the embodiment of the present invention is only illustrative, and does not limit the structure of the electronic device 200. In other embodiments of the present application, the electronic device 200 may also use interfacing manners different from those in the above embodiments, or a combination of multiple interfacing manners.

The charge management module 140 is configured to receive a charge input from a charger. The charger can be a wireless charger or a wired charger. In some wired charging embodiments, the charge management module 140 may receive a charging input of a wired charger through the USB interface 130. In some wireless charging embodiments, the charge management module 140 may receive wireless charging input through a wireless charging coil of the electronic device 200. The charge management module 140 may also supply power to the electronic device 200 through the power management module 141 while charging the battery 142.

The power management module 141 is used to connect the battery 142, the charge management module 140, and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 and provides power to the processor 110, the internal memory 121, the external memory, the display 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be configured to monitor parameters such as battery capacity, battery cycle count, and battery health (leakage, impedance). In other embodiments, the power management module 141 may also be provided in the processor 110. In other embodiments, the power management module 141 and the charge management module 140 may be disposed in the same device.

The wireless communication function of the electronic device 200 can be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.

The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the electronic device 200 may be used to cover a single or multiple communication bands. Different antennas may also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed into a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.

The mobile communication module 150 may provide a solution for wireless communication including 2G/3G/4G/5G, etc., applied on the electronic device 200. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA), etc. The mobile communication module 150 may receive electromagnetic waves from the antenna 1, perform processes such as filtering, amplifying, and the like on the received electromagnetic waves, and transmit the processed electromagnetic waves to the modem processor for demodulation. The mobile communication module 150 can amplify the signal modulated by the modem processor, and convert the signal into electromagnetic waves through the antenna 1 to radiate. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be provided in the same device as at least some of the modules of the processor 110.

The modem processor may include a modulator and a demodulator. The modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low frequency baseband signal to the baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs sound signals through an audio device (not limited to the speaker 170A, the receiver 170B, etc.), or displays images or video through the display screen 194. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communication module 150 or other functional module, independent of the processor 110.

The wireless communication module 160 may provide solutions for wireless communication applied on the electronic device 200, including wireless local area network (WLAN) (e.g., a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) technology, etc. The wireless communication module 160 may be one or more devices that integrate at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering on the electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, frequency modulate it, amplify it, and convert it to electromagnetic waves for radiation via the antenna 2.

In some embodiments, the antenna 1 and the mobile communication module 150 of the electronic device 200 are coupled, and the antenna 2 and the wireless communication module 160 are coupled, such that the electronic device 200 may communicate with a network and other devices via wireless communication techniques. The wireless communication techniques may include the global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time division-synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR techniques, among others. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS) and/or satellite based augmentation systems (SBAS).

The electronic device 200 implements display functions through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.

The display screen 194 is used to display images, videos, and the like. The display 194 includes a display panel. The display panel may employ a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the electronic device 200 may include 1 or N display screens 194, N being a positive integer greater than 1.

The electronic device 200 may implement photographing functions through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.

The ISP is used to process data fed back by the camera 193. For example, when photographing, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, and the optical signal is converted into an electric signal; the camera photosensitive element transmits the electric signal to the ISP, which processes it and converts it into an image visible to the naked eye. The ISP can also optimize the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure and color temperature of a shooting scene. In some embodiments, the ISP may be provided in the camera 193.

The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (charge coupled device, CCD) or a Complementary Metal Oxide Semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then transferred to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard RGB, YUV, or the like format. In some embodiments, the electronic device 200 may include 1 or N cameras 193, N being a positive integer greater than 1.

The digital signal processor is used for processing digital signals, and can process other digital signals besides digital image signals. For example, when the electronic device 200 selects a frequency bin, the digital signal processor is used to perform a Fourier transform on the frequency bin energy, or the like.
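As an illustration only, the energy of a single frequency bin can be obtained from a discrete Fourier transform of the sampled signal. The following is a minimal sketch, not the processing actually performed by the digital signal processor; the function name `bin_energy` and the 16-sample test tone are assumptions introduced for the example.

```python
import cmath
import math


def bin_energy(samples, k):
    """Return |X[k]|^2, the energy of DFT bin k, for a list of real samples."""
    n = len(samples)
    x_k = sum(s * cmath.exp(-2j * math.pi * k * i / n)
              for i, s in enumerate(samples))
    return abs(x_k) ** 2


# Hypothetical input: a pure tone that completes 2 cycles in 16 samples.
tone = [math.sin(2 * math.pi * 2 * i / 16) for i in range(16)]

# Evaluate the first 8 bins and pick the one carrying the most energy.
energies = [bin_energy(tone, k) for k in range(8)]
strongest_bin = max(range(8), key=lambda k: energies[k])
```

In this sketch the tone's energy is concentrated in the bin whose index equals the number of cycles in the analysis window.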

Video codecs are used to compress or decompress digital video. The electronic device 200 may support one or more video codecs. In this way, the electronic device 200 may play or record video in a variety of encoding formats, such as: moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, etc.

The NPU is a neural-network (NN) computing processor, and can rapidly process input information by referencing a biological neural network structure, for example, referencing a transmission mode between human brain neurons, and can also continuously perform self-learning. Applications such as intelligent cognition of the electronic device 200 may be implemented by the NPU, for example: image recognition, face recognition, speech recognition, text understanding, etc.

The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to enable expansion of the memory capabilities of the electronic device 200. The external memory card communicates with the processor 110 through an external memory interface 120 to implement data storage functions. For example, files such as music, video, etc. are stored in an external memory card.

The internal memory 121 may be used to store computer executable program code including instructions. The processor 110 executes various functional applications of the electronic device 200 and data processing by executing instructions stored in the internal memory 121. The internal memory 121 may include a storage program area and a storage data area. The storage program area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like. The storage data area may store data created during use of the electronic device 200 (e.g., audio data, phonebook, etc.), and so on. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), and the like.

The electronic device 200 may implement audio functions, such as music playing and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone interface 170D, the application processor, and the like.

The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or a portion of the functional modules of the audio module 170 may be disposed in the processor 110.

The speaker 170A, also referred to as a "loudspeaker", is used to convert audio electrical signals into sound signals. A user may listen to music or to a hands-free call through the speaker 170A of the electronic device 200.

The receiver 170B, also referred to as an "earpiece", is used to convert the audio electrical signal into a sound signal. When the electronic device 200 is answering a telephone call or voice message, the voice can be heard by placing the receiver 170B close to the human ear.

The microphone 170C, also referred to as a "mic" or "sound transmitter", is used to convert sound signals into electrical signals. When making a call or sending voice information, the user can speak with the mouth close to the microphone 170C, inputting a sound signal to the microphone 170C. The electronic device 200 may be provided with at least one microphone 170C. In other embodiments, the electronic device 200 may be provided with two microphones 170C, and may implement a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 200 may also be provided with three, four, or more microphones 170C to enable collection of sound signals, noise reduction, identification of sound sources, directional recording, etc.

The earphone interface 170D is used to connect a wired earphone. The earphone interface 170D may be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.

The pressure sensor 180A is used to sense a pressure signal, and may convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed on the display screen 194. The pressure sensor 180A may be of various types, such as a resistive pressure sensor, an inductive pressure sensor, a capacitive pressure sensor, and the like. The capacitive pressure sensor may comprise at least two parallel plates with conductive material. The capacitance between the electrodes changes when a force is applied to the pressure sensor 180A. The electronic device 200 determines the strength of the pressure from the change in capacitance. When a touch operation is applied to the display screen 194, the electronic device 200 detects the intensity of the touch operation by means of the pressure sensor 180A. The electronic device 200 may also calculate the location of the touch based on the detection signal of the pressure sensor 180A. In some embodiments, touch operations that act on the same touch location, but with different touch operation intensities, may correspond to different operation instructions. For example: when a touch operation whose intensity is smaller than a first pressure threshold acts on the short message application icon, an instruction for viewing the short message is executed. When a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the short message application icon, an instruction for creating a new short message is executed.
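The threshold behaviour described above can be sketched as follows. This is only an illustrative sketch: the numeric value of the first pressure threshold, the normalized intensity scale, and the instruction names are hypothetical, since the embodiment does not specify them.

```python
# Hypothetical value; the embodiment only refers to "a first pressure threshold".
FIRST_PRESSURE_THRESHOLD = 0.5  # assumed normalized intensity in the range 0.0 to 1.0


def sms_icon_instruction(touch_intensity: float) -> str:
    """Map the intensity of a touch on the short message icon to an instruction."""
    if touch_intensity < FIRST_PRESSURE_THRESHOLD:
        # Intensity smaller than the threshold: view the short message.
        return "view_short_message"
    # Intensity greater than or equal to the threshold: create a new short message.
    return "create_short_message"
```

A touch exactly at the threshold falls into the second branch, matching the "greater than or equal to" wording above.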

The gyro sensor 180B may be used to determine a motion posture of the electronic device 200. In some embodiments, the angular velocity of the electronic device 200 about three axes (i.e., x, y, and z axes) may be determined by the gyro sensor 180B. The gyro sensor 180B may be used for anti-shake during photographing. For example, when the shutter is pressed, the gyro sensor 180B detects the shake angle of the electronic device 200, calculates the distance to be compensated by the lens module according to the angle, and makes the lens counteract the shake of the electronic device 200 through reverse motion, thereby realizing anti-shake. The gyro sensor 180B may also be used in navigation and motion-sensing game scenarios.

The air pressure sensor 180C is used to measure air pressure. In some embodiments, the electronic device 200 calculates altitude from barometric pressure values measured by the barometric pressure sensor 180C, aiding in positioning and navigation.
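The embodiment does not state how altitude is computed from the measured air pressure. One common approximation is the international barometric formula, shown here as a sketch under that assumption; the function name `altitude_m` and the standard sea-level reference pressure are not taken from the embodiment.

```python
# Assumed reference: standard-atmosphere sea-level pressure in hPa.
SEA_LEVEL_PRESSURE_HPA = 1013.25


def altitude_m(pressure_hpa: float,
               sea_level_hpa: float = SEA_LEVEL_PRESSURE_HPA) -> float:
    """Approximate altitude in metres using the international barometric formula."""
    return 44330.0 * (1.0 - (pressure_hpa / sea_level_hpa) ** (1.0 / 5.255))
```

Lower measured pressure yields a higher computed altitude; in practice the sea-level reference would need to be calibrated to local weather conditions.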

The magnetic sensor 180D includes a Hall sensor. The electronic device 200 may detect the opening and closing of a flip holster using the magnetic sensor 180D. In some embodiments, when the electronic device 200 is a flip phone, the electronic device 200 may detect the opening and closing of the flip cover according to the magnetic sensor 180D. Features such as automatic unlocking upon flip-open can then be set according to the detected open or closed state of the holster or of the flip cover.

The acceleration sensor 180E may detect the magnitude of acceleration of the electronic device 200 in various directions (typically three axes). The magnitude and direction of gravity may be detected when the electronic device 200 is stationary. The acceleration sensor 180E can also be used to recognize the posture of the electronic device 200, and is applied in applications such as landscape/portrait screen switching and pedometers.

The distance sensor 180F is used to measure distance. The electronic device 200 may measure the distance by infrared or laser. In some embodiments, the electronic device 200 may range using the distance sensor 180F to achieve fast focus.

The proximity light sensor 180G may include, for example, a Light Emitting Diode (LED) and a light detector, such as a photodiode. The light emitting diode may be an infrared light emitting diode. The electronic device 200 emits infrared light outward through the light emitting diode. The electronic device 200 detects infrared reflected light from nearby objects using a photodiode. When sufficient reflected light is detected, it may be determined that an object is in the vicinity of the electronic device 200. When insufficient reflected light is detected, the electronic device 200 may determine that there is no object in the vicinity of the electronic device 200. The electronic device 200 can detect that the user holds the electronic device 200 close to the ear by using the proximity light sensor 180G, so as to automatically extinguish the screen for the purpose of saving power. The proximity light sensor 180G may also be used in holster mode, pocket mode to automatically unlock and lock the screen.

The ambient light sensor 180L is used to sense ambient light level. The electronic device 200 may adaptively adjust the brightness of the display 194 based on the perceived ambient light level. The ambient light sensor 180L may also be used to automatically adjust white balance when taking a photograph. Ambient light sensor 180L may also cooperate with proximity light sensor 180G to detect whether electronic device 200 is in a pocket to prevent false touches.

The fingerprint sensor 180H is used to collect a fingerprint. The electronic device 200 can utilize the collected fingerprint characteristics to realize fingerprint unlocking, access an application lock, fingerprint photographing, fingerprint incoming call answering and the like.

The temperature sensor 180J is used to detect temperature. In some embodiments, the electronic device 200 executes a temperature processing strategy using the temperature detected by the temperature sensor 180J. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the electronic device 200 reduces the performance of a processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection. In other embodiments, when the temperature is below another threshold, the electronic device 200 heats the battery 142 to prevent the low temperature from causing an abnormal shutdown of the electronic device 200. In other embodiments, when the temperature is below a further threshold, the electronic device 200 boosts the output voltage of the battery 142 to avoid an abnormal shutdown caused by low temperature.
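The temperature processing strategy can be sketched as a simple threshold check. The three threshold values and the action names below are hypothetical, and combining the three behaviours in one function is an assumption made for illustration; the embodiments above describe them separately and give no numeric values.

```python
# Hypothetical thresholds in degrees Celsius; not specified in the embodiments.
HIGH_TEMP_C = 45.0        # above this: throttle the nearby processor
LOW_TEMP_HEAT_C = 0.0     # below this: heat the battery
LOW_TEMP_BOOST_C = -10.0  # below this: also boost the battery output voltage


def thermal_actions(temp_c: float) -> list:
    """Return the protective actions that apply at the reported temperature."""
    actions = []
    if temp_c > HIGH_TEMP_C:
        actions.append("reduce_processor_performance")
    if temp_c < LOW_TEMP_HEAT_C:
        actions.append("heat_battery")
    if temp_c < LOW_TEMP_BOOST_C:
        actions.append("boost_battery_output_voltage")
    return actions
```

At ordinary temperatures between the low and high thresholds the function returns an empty list, meaning no protective action is taken.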

The touch sensor 180K is also referred to as a "touch panel". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touchscreen, also called a "touch-controlled screen". The touch sensor 180K is used to detect a touch operation acting on or near it. The touch sensor may communicate the detected touch operation to the application processor to determine the touch event type. Visual output related to touch operations may be provided through the display 194. In other embodiments, the touch sensor 180K may also be disposed on the surface of the electronic device 200 at a different location than the display 194.

The bone conduction sensor 180M may acquire a vibration signal. In some embodiments, the bone conduction sensor 180M may acquire a vibration signal of the vibrating bone block of the human vocal part. The bone conduction sensor 180M may also contact the pulse of the human body to receive a blood pressure pulsation signal. In some embodiments, the bone conduction sensor 180M may also be provided in a headset to form a bone conduction headset. The audio module 170 may parse out a voice signal based on the vibration signal of the vibrating bone block of the vocal part obtained by the bone conduction sensor 180M, so as to implement a voice function. The application processor may parse heart rate information based on the blood pressure pulsation signal acquired by the bone conduction sensor 180M, so as to implement a heart rate detection function. For example, the vibrating bone block of the vocal part may include teeth, gums, the maxilla and mandible, the nasal bone, and the like.

The keys 190 include a power-on key, a volume key, etc. The keys 190 may be mechanical keys or touch keys. The electronic device 200 may receive key inputs and generate key signal inputs related to user settings and function controls of the electronic device 200.

The motor 191 may generate a vibration cue. The motor 191 may be used for incoming call vibration alerting as well as for touch vibration feedback. For example, touch operations acting on different applications (e.g., photographing, audio playing, etc.) may correspond to different vibration feedback effects. Touch operations acting on different areas of the display screen 194 may also correspond to different vibration feedback effects of the motor 191. Different application scenarios (such as time reminding, receiving information, alarm clock, game, etc.) can also correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.

The indicator 192 may be an indicator light, and may be used to indicate a state of charge or a change in charge, or to indicate a message, a missed call, a notification, etc.

The SIM card interface 195 is used to connect a SIM card. The SIM card may be inserted into the SIM card interface 195, or removed from the SIM card interface 195, to enable contact with and separation from the electronic device 200. The electronic device 200 may support 1 or N SIM card interfaces, N being a positive integer greater than 1. The SIM card interface 195 may support Nano SIM cards, Micro SIM cards, and the like. Multiple cards may be inserted into the same SIM card interface 195 simultaneously. The types of the plurality of cards may be the same or different. The SIM card interface 195 may also be compatible with different types of SIM cards. The SIM card interface 195 may also be compatible with external memory cards. The electronic device 200 interacts with the network through the SIM card to realize functions such as calls and data communication. In some embodiments, the electronic device 200 employs an eSIM, namely an embedded SIM card. The eSIM card can be embedded in the electronic device 200 and cannot be separated from the electronic device 200.

The embodiment of the application provides an audio acquisition method, in which an audio acquisition device 100 acquires a voice signal of a user and a vibration signal of the vibrating bone block of the user's vocal part. The audio acquisition device 100 processes the acquired vibration signal and performs noise filtering on the voice signal according to the vibration signal, so as to obtain a noise-filtered voice signal, and sends the noise-filtered voice signal to the electronic device 200.

In the prior art, an audio acquisition device acquires only a voice signal, and noise signals in the environment can interfere with the voice signal. Thus, when a user is on a call, the audio acquisition device acquires the voice of the user and sends it to the electronic device 200, and the electronic device 200 sends the voice of the user to the electronic device of the other party. Because the acquired voice of the user carries noise, the other party to the call may be unable to hear the content of the user's speech clearly, which results in a poor call experience. In addition, when the user interacts with the electronic device 200 by voice command, if the acquired user voice is disturbed by a noise signal, the electronic device 200 cannot correctly recognize the voice command in the voice signal transmitted by the audio acquisition device. Thus, the electronic device 200 cannot properly execute the user's voice command, resulting in a poor user experience.

Compared with the prior art, in the audio acquisition method provided by the application, the audio acquisition device 100 processes the acquired vibration signal and filters noise from the voice signal according to the vibration signal. The vibration signal is free of interference from noise signals. The frequency of the vibration signal of the vibrating bone block of the human vocal part is within 2 kHz, whereas the frequency of the voice signal is between 20 Hz and 20 kHz. The vibration signal can therefore be used to determine the magnitude of the noise signal in the voice signal at frequencies within 2 kHz. The magnitude of the noise signal contained in the voice signal over the frequency range of 20 Hz to 20 kHz is then determined by linear prediction, so that the noise signal in the voice signal can be filtered out. In this way, the influence of noise signals can be effectively reduced, so that the user can interact with the electronic device 200 more effectively, and the user experience is improved.
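The idea in the preceding paragraph can be illustrated on per-band magnitudes. The sketch below is not the method of the application: it assumes the vibration magnitude is directly comparable to the voice magnitude within 2 kHz, estimates the low-band noise by subtraction, and uses an ordinary least-squares straight line over frequency as a simple stand-in for the linear prediction step, whose details are not given here. All function names and sample values are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def estimate_noise(freqs_hz, speech_mag, vibration_mag, cutoff_hz=2000.0):
    """Estimate the noise magnitude in every band from the bands within cutoff_hz."""
    low = [i for i, f in enumerate(freqs_hz) if f <= cutoff_hz]
    # Within the cutoff, noise is taken as voice magnitude minus vibration magnitude.
    low_noise = [max(speech_mag[i] - vibration_mag[i], 0.0) for i in low]
    a, b = fit_line([freqs_hz[i] for i in low], low_noise)
    # Extend the fitted line to all bands (assumed stand-in for linear prediction).
    return [max(a + b * f, 0.0) for f in freqs_hz]


def subtract_noise(speech_mag, noise_mag):
    """Remove the estimated noise magnitude from each band, clamping at zero."""
    return [max(s - n, 0.0) for s, n in zip(speech_mag, noise_mag)]
```

With a flat noise floor added to every band of the voice signal, the low-band estimate recovers that floor and the subtraction removes it from the bands above 2 kHz as well.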

Fig. 5 is a flowchart of an audio acquisition method according to an embodiment of the present application. The audio acquisition device 100 may acquire a voice signal of a user and a vibration signal of the nasal bone, determine the noise signal in the voice signal according to the vibration signal, filter out the noise signal, and finally send the noise-filtered voice signal to the electronic device 200. The method specifically comprises the following steps:

S101, the audio acquisition device 100 acquires a first voice signal of a user and a first vibration signal of the vibrating bone block of the user's vocal part.

The audio acquisition device 100 acquires the first vibration signal of the vibrating bone block of the user's vocal part using the bone conduction sensor 303, and acquires the first voice signal of the user using the microphone 306. For example, the audio acquisition device 100 may be the glasses 101 shown in fig. 3. The bone conduction sensor at the nose pad of the glasses 101 collects the vibration signal at the nasal bone of the user. The microphones at the temples of the glasses 101 collect the user's voice signal.

In one possible implementation, before the audio acquisition device 100 acquires the first voice signal and the first vibration signal of the vibrating bone block of the user's vocal part, the method further includes: the audio acquisition device 100 receives a first instruction, where the first instruction instructs the audio acquisition device 100 to start acquiring the first vibration signal of the vibrating bone block of the user's vocal part and the first voice signal. The first instruction may be an instruction generated by the audio acquisition device 100 in response to a user operation. For example, when the audio acquisition device 100 receives a click, double-click, or press operation from the user, the processor in the audio acquisition device 100 generates an instruction that instructs the bone conduction sensor 303 in the audio acquisition device 100 to start acquiring the first vibration signal of the vibrating bone block of the user's vocal part, and instructs the microphone 306 in the audio acquisition device 100 to start acquiring the first voice signal. Alternatively, the first instruction may be an instruction sent by the electronic device 200. For example, upon receiving a user operation of choosing to answer a call with the audio acquisition device 100, the electronic device 200 sends the first instruction to the audio acquisition device 100.

In one possible implementation, before the audio acquisition device 100 acquires the first voice signal and the first vibration signal of the vibrating bone block of the user's vocal part, the method further includes: the audio acquisition device 100 and the electronic device 200 establish a communication connection. Specifically, the audio acquisition device 100 may establish a communication connection with the electronic device 200 through the wireless communication module 304. The communication connection may be a Wi-Fi connection or a Bluetooth connection.

S102, the audio acquisition device 100 performs signal processing on the first vibration signal to obtain a second vibration signal.

The first vibration signal collected by the audio acquisition device 100 is an analog signal. Owing to limitations of the acquisition hardware, the collected first vibration signal is weak and therefore requires signal processing. After collecting the first vibration signal, the audio acquisition device 100 performs signal processing on it to obtain the second vibration signal.

In one possible implementation, the processing of the first vibration signal by the audio acquisition device 100 may refer to fig. 6. Fig. 6 is a schematic diagram of a signal processing flow provided in an embodiment of the present application. As shown in fig. 6, the processing of the vibration signal by the audio acquisition device 100 may include:

S1021, the audio collection device 100 amplifies the first vibration signal.

The audio collection device 100 may amplify the first vibration signal through a signal amplification circuit.

S1022, the audio capturing device 100 performs analog-to-digital conversion on the amplified first vibration signal to obtain a vibration signal in the form of a digital signal.

The audio collection device 100 converts the amplified first vibration signal into a vibration signal in the form of a digital signal through an analog-to-digital converter.
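Steps S1021 and S1022 can be illustrated with a minimal software sketch. In the embodiment both steps are performed by hardware (a signal amplification circuit and an analog-to-digital converter); the gain of 50, the full-scale value of 1.0, and the 16-bit resolution below are assumptions chosen only for illustration and are not taken from the embodiment.

```python
def amplify(samples, gain):
    """Scale each sample by a fixed gain (stand-in for the amplification circuit)."""
    return [gain * s for s in samples]


def quantize(samples, full_scale=1.0, bits=16):
    """Map samples in [-full_scale, full_scale] to signed integer codes."""
    max_code = 2 ** (bits - 1) - 1                       # 32767 for 16 bits
    codes = []
    for s in samples:
        clipped = max(-full_scale, min(full_scale, s))   # saturate out-of-range input
        codes.append(round(clipped / full_scale * max_code))
    return codes


# Hypothetical weak sensor output in arbitrary units.
weak_signal = [0.001, -0.002, 0.0005, 0.02]
digital = quantize(amplify(weak_signal, gain=50.0))
```

Without the amplification step, the first three samples would occupy only a few dozen of the available codes, which is one reason a weak signal is amplified before conversion.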

S1023, the audio capturing apparatus 100 processes the vibration signal in the form of a digital signal using a digital signal processing unit in the audio capturing apparatus 100.

In one possible implementation, the process of processing the vibration signal in the form of a digital signal by the audio acquisition device 100 using the digital signal processing unit may refer to fig. 7. As shown in fig. 7, the process of processing the vibration signal in the form of a digital signal by the audio acquisition device 100 may include:

S201, filtering out low-frequency vibration signals generated by human body movement.

Movement of the human body also causes bones to vibrate, which generates vibration signals. The frequency of the bone vibration caused by movement is far lower than the frequency of the bone vibration caused by speaking. Because the vibration signal collected by the audio acquisition device 100 is amplified, the vibration caused by movement is amplified as well. To prevent the bone vibration caused by the user's movement from affecting the vibration signal produced by the user's speech, the audio acquisition device 100 filters the collected first vibration signal.

In one possible implementation, the audio acquisition device 100 may only preserve portions of the first vibration signal having a vibration frequency greater than the first threshold.
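Retaining only the portion above the first threshold amounts to high-pass filtering. The following is a minimal sketch using a first-order high-pass filter; the filter type, the 8000 Hz sampling rate, and the 50 Hz value used for the first threshold are assumptions for illustration, since the embodiment does not specify them.

```python
import math


def high_pass(samples, sample_rate_hz, cutoff_hz):
    """First-order high-pass filter: attenuates components below cutoff_hz."""
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)   # time constant for the chosen cutoff
    dt = 1.0 / sample_rate_hz                # sampling interval
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        # The output follows changes in the input and lets slow drift decay.
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out


def rms(samples):
    """Root-mean-square level of a sequence."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


RATE = 8000              # hypothetical sampling rate, Hz
FIRST_THRESHOLD = 50.0   # hypothetical value of the first threshold, Hz
t = [n / RATE for n in range(RATE)]
motion = [math.sin(2 * math.pi * 2.0 * x) for x in t]     # 2 Hz, movement-like
speech = [math.sin(2 * math.pi * 200.0 * x) for x in t]   # 200 Hz, speech-like

motion_out = high_pass(motion, RATE, FIRST_THRESHOLD)
speech_out = high_pass(speech, RATE, FIRST_THRESHOLD)
```

With these assumed values, the 2 Hz component is strongly attenuated while the 200 Hz component passes almost unchanged. A steeper filter could be substituted where sharper separation is needed.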

S202, filtering compensation is carried out.

Human body movement may also cause mechanical changes in the sensors of the audio acquisition device 100, which interferes with the collected first vibration signal. In addition, vibration of the loudspeaker in the audio acquisition device 100 may interfere with the collected first vibration signal. These device-related factors distort the collected first vibration signal and voice signal, so that the user's complete voice information cannot be obtained. To solve this problem, in the embodiment of the present application, the audio acquisition device 100 performs filter compensation processing on the collected first vibration signal to reduce the distortion of the signal.

In one possible implementation, the filter compensation processing performed by the audio acquisition device 100 on the first vibration signal includes: determining a filter compensation parameter H^* for the interference channel parameter H that interferes with the vibration signal; and convolving the output vibration signal with the filter compensation parameter H^* to obtain the second vibration signal.

In one possible implementation, the vibration signal obtained after the first vibration signal collected by the audio acquisition device 100 has been amplified and the low-frequency vibration signal generated by human body movement has been filtered out may be denoted R1, and the vibration signal obtained after digital processing of the first vibration signal may be denoted S1. Then:

S1 = H * R1 (1)

In formula (1), "*" represents convolution. According to formula (1), the filter compensation parameter H^* can be obtained:

H^* = INV(A1^T * A1) * A1^T * S1 (2)

H * H^* = 1 (3)

Here, "INV" denotes matrix inversion, A1 is the generator matrix of R1, and A1^T is the transpose of A1. H^* is the conjugate matrix of H.
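As an illustrative sketch only, the expression INV(A1^T * A1) * A1^T * S1 on the right-hand side of formula (2) can be evaluated numerically as shown below. The sketch assumes that the generator matrix is the convolution matrix built from the samples of R1 and that the products inside formula (2) are ordinary matrix products; neither is stated explicitly in the embodiment. Under those assumptions the expression returns the filter taps that best map R1 to S1 in the least-squares sense. All function names and sample values are hypothetical.

```python
def convolve(a, b):
    """Full discrete convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def generator_matrix(r, taps):
    """Convolution matrix A such that A @ h equals convolve(r, h)."""
    rows = len(r) + taps - 1
    return [[r[i - j] if 0 <= i - j < len(r) else 0.0 for j in range(taps)]
            for i in range(rows)]


def solve(m, v):
    """Solve m x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(col + 1, n):
            factor = aug[k][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[k][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def least_squares_taps(r, s, taps):
    """Evaluate INV(A^T A) A^T s for the generator matrix A of r."""
    a = generator_matrix(r, taps)
    if len(s) != len(a):
        raise ValueError("s must have len(r) + taps - 1 samples")
    ata = [[sum(row[i] * row[j] for row in a) for j in range(taps)]
           for i in range(taps)]
    ats = [sum(row[i] * s[k] for k, row in enumerate(a)) for i in range(taps)]
    return solve(ata, ats)   # solving the normal equations avoids an explicit inverse


r1 = [1.0, 2.0, -1.0, 0.5]      # hypothetical samples of R1
true_taps = [0.5, -0.25]        # hypothetical two-tap filter
s1 = convolve(true_taps, r1)    # S1 produced as in formula (1)
estimated = least_squares_taps(r1, s1, taps=2)
```

In this noise-free example the estimated taps match the taps used to generate S1 up to rounding error.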

S1024, the audio capturing apparatus 100 obtains the second vibration signal.

The second vibration signal is then:

S2 = S1 * H^* = R1 * H * H^* (4)

By formula (3), H * H^* = 1, so S2 approximates R1. The second vibration signal obtained in this way is therefore closer to the original vibration signal of the vibrating bone block at the human vocal part, and channel interference is reduced. The user's voice signal can thus be obtained more accurately using the second vibration signal.
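The effect of formulas (1), (3) and (4) can be illustrated with a short numerical sketch. The two-tap channel H and the truncated inverse filter that stands in for H^* are hypothetical values chosen so that their convolution is close to a unit impulse; the embodiment does not disclose concrete values.

```python
def convolve(a, b):
    """Full discrete convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


H = [1.0, 0.5]                               # hypothetical interference channel
H_COMP = [(-0.5) ** k for k in range(8)]     # truncated inverse of H, standing in for H^*

# Formula (3): H * H^* should behave like a unit impulse.
unit = convolve(H, H_COMP)

r1 = [0.2, -0.4, 0.9, 0.1, -0.3]             # hypothetical undistorted vibration samples
s1 = convolve(H, r1)                         # formula (1): S1 = H * R1
s2 = convolve(s1, H_COMP)                    # formula (4): S2 = S1 * H^*
recovered = s2[:len(r1)]                     # leading samples of the compensated signal
```

Because the inverse filter is truncated, the product of H and the compensation filter leaves a small residual tap at the end; a longer compensation filter makes that residual smaller.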

S103, the audio acquisition device 100 determines a noise signal in the first voice signal by using the second vibration signal.

Fig. 8 is a schematic diagram of the nasal bone vibration signal collected by the audio acquisition device 100. This vibration signal is free from interference by noise signals. The frequency range of the nasal bone vibration signal collected in fig. 8 is 0-F1, where F1 is less than 2 kHz. It can be understood that the frequency of the vibration signal collected by the glasses depends on the sensor in the glasses. With a more sensitive sensor, the frequency of the collected vibration signal may exceed 2 kHz, which is not limited herein. Fig. 9 is a schematic diagram of the voice signal collected by the microphone in the audio acquisition device 100. The voice signal collected by the microphone includes the human voice and noise, and the noise interferes with the human voice. Within the same frequency range, the vibration signal of the nasal bone is consistent with the user's voice carried in the voice signal. Therefore, the magnitude of the noise signal in the voice signal within the frequency range 0-F1 can be determined using the nasal bone vibration signal. The magnitude of the noise signal in the voice signal within the range 0-F2 can then be obtained by linear prediction. In this way, the noise can be filtered out to obtain a human voice signal free from noise interference.

In one illustrative example, assume that the vibration signal in fig. 8 is the vector Y, and that the voice signal in fig. 9 within the frequency range 0-F1 is the vector X1 = (X + N), where the vector X is the human voice collected by the microphone and the vector N is the noise collected by the microphone. If the processing coefficient is the vector B, then Y = (X + N) × B and N = (Y - X × B)/B, which gives the noise within the frequency range 0-F1. The noise N1 within the frequency range 0-F2 can then be obtained by linear prediction. For the linear prediction algorithm, reference may be made to linear prediction algorithms in the prior art, which are not described herein.
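The arithmetic of this example can be sketched as follows. The first function applies the expression N = (Y - X × B)/B element by element, taking X, Y and B as given, as in the example. The second part extends a sequence of noise magnitudes by linear prediction. The embodiment leaves the choice of linear prediction algorithm to the prior art, so the autocorrelation method with the Levinson-Durbin recursion used here, and the idea of treating the per-frequency noise magnitudes as a sequence to be extrapolated, are assumptions made only for illustration. All names and values are hypothetical.

```python
def noise_in_band(y, x, b):
    """Element-wise N = (Y - X * B) / B for the band 0-F1."""
    return [(yi - xi * bi) / bi for yi, xi, bi in zip(y, x, b)]


def autocorrelation(seq, max_lag):
    """Autocorrelation values r[0..max_lag] of a finite sequence."""
    return [sum(seq[t] * seq[t - lag] for t in range(lag, len(seq)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] with seq[t] ~ sum(a[k] * seq[t - k])."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[k] * r[i - k] for k in range(1, i))
        k_i = acc / err                       # reflection coefficient
        new_a = a[:]
        new_a[i] = k_i
        for k in range(1, i):
            new_a[k] = a[k] - k_i * a[i - k]
        a = new_a
        err *= 1.0 - k_i * k_i                # remaining prediction error
    return a[1:]


def extrapolate(known, order, extra):
    """Append `extra` predicted values to the known sequence."""
    coeffs = levinson_durbin(autocorrelation(known, order), order)
    out = list(known)
    for _ in range(extra):
        out.append(sum(c * out[-k - 1] for k, c in enumerate(coeffs)))
    return out


# Band 0-F1: hypothetical voice X, noise N and coefficient B per frequency bin.
x = [1.0, 2.0]
n = [0.5, 0.25]
b = [2.0, 4.0]
y = [(xi + ni) * bi for xi, ni, bi in zip(x, n, b)]   # Y = (X + N) * B
band_noise = noise_in_band(y, x, b)

# Extending a smoothly decaying noise profile beyond the known bins.
known_profile = [0.8 ** k for k in range(20)]
full_profile = extrapolate(known_profile, order=1, extra=3)
```

The order of the predictor and the number of extrapolated values would in practice depend on the frequency resolution and on how far F2 lies above F1.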

In one possible implementation, before the audio acquisition device 100 determines the noise signal in the first voice signal using the second vibration signal, the method includes: performing signal processing on the first voice signal.

Optionally, processing the first voice signal may include amplifying and filtering the first voice signal. For details, refer to the processing of the first vibration signal described above, which is not repeated here.

S104, the audio acquisition device 100 separates the noise signal in the first voice signal, obtains a second voice signal and sends the second voice signal to the electronic device.

When the noise signal predicted over the full frequency band of the voice signal collected by the audio acquisition device 100 (for example, the frequency range 0-F2 shown in fig. 9) is N1 and the voice signal is X2, the human voice signal is X3 = X2 - N1, that is, the second voice signal.
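The subtraction X3 = X2 - N1 can be written out element by element, as in the minimal sketch below. Representing X2 and N1 as equal-length lists of per-frequency values is an assumption for illustration, and the sample values are hypothetical.

```python
def separate_noise(x2, n1):
    """Element-wise X3 = X2 - N1 over the full band 0-F2."""
    if len(x2) != len(n1):
        raise ValueError("X2 and N1 must cover the same frequency bins")
    return [xi - ni for xi, ni in zip(x2, n1)]


x2 = [1.0, 0.75, 0.5]    # hypothetical voice signal values per bin
n1 = [0.25, 0.25, 0.5]   # hypothetical predicted noise values per bin
x3 = separate_noise(x2, n1)
```

A bin in which the predicted noise equals the collected signal, such as the last bin here, contributes nothing to the second voice signal.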

In the audio acquisition method provided in the embodiments of the present application, the audio acquisition device 100 processes the collected vibration signal and then filters noise out of the voice signal according to the vibration signal. The vibration signal is free from noise interference; the frequency of the vibration signal of the vibrating bone block at the human vocal part is within 2 kHz, whereas the frequency of the voice signal lies between 20 Hz and 20 kHz. The vibration signal can therefore be used to determine the magnitude of the noise signal in the voice signal at frequencies within 2 kHz. The magnitude of the noise signal contained in the voice signal over the range 20 Hz to 20 kHz is then determined by linear prediction, so that the noise signal in the voice signal can be filtered out. In this way, the influence of noise signals is effectively reduced, the user can interact with the electronic device 200 more effectively, and the user experience is improved.

Fig. 10 is an interaction schematic diagram of an audio acquisition method according to an embodiment of the present application. In the embodiment of the present application, the following scenario is taken as an example: the audio acquisition device 100 is a pair of glasses 101, and the electronic device 200 is a mobile phone. When the mobile phone 1 and the mobile phone 2 are in a voice call, the user of the mobile phone 1 inputs a voice signal to the mobile phone 1 through the glasses 101. After the glasses 101 collect the voice signal, they process it to reduce the influence of the noise signal, which makes the call between the mobile phone 1 and the mobile phone 2 smoother. As shown in fig. 10, the method specifically includes:

S301, the glasses 101 are in communication connection with the mobile phone 1.

Specifically, the glasses 101 may establish a Wi-Fi connection or a bluetooth connection with the mobile phone 1.

S302, triggering the glasses 101 to start executing step S303.

In this embodiment, the glasses 101 may be triggered to execute step S303 in either of two ways, which are described in steps S302a and S302b.

Mode one:

S302a, the glasses 101 receive a first user operation.

The glasses 101 may receive a first user operation, which may be used to trigger the glasses 101 to perform step S303. The first user operation may specifically be a single click or double click of the glasses 101 by the user, or a long press of a start key of the glasses 101, etc., which is not limited herein.

Mode two:

S302b, the glasses 101 receive the first instruction sent by the mobile phone 1.

The glasses 101 may receive a first instruction sent by the mobile phone 1, where the first instruction is used to trigger the glasses 101 to execute step S303.

It will be appreciated that in one possible implementation, the mobile phone 1 receives a second user operation and, in response to the second user operation, sends the first instruction to the glasses 101. The second user operation may be the user making a voice call on the mobile phone 1 or starting a mobile phone assistant APP installed on the mobile phone 1.

S303, the glasses 101 collect a first nasal bone vibration signal and a first voice signal of a user.

Step S303 may refer to the description in step S101, and will not be described herein.

S304, the glasses 101 perform signal processing on the first vibration signal to obtain a second vibration signal, and process the first voice signal according to the second vibration signal to obtain a second voice signal.

Step S304 may refer to the descriptions in steps S102-S104, and will not be described here.

S305, the glasses 101 send the second voice signal to the mobile phone 1.

S306, the mobile phone 1 receives the second voice signal and encodes the second voice signal to obtain the first voice data.

After receiving the second voice signal, the mobile phone 1 may encode it to obtain the first voice data. The first voice data may be a pulse code modulation (PCM) file, that is, a file in the form of a digital signal containing only 0s and 1s.
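The encoding in step S306 can be illustrated with a minimal sketch of pulse code modulation (PCM) encoding. The 16-bit sample width, the little-endian byte order, and the normalized sample range of -1.0 to 1.0 are assumptions for illustration; the embodiment does not specify the encoding parameters.

```python
import struct


def encode_pcm16(samples):
    """Pack samples in [-1.0, 1.0] as little-endian signed 16-bit PCM bytes."""
    codes = [round(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack("<%dh" % len(codes), *codes)


def decode_pcm16(data):
    """Unpack little-endian signed 16-bit PCM bytes back to float samples."""
    count = len(data) // 2
    return [c / 32767 for c in struct.unpack("<%dh" % count, data)]


second_voice_signal = [0.0, 1.0, -1.0]            # hypothetical samples
first_voice_data = encode_pcm16(second_voice_signal)
restored = decode_pcm16(first_voice_data)
```

Each sample occupies two bytes in the encoded data, so the three samples above produce six bytes that the receiving side can decode with the same parameters.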

S307, the mobile phone 1 sends the first voice data to the mobile phone 2.

S308, the mobile phone 2 receives the first voice data.

In one possible implementation, after the glasses 101 perform S303, the first vibration signal and the first voice signal are directly transmitted to the mobile phone 1, and then the mobile phone 1 performs step S304 and step S305.

In one possible implementation, after the mobile phone 1 receives the second voice signal, the voice assistant application in the mobile phone 1 performs voice recognition on the second voice signal and displays the recognition result in the user interface. The voice assistant application may also execute a voice command carried in the recognition result. For example, if the voice command carried in the recognition result is to check the weather, the voice assistant application may display the weather query result in the user interface or announce the weather query result to the user.

Embodiments of the present application also provide a computer-readable storage medium having instructions stored therein, which, when run on a computer or processor, cause the computer or processor to perform one or more steps of any of the methods described above.

Embodiments of the present application also provide a computer program product comprising instructions. The computer program product, when run on a computer or processor, causes the computer or processor to perform one or more steps of any of the methods described above.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted via a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, digital subscriber line) or a wireless manner (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)).

Those of ordinary skill in the art will appreciate that all or part of the flows of the above method embodiments may be implemented by a computer program instructing related hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the flows of the above method embodiments. The aforementioned storage medium includes a ROM, a random access memory (RAM), a magnetic disk, an optical disc, or the like.

The foregoing is merely a specific implementation of the embodiments of the present application, but the protection scope of the embodiments of the present application is not limited thereto, and any changes or substitutions within the technical scope disclosed in the embodiments of the present application should be covered by the protection scope of the embodiments of the present application. Therefore, the protection scope of the embodiments of the present application shall be subject to the protection scope of the claims.

The above embodiments are merely intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents; such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.

Claims

1. The audio acquisition system is characterized by comprising an audio acquisition device and a first electronic device, wherein the audio acquisition device and the first electronic device are in communication connection; wherein,

the audio acquisition device is configured to determine, according to a second vibration signal, a second noise signal in the first speech signal, where the second noise signal is the same as the second vibration signal in frequency range, and the frequency range of the first speech signal is greater than the frequency range of the second vibration signal;

the audio acquisition equipment is used for carrying out linear prediction on the second noise signal to obtain a first noise signal in the complete frequency range of the first voice signal; the audio acquisition device is used for separating the first noise signal in the first voice signal to obtain a second voice signal and sending the second voice signal to the first electronic device;

the first electronic device is configured to receive the second voice signal.

2. The system according to claim 1, wherein the audio acquisition device is specifically configured to:

filtering the low-frequency signal in the first vibration signal to obtain a third vibration signal;

determining a conjugate coefficient H^* of a channel coefficient H in the first vibration signal acquisition channel;

convolving said third vibration signal with the conjugate coefficient H^* to obtain a second vibration signal.

3. The system according to claim 1, wherein the audio acquisition device is specifically configured to:

receiving a first user operation of a user before the first voice signal and the first vibration signal are collected;

the first vibration signal and the first voice signal are acquired in response to the first user operation.

4. A system according to any of claims 1-3, characterized in that the audio acquisition device is specifically adapted to:

and receiving a first instruction sent by the first electronic equipment, wherein the first instruction is used for indicating the audio acquisition equipment to start to acquire a first voice signal and a first vibration signal of a human vocal part vibration bone block.

5. The system of claim 4, wherein the first electronic device is further configured to:

Receiving a second user operation of the user;

and responding to the second user operation, and sending a first instruction to the audio acquisition equipment.

6. The system of claim 5, wherein the first electronic device is further configured to:

and responding to the second user operation, starting a voice call with the second electronic equipment or starting a voice assistant in the first electronic equipment.

7. An audio acquisition method, comprising:

the method comprises the steps that an audio acquisition device acquires a first voice signal and a first vibration signal of a human body vocal part vibration bone block;

the audio acquisition equipment performs signal processing on the first vibration signal to obtain a second vibration signal;

the audio acquisition equipment determines a second noise signal which is the same as the second vibration signal in the frequency range in the first voice signal according to the second vibration signal; the frequency range of the first voice signal is larger than the frequency range of the second vibration signal;

the audio acquisition equipment carries out linear prediction on the second noise signal to obtain a first noise signal in the complete frequency range of the first voice signal;

the audio acquisition device separates the first noise signal in the first voice signal to obtain a second voice signal and sends the second voice signal to the electronic device, wherein the audio acquisition device is in communication connection with the electronic device.

8. The method of claim 7, wherein the audio acquisition device acquiring the first speech signal and the first vibration signal of the human vocal tract vibrating the bone pieces comprises:

the audio acquisition device receives a first user operation of a user before acquiring the first voice signal and the first vibration signal;

in response to the first user operation, the audio acquisition device acquires the first vibration signal and the first voice signal.

9. The method of claim 7, wherein prior to the audio acquisition device acquiring the first speech signal and the first vibration signal of the human vocal cords vibrating the bone pieces, comprising:

the audio acquisition device receives a first instruction sent by the electronic device, wherein the first instruction is used for indicating the audio acquisition device to start to acquire a first voice signal and a first vibration signal of a human vocal part vibration bone block.

10. The method according to any one of claims 7-9, wherein the audio acquisition device performing signal processing on the first vibration signal to obtain a second vibration signal, comprising:

the audio acquisition equipment filters a low-frequency signal in the first vibration signal to obtain a third vibration signal;

the audio acquisition device determines a conjugate coefficient H^* of a channel coefficient H in the first vibration signal acquisition channel;

the audio acquisition device convolves the third vibration signal with the conjugate coefficient H^* to obtain a second vibration signal.

11. An audio acquisition device, comprising: a communication interface, a memory, and a processor; the communication interface, the memory being coupled to the processor, the memory being for storing computer program code, the computer program code comprising computer instructions which, when read from the memory by the processor, cause the audio acquisition device to perform the method according to any one of claims 7-10.

12. A computer storage medium comprising computer instructions which, when run on an electronic device, cause the electronic device to perform the method of any of claims 7 to 10.