CN110708625A

CN110708625A - Intelligent terminal-based environment sound suppression and enhancement adjustable earphone system and method

Info

Publication number: CN110708625A
Application number: CN201910911736.7A
Authority: CN
Inventors: 陈闻杰; 张子默; 陈杰; 沙奕兰
Original assignee: East China Normal University
Current assignee: East China Normal University
Priority date: 2019-09-25
Filing date: 2019-09-25
Publication date: 2020-01-17

Abstract

The invention discloses an intelligent terminal-based environment sound suppression and enhancement adjustable earphone system, which comprises: a sound collection device for collecting ambient sound signals; an intelligent terminal, which distinguishes the noise and the information sound in the ambient sound signal, generates a corresponding suppression signal of the same-amplitude reverse sound wave according to the noise, and generates a corresponding enhancement signal of the same-amplitude same-direction sound wave according to the information sound; and a sound output device for outputting the sound source sound signal and simultaneously outputting the suppression signal and/or the enhancement signal. The system generates a same-amplitude reverse sound wave for the noise in order to cancel it; the information sound is either left unsuppressed, so that it passes through the earphone, or is appropriately enhanced. The earphone plays the sound source, the anti-phase noise signal and the enhanced information sound, so that the listener can enjoy the sound source and block out the noise without losing the information sound. The invention also discloses an adjustable suppression and enhancement method.

Description

Intelligent terminal-based environment sound suppression and enhancement adjustable earphone system and method

Technical Field

The invention belongs to the technical field of electronic appliances, and particularly relates to an intelligent terminal-based environment sound filtering and enhancing adjustable earphone system and method.

Background

Earphones are sound playing devices worn against the ears. The earphone is connected to a terminal (hereinafter referred to as a "sound source terminal") which generates sound such as a radio, a CD player, a computer, a mobile phone, etc., and is widely used for listening to broadcasting, enjoying music, learning foreign languages, etc. Compared with the externally-placed sound equipment, the earphone has small interference to other people, and is suitable for occasions used by a single person.

Generally, the sound played by the earphone comes from the sound source terminal. In addition to the sound (sound source sound) emitted from the sound source device, the human ear can also hear the environmental sound. Typically, these ambient sounds are defined as noise, which may interfere with the sound source sound heard by the human ear, especially in a noisy environment.

The designer of the headset reduces the noise interference with the sound source sound by suitable methods, including passive noise reduction and active noise reduction.

Passive noise reduction is mainly achieved by enhancing the sealing between the earphone and the ear canal to block external noise. The active noise reduction is realized by adding a microphone on the outer side of the earphone to monitor the environmental noise, and the signal processing chip is used for generating reverse sound waves with the same amplitude and opposite phases of the environmental noise, so that the effect of offsetting the environmental noise is achieved.
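The cancellation principle described above can be illustrated with a minimal Python sketch. The function names are illustrative only, and the sketch assumes ideal conditions (no processing delay, and a microphone signal identical to the noise reaching the ear), which real devices only approximate.

```python
import math

def make_tone(freq_hz, sample_rate, n_samples, amplitude=1.0):
    """Return a sampled sine tone standing in for ambient noise."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]

def anti_noise(noise):
    """Same amplitude, opposite phase: negate every sample."""
    return [-x for x in noise]

def superpose(a, b):
    """Sample-wise sum, modelling two sound waves adding at the ear."""
    return [x + y for x, y in zip(a, b)]

noise = make_tone(200.0, 8000, 80)
residual = superpose(noise, anti_noise(noise))
print(max(abs(x) for x in residual))  # prints 0.0 under these ideal assumptions
```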

In real life, we have found that ambient sound is not necessarily useless noise that must be filtered out. There are occasions when it is necessary to hear external sounds, such as:

(1) When running, exercising or strolling beside a city road while wearing earphones, it is dangerous if the sound of passing vehicles and important warning sounds such as horns and braking cannot be heard.

(2) When riding a bus or a subway while wearing earphones, the user cannot hear the stop announcements and can easily miss the stop.

(3) Listening to music helps to improve efficiency when working (e.g., writing a program or a document). However, if the user needs to communicate briefly with surrounding colleagues, the user needs to take off the earphones first. It is troublesome to put on and take off the earphones frequently.

(4) In a factory, some production environments are noisy and earplugs must be worn for labor protection, but then the voices of colleagues cannot be heard and communication is affected.

(5) At home, when we wear earphones to listen to songs, we cannot hear our mother calling us to eat.

From the above scenario, we can see that the environmental sound includes some useful sounds besides useless noise, and provides information (informative sound) in terms of alarm, communication and the like for us.

Therefore, the invention is proposed, and the basic idea is to intelligently distinguish the environmental sound into noise and information sound, suppress the noise and enhance the information sound. Because of strong intelligence requirements and user customization, the intelligent processing part is put in the terminal for processing, that is, the sound source terminal is an intelligent terminal, such as a mobile phone, a computer, a tablet computer, an intelligent music playing device and the like. Along with the popularization of intelligent terminals such as smart phones and the like, the strong computing power of the intelligent terminals is utilized to carry out relevant processing, an earphone end only needs a microphone for collecting environmental sound, an expensive special processing device with a single function required by the existing active noise reduction earphone is not needed, and the cost is reduced to a great extent. The software development and data management of the intelligent terminal are utilized to realize the customization of the requirements, and the flexibility of the system is improved to a great extent.

Disclosure of Invention

In view of the above problems, the present invention provides an ambient sound suppression and enhancement adjustable earphone system based on an intelligent terminal, including: a sound collection device for collecting ambient sound signals; an intelligent terminal, connected with the sound collection device, which distinguishes the noise and the information sound in the ambient sound signal, generates a corresponding suppression signal of the same-amplitude reverse sound wave according to the noise, and generates a corresponding enhancement signal of the same-amplitude same-direction sound wave according to the information sound; and a sound output device, connected with the intelligent terminal, for outputting the sound source sound signal and simultaneously outputting the suppression signal and/or the enhancement signal.

In the ambient sound suppression and enhancement adjustable earphone system based on an intelligent terminal provided by the present invention, the intelligent terminal includes: an environment sound signal separation module, connected with the sound collection device, for receiving the ambient sound signal and distinguishing the noise and the information sound in the ambient sound signal; a suppression module, connected with the environment sound signal separation module, for generating a corresponding suppression signal of the same-amplitude reverse sound wave according to the noise; an enhancement module, connected with the environment sound signal separation module, for generating a corresponding enhancement signal of the same-amplitude same-direction sound wave according to the information sound; an audio playing module for outputting the sound source sound signal; and a sound synthesis module, connected with the suppression module, the enhancement module and the audio playing module respectively and with the sound output device, for synthesizing the sound source sound signal with the suppression signal and/or the enhancement signal and outputting the result to the sound output device.

In the environment sound suppression and enhancement adjustable earphone system based on the intelligent terminal, the intelligent terminal further comprises a database connected with the environment sound signal separation module, and data for distinguishing noise and information sound and a trained model for separation are stored in the database.

In the intelligent terminal-based environment sound suppression and enhancement adjustable earphone system, the database is further connected with a self-learning module, and the self-learning module uses sound data in the database, borrows calculation power of a server end, learns and trains related noise signals and information sound signals and establishes a customized model.

In the ambient sound suppression and enhancement adjustable earphone system based on the intelligent terminal, the database comprises: a general database, a personal database and a model;

The general database is stored at the server side and comprises general sound data such as car horn sounds, knocking sounds and the like. The personal database is customized by the user and stores the sound information of specific sound sources. The model is transmitted to the intelligent terminal after training at the server side is completed, and is used for separating and identifying the environmental sound. The general database and the model are updated by the server from time to time, while the personal database is updated by the user, who submits data through the companion software on the intelligent terminal.
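The three-part organisation described above might be represented on the terminal roughly as in the following Python sketch. All names (`SoundDatabase`, `apply_server_update`, the feature-vector layout) are hypothetical, and keeping a terminal-side copy of the general data with an integer version number is an assumption made for illustration, not something the description specifies.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class SoundDatabase:
    """Hypothetical terminal-side view of the three database parts."""
    general: Dict[str, List[float]] = field(default_factory=dict)   # e.g. horn, knock
    personal: Dict[str, List[float]] = field(default_factory=dict)  # user-entered sources
    model_params: List[float] = field(default_factory=list)         # trained on the server
    model_version: int = 0

    def add_personal_source(self, name: str, features: List[float]) -> None:
        """User-driven update: only the personal part changes."""
        self.personal[name] = list(features)

    def apply_server_update(self, general: Dict[str, List[float]],
                            model_params: List[float], model_version: int) -> bool:
        """Server-driven update: replaces general data and model, never personal data."""
        if model_version <= self.model_version:
            return False  # nothing newer on offer
        self.general = {k: list(v) for k, v in general.items()}
        self.model_params = list(model_params)
        self.model_version = model_version
        return True

db = SoundDatabase()
db.add_personal_source("family_member", [0.2, 0.7, 0.1])
db.apply_server_update({"car_horn": [0.9, 0.1, 0.0]}, [0.5, -0.3], model_version=1)
```

The split between the two update methods mirrors the text: server pushes never touch user-entered sources, and user entries never alter the general data or the model.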

In the environment sound suppression and enhancement adjustable earphone system based on the intelligent terminal, the intelligent terminal can use a trained model to separate environment sound, and respectively suppress or enhance the separated audio frequency according to the type of the environment sound.

Based on the system, the invention also provides an adjustable method for inhibiting and enhancing the environmental sound based on the intelligent terminal, which comprises the following steps:

step one: collecting an ambient sound signal by a microphone;

step two: the environment sound separation module detects noise and information sound in the environment sound signal;

step three: the suppression module generates a corresponding suppression signal of the same-amplitude reverse sound wave according to the noise, and/or the enhancement module generates a corresponding enhancement signal of the same-amplitude same-direction sound wave according to the information sound;

step four: the sound source sound signal is output, and the suppression signal and/or the enhancement signal are output simultaneously. Step two further comprises: distinguishing the noise and the information sound in the ambient sound signal according to the database.
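Steps three and four above can be sketched in Python as follows. The sketch assumes that step two has already separated the ambient sound into a noise component and an information component, and that the ambient sound reaches the ear unattenuated; the function names and the `gain` parameter are illustrative assumptions.

```python
def suppression_signal(noise):
    """Step three (suppression): same amplitude, reversed phase."""
    return [-x for x in noise]

def enhancement_signal(info, gain=1.0):
    """Step three (enhancement): in-phase copy; gain=1.0 means same amplitude."""
    return [gain * x for x in info]

def earphone_output(source, noise=None, info=None, gain=1.0):
    """Step four: source plus whichever of suppression/enhancement is enabled."""
    out = list(source)
    if noise is not None:
        out = [o + s for o, s in zip(out, suppression_signal(noise))]
    if info is not None:
        out = [o + e for o, e in zip(out, enhancement_signal(info, gain))]
    return out

def heard_at_ear(ambient, played):
    """Ambient sound leaking in plus what the earphone plays."""
    return [a + p for a, p in zip(ambient, played)]

# Toy signals chosen as exact binary fractions so the arithmetic is exact.
source = [0.5, 0.5, 0.5, 0.5]
noise = [0.25, -0.25, 0.25, -0.25]
info = [0.0, 0.125, 0.0, 0.125]
ambient = [n + i for n, i in zip(noise, info)]

played = earphone_output(source, noise=noise, info=info)
print(heard_at_ear(ambient, played))  # prints [0.5, 0.75, 0.5, 0.75]
```

In this toy case the noise cancels completely and the listener hears the source plus twice the information sound (the ambient copy and the in-phase enhancement).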

In the intelligent terminal-based environmental sound suppression and enhancement adjustable method, self-learning is completed at the server: after the server acquires new data or improves the model parameters, it outputs the new model parameters to the user side. The user side only needs to obtain the latest model parameters regularly. The model training is carried out in three parts, detailed as follows:

Step one: acquire data and preprocess the data. The data sets used are publicly available on the internet, such as TIMIT, WSJ0, THCHS-30, and the like. The data set will continue to expand, and any two audio clips can be mixed to form a new mixed audio clip, which is sufficient to produce enough training audio. The same preprocessing is performed on each training audio clip, which includes:

1. Pre-emphasis: the high-frequency part of the voice is emphasized, the influence of lip radiation is removed, and the high-frequency resolution of the voice is increased.

2. Resampling: the sampling rate of the original audio is changed, and the further processing of the data is facilitated.

3. Framing: according to the short-time stationarity of the waveform, the audio is divided into small segments, and the calculation amount is reduced.

4. Windowing: each frame is multiplied by a window function so that the signal takes on some characteristics of a periodic function.

5. Endpoint detection: detect whether signal sound is present in each frame.

6. Short-time Fourier transform: the time-domain signal is converted into a time-frequency spectrum.
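The six preprocessing operations listed above can be sketched in plain Python as follows. This is an illustrative sketch only: the pre-emphasis coefficient 0.97, the Hamming window, the energy threshold, the linear-interpolation resampler, the frame sizes and the choice to drop inactive frames are all assumptions, since the description does not fix any of them.

```python
import cmath
import math

def pre_emphasis(x, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]; boosts the high-frequency part."""
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]

def resample_linear(x, src_rate, dst_rate):
    """Crude linear-interpolation resampler (no anti-aliasing filter)."""
    n_out = int(len(x) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate
        lo = int(pos)
        hi = min(lo + 1, len(x) - 1)
        frac = pos - lo
        out.append((1 - frac) * x[lo] + frac * x[hi])
    return out

def frame_signal(x, frame_len, hop):
    """Split into overlapping frames; a trailing partial frame is dropped."""
    return [x[s:s + frame_len] for s in range(0, len(x) - frame_len + 1, hop)]

def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def is_active(frame, threshold=1e-3):
    """Energy-based endpoint detection: mean squared amplitude above threshold."""
    return sum(a * a for a in frame) / len(frame) > threshold

def dft_magnitude(frame):
    """Magnitudes of DFT bins 0..N/2, computed naively in O(N^2)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

def preprocess(x, src_rate=16000, dst_rate=8000, frame_len=64, hop=32):
    """Run the six operations in the listed order; returns one spectrum per active frame."""
    x = resample_linear(pre_emphasis(x), src_rate, dst_rate)
    window = hamming(frame_len)
    spectra = []
    for frame in frame_signal(x, frame_len, hop):
        windowed = [a * w for a, w in zip(frame, window)]
        if is_active(windowed):
            spectra.append(dft_magnitude(windowed))
    return spectra

# Usage: a 1 kHz tone sampled at 16 kHz.
tone = [math.sin(2 * math.pi * 1000 * n / 16000) for n in range(1600)]
spectra = preprocess(tone)
```

A production system would normally use an FFT and a properly filtered resampler; the naive versions are used here only to keep the sketch free of external dependencies.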

Step two: feed the preprocessed data into a neural network as input, and compute the spectrum embedding of the original data, which serves as the sound feature of the original audio.

Step three: cluster the sound feature space, where each cluster corresponds to the sound source feature of one sound source.
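The clustering of step three can be illustrated with a small k-means sketch in Python. The description does not name a clustering algorithm, so k-means, the deterministic farthest-point initialisation and the two-dimensional toy embeddings are all assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centres):
    """Label each point with the index of its nearest centre."""
    return [min(range(len(centres)), key=lambda j: sq_dist(p, centres[j]))
            for p in points]

def kmeans(points, k, iters=20):
    """Plain k-means; returns (labels, centres)."""
    # Deterministic init: first point, then the point farthest from all chosen centres.
    centres = [list(points[0])]
    while len(centres) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centres))
        centres.append(list(far))
    for _ in range(iters):
        labels = assign(points, centres)
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old centre if a cluster is empty
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign(points, centres), centres

# Toy embeddings: two well-separated groups standing in for two sound sources.
embeddings = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
labels, centres = kmeans(embeddings, k=2)
print(labels)  # prints [0, 0, 0, 1, 1, 1]
```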

Compared with the prior art, the invention has the following beneficial technical effects:

1) The ambient sound is divided into noise and information sound. Noise reduction processing is performed on the noise, and useful information sound is enhanced.

2) When the earphone is worn, useful warning sounds and prompt sounds can still be heard, so that danger, missed bus stops and the like are avoided.

3) The user can communicate with people nearby without taking off the earphone, which is convenient, and sounds that are relevant to or of concern to the user are not missed while the noise is reduced.

4) The cost is low by utilizing the calculation performance of the intelligent terminal.

5) The core processing unit is realized by software, is flexible and can be customized.

The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.

Drawings

Fig. 1 is a schematic structural diagram of an adjustable earphone system for ambient sound suppression and enhancement based on an intelligent terminal in an embodiment.

Fig. 2 is a schematic flowchart of an adjustable ambient sound suppression and enhancement method based on an intelligent terminal in an embodiment.

Detailed Description

The technical solutions of the present invention are described in detail below with reference to the drawings and specific embodiments, and it should be understood that the specific features in the embodiments and examples of the present invention are described in detail in the technical solutions of the present application, and are not limited to the technical solutions of the present application, and the technical features in the embodiments and examples of the present application may be combined with each other without conflict.

The term "and/or" herein merely describes an association between associated objects and means that three relationships may exist; e.g., "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.

In the technical solution of the embodiment of the present invention, the ambient sound suppression and enhancement adjustable earphone system based on an intelligent terminal includes: a sound collection device for collecting ambient sound signals; an intelligent terminal, connected with the sound collection device, which distinguishes the noise and the information sound in the ambient sound signal, generates a corresponding suppression signal of the same-amplitude reverse sound wave according to the noise, and generates a corresponding enhancement signal of the same-amplitude same-direction sound wave according to the information sound; and a sound output device, connected with the intelligent terminal, for outputting the sound source sound signal and simultaneously outputting the suppression signal and/or the enhancement signal.

Specifically, referring to fig. 1, the intelligent terminal includes: an environment sound signal separation module, connected with the sound collection device, for receiving the ambient sound signal and distinguishing the noise and the information sound in the ambient sound signal; a suppression module, connected with the environment sound signal separation module, for generating a corresponding suppression signal of the same-amplitude reverse sound wave according to the noise; an enhancement module, connected with the environment sound signal separation module, for generating a corresponding enhancement signal of the same-amplitude same-direction sound wave according to the information sound; an audio playing module for outputting the sound source sound signal; and a sound synthesis module, connected with the suppression module, the enhancement module and the audio playing module respectively and with the sound output device, for synthesizing the sound source sound signal with the suppression signal and/or the enhancement signal and outputting the result to the sound output device. Preferably, the sound collection device and the sound output device are integrated into a whole. As shown in fig. 1, in the present embodiment, the sound collection device and the sound output device are a microphone and a speaker, respectively. The microphone is positioned on the outside of the earphone and collects the environmental sound. Through processing by the intelligent terminal, the noise part and the information sound part of the environmental sound are distinguished, and a same-amplitude reverse sound wave is generated for the noise in order to cancel it; the information sound is either left unsuppressed, so that it passes through the earphone, or is appropriately enhanced. The earphone plays the sound source, the anti-phase noise signal and the enhanced information sound, so that the listener can enjoy the sound source and block out the noise without losing the information sound.

In the environment sound suppression and enhancement adjustable earphone system based on the intelligent terminal, the intelligent terminal further comprises a database connected with the environment sound signal separation module, and data used for distinguishing noise and information sound is stored in the database.

In the environment sound suppression and enhancement adjustable earphone system based on the intelligent terminal, the database is further connected with a self-learning module. The self-learning module collects ambient sound signals, learns and trains on the related noise signals and information sound signals, stores the trained data for distinguishing noise and information sound into the database, and establishes a customized model.

In the intelligent terminal-based environment sound suppression and enhancement adjustable earphone system, the self-learning system uses an auditory attention selection model. When the model is trained in advance, audio containing only the voice of a specific speaker is supplied; the model automatically extracts the voiceprint characteristics of the speaker's voice and stores them in a long-term memory unit in the database, thereby completing the learning and memorizing of the specific voice characteristics. Each round of model training deepens the memory of the corresponding sound characteristics and strengthens the recognition of that sound. By means of the self-learning function, the voiceprint characteristics of the specific speaker stored in the database can be identified and amplified the next time the corresponding voice is received.

In the intelligent terminal-based environment sound suppression and enhancement adjustable earphone system, some common, general and important sound characteristics (such as car horns, braking sounds, alarm sounds and the like) can be stored in the database in advance through the self-learning method as sound sources to be amplified, which reduces the workload of users. On the intelligent terminal, the user is allowed to add a new sound source (such as relatives and friends of the user): only a certain length of audio needs to be entered for the voiceprint characteristics of the new sound source to be extracted. The intelligent terminal then stores the voiceprint characteristics in the database for later use, and when the system detects audio matching those voiceprint characteristics, it automatically amplifies or attenuates that audio. The detected audio is in turn used to adjust the specific parameters of the voiceprint feature, enhancing the training effect.
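The idea that newly detected audio in turn refines a stored voiceprint can be illustrated with a simple running-average update. The description does not state the update rule; the exponential moving average below, the function name and the `rate` parameter are assumptions chosen only to make the feedback loop concrete.

```python
def update_voiceprint(stored, observed, rate=0.1):
    """Blend a newly observed feature vector into the stored voiceprint.

    rate is the (assumed) weight given to the new observation; each call
    moves the stored vector a little further toward what was actually heard.
    """
    if len(stored) != len(observed):
        raise ValueError("feature vectors must have the same length")
    return [(1 - rate) * s + rate * o for s, o in zip(stored, observed)]

# Usage: repeated observations pull the stored voiceprint toward the speaker.
voiceprint = [1.0, 0.0]
for _ in range(3):
    voiceprint = update_voiceprint(voiceprint, [0.0, 1.0], rate=0.5)
print(voiceprint)  # prints [0.125, 0.875]
```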

In the environment sound suppression and enhancement adjustable earphone system based on the intelligent terminal, the database consists of three parts. The first part is a general database, which ships with the system when it leaves the factory and stores the sound characteristics of various common sound sources, such as car horns, door knocking sounds, alarm sounds, braking sounds, the sound of moving vehicles and the like. After delivery, the general database can still be updated uniformly through updates of the companion software on the terminal, ensuring that the user's general database is kept up to date.

In the intelligent terminal-based environment sound suppression and enhancement adjustable earphone system, the second part of the database is a personal database that can be customized by the user. The user can enter a certain amount of audio of a specific sound source (such as the voice of a family member or a colleague) into the personal database through a data entry function in the terminal software, thereby enabling the self-learning module to learn the sound characteristics of that sound source. Thereafter, when the environmental sound separation module detects a sound source that is in the personal database, it judges the sound to be information sound and instructs the enhancement module to enhance it.

In the intelligent terminal-based environment sound suppression and enhancement adjustable earphone system, the third part of the database stores the model parameters used by the environment sound separation module. Initially these are the parameters of a general model tuned before delivery. During later use, as the data in the personal database and the general database are updated, the self-learning module optimizes the existing model parameters so that a better effect is achieved.

In the intelligent terminal-based environment sound suppression and enhancement adjustable earphone system, the database is connected with the self-learning module, and the self-learning module updates the model by learning related noise signals and information sound signals in the database so as to maintain the long-term effectiveness of the model. The learning and updating of the model are completed at the server side, and the server side returns the learned new model parameters to the intelligent terminal so as to avoid the problem of insufficient computing power of the intelligent terminal.

In the environment sound suppression and enhancement adjustable earphone system based on the intelligent terminal, the server-side learning model uses the "deep clustering" technique, which has come to prominence in several areas of machine learning in recent years. A neural network is trained to generate, for the input audio, spectrum embeddings that are discriminative with respect to the partition labels, and the generated spectrum embeddings are then clustered by a conventional clustering method. The masks produced by clustering can be used directly as filtering masks on the original audio, and the filtered results are the several sub-audios obtained by separating audio that contains several sound sources. Each sub-audio is then identified to judge whether it is noise to be suppressed or signal sound to be amplified. The model parameters are modified based on error back propagation during the separation process. After the training is completed, the server sends a model update request to the user's intelligent terminal at a suitable time.
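The step in which cluster assignments are turned into filtering masks can be sketched as follows. The sketch assumes that each time-frequency bin of a magnitude spectrogram has already been given a cluster index (by a trained network followed by clustering, neither of which is shown), and it uses binary masks; the function names are illustrative.

```python
def masks_from_labels(labels, n_sources):
    """Build one binary mask per source; labels[t][f] is the cluster index of bin (t, f)."""
    return [[[1.0 if l == s else 0.0 for l in row] for row in labels]
            for s in range(n_sources)]

def apply_mask(spectrogram, mask):
    """Element-wise product of a magnitude spectrogram and a mask."""
    return [[v * m for v, m in zip(srow, mrow)]
            for srow, mrow in zip(spectrogram, mask)]

# Toy 2x2 spectrogram whose bins are split between two sources.
spectrogram = [[1.0, 2.0],
               [3.0, 4.0]]
labels = [[0, 1],
          [1, 0]]

separated = [apply_mask(spectrogram, m) for m in masks_from_labels(labels, 2)]
print(separated[0])  # prints [[1.0, 0.0], [0.0, 4.0]]
print(separated[1])  # prints [[0.0, 2.0], [3.0, 0.0]]
```

Because every bin belongs to exactly one cluster, the separated spectrograms sum back to the original mixture.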

In the intelligent terminal-based environment sound suppression and enhancement adjustable earphone system, the environment sound signal separation module imitates the human attention selection mechanism. After receiving a section of audio signal, it extracts the voiceprint characteristics of the signal, traverses the database and matches them against the stored voiceprint characteristics; if a sound source with a sufficiently high matching degree is found, that part of the audio is played amplified or attenuated. The attention selection mechanism is used to pick out the voice of the attended speaker from the mixed voices of several speakers.
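The database traversal and matching described above can be sketched in Python as follows. Cosine similarity as the matching degree, the 0.8 threshold, and the database layout (name mapped to a feature vector and an action) are assumptions for illustration; the description does not specify how the matching degree is computed.

```python
import math

def cosine(a, b):
    """Cosine similarity of two feature vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def best_match(feature, database, threshold=0.8):
    """Traverse the database; return (name, action) of the best match, or None.

    database maps a source name to (stored_voiceprint, action), where action
    is "amplify" or "attenuate" (hypothetical labels).
    """
    best = None
    best_score = threshold
    for name, (stored, action) in database.items():
        score = cosine(feature, stored)
        if score >= best_score:
            best, best_score = (name, action), score
    return best

database = {
    "car_horn": ([1.0, 0.0, 0.0], "amplify"),
    "colleague": ([0.0, 1.0, 0.0], "amplify"),
    "air_conditioner": ([0.0, 0.0, 1.0], "attenuate"),
}
print(best_match([0.9, 0.1, 0.0], database))  # prints ('car_horn', 'amplify')
print(best_match([0.5, 0.5, 0.5], database))  # prints None
```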

The invention also provides an adjustable method for inhibiting and enhancing the environmental sound based on the intelligent terminal, which comprises the following steps:

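step one: collecting an ambient sound signal by a microphone;

step two: the environment sound separation module detects noise and information sound in the environment sound signal;

step three: the suppression module generates a corresponding suppression signal of the same-amplitude reverse sound wave according to the noise, and/or the enhancement module generates a corresponding enhancement signal of the same-amplitude same-direction sound wave according to the information sound;

step four: the sound source sound signal is output, and the suppression signal and/or the enhancement signal are output simultaneously.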

In the method for adjusting environmental sound suppression and enhancement based on the intelligent terminal, the second step further comprises: the noise and the information sound in the ambient sound signal are distinguished from each other on the basis of the database.

The self-learning is completed at the server: after the server acquires new data or improves the model parameters, it outputs the new model parameters to the user side. The user side only needs to obtain the latest model parameters regularly. The model training is carried out in three parts (data acquisition and preprocessing, computation of spectrum embeddings by a neural network, and clustering of the sound feature space), as detailed in the Disclosure of Invention.

The above description is only a preferred embodiment of the present invention, and does not limit the present invention in any way. It will be understood by those skilled in the art that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. An intelligent terminal-based ambient sound suppression and enhancement adjustable earphone system, characterized by comprising:

a sound collection device for collecting ambient sound signals;

the intelligent terminal is connected with the sound acquisition device, generates a corresponding suppression signal of the same-amplitude reverse sound wave according to the noise by distinguishing the noise and the information sound in the environment sound signal, and generates a corresponding enhancement signal of the same-amplitude same-direction sound wave according to the information sound;

and a sound output device, connected with the intelligent terminal, for outputting the sound source sound signal and simultaneously outputting the suppression signal and/or the enhancement signal.

2. The intelligent terminal based ambient sound suppression and enhancement adjustable earphone system according to claim 1, wherein the intelligent terminal comprises:

the environment sound signal separation module is connected with the sound acquisition device and used for receiving the environment sound signals and distinguishing the noise and the information sound in the environment sound signals;

the suppression module is connected with the environmental sound signal separation module and used for generating a corresponding suppression signal of the same-amplitude reverse sound wave according to noise;

the enhancement module is connected with the environment sound signal separation module and used for generating corresponding enhancement signals of the same-amplitude same-direction sound waves according to the information sound;

the audio playing module is used for outputting sound source sound signals;

and the sound synthesis module is respectively connected with the suppression module, the enhancement module and the audio playing module, is connected with the sound output device, and is used for synthesizing and outputting the sound source sound signal, the suppression signal and/or the enhancement signal to the sound output device.

3. The intelligent terminal based ambient sound suppression and enhancement adjustable earphone system according to claim 2, wherein the intelligent terminal further comprises a database connected to the ambient sound signal separation module, the database storing data for distinguishing noise and information sound and a trained model for separation.

4. The intelligent terminal based ambient sound suppression and enhancement adjustable earphone system according to claim 3, wherein the database is further connected with a self-learning module, and the self-learning module uses the sound data in the database, borrows the computational power of the server, and builds the customized model by learning and training the related noise signals and information sound signals.

5. The intelligent terminal-based ambient sound suppression and enhancement adjustable earphone system according to claim 4, wherein the database comprises: a general database, a personal database and a model;

the general database is stored at the server end and comprises general sound data;

the personal database is customized by a user and stores sound information;

and the model, after its training is completed at the server side, is transmitted to the intelligent terminal for separating and identifying the environmental sound.

6. The intelligent terminal-based ambient sound suppression and enhancement adjustable earphone system according to claim 5, wherein the general database and the model are updated by the server from time to time, while the personal database is updated by the user, who submits data through companion software on the intelligent terminal.

7. The intelligent terminal-based ambient sound suppression and enhancement adjustable earphone system according to claim 5, wherein the intelligent terminal can use a trained model to separate the ambient sound, and suppress or enhance the separated audio respectively according to the type of the ambient sound.

8. An ambient sound suppression and enhancement adjustable method based on a smart terminal, which is characterized in that the ambient sound suppression and enhancement adjustable earphone system based on the smart terminal according to any one of claims 1-7 is adopted, and the method comprises the following steps:

step one: collecting environmental sound signals through a sound collection device;

step two: the environment sound separation module detects noise and information sound in the environment sound signal; the second step comprises the following steps: distinguishing noise and information sound in the environmental sound signal according to the database;

9. The intelligent terminal based environmental sound suppression and enhancement adjustable method according to claim 8, wherein the self-learning is completed at the server, new data is acquired at the server, or new model parameters are output to the user side after the model parameters are improved; the user end only needs to obtain the latest model parameters regularly.

10. The intelligent terminal based environmental sound suppression and enhancement adjustable method according to claim 9, wherein model training is carried out in three parts:

step one: acquiring data and preprocessing the data; the data sets used are publicly available on the Internet; the data set can be continuously expanded, and any two audio clips can be mixed to form a new mixed audio clip, so that enough training audio can be formed; the same preprocessing is performed on each training audio clip, which includes:

1. pre-emphasis: the high-frequency part of the voice is emphasized, the influence of lip radiation is removed, and the high-frequency resolution of the voice is increased;

2. resampling: the sampling rate of the original audio is changed, so that the further processing of data is facilitated;

3. framing: according to the short-time stationarity of the waveform, the audio is divided into small segments, and the calculation amount is reduced;

4. windowing: multiplying each frame by a window function so that the signal takes on some characteristics of a periodic function;

5. endpoint detection: detecting whether signal sound is present in each frame;

6. short-time Fourier transform: converting the time-domain signal into a time-frequency spectrum;

step two: transmitting the preprocessed data serving as input into a neural network, calculating original data frequency spectrum embedding serving as sound characteristics of original audio;