CN115835080A - Intelligent transparent transmission system for earphone - Google Patents


Publication number
CN115835080A
Authority
CN
China
Prior art keywords
sound
ear
right ear
signal
intelligent
Prior art date
Legal status
Pending
Application number
CN202211468553.0A
Other languages
Chinese (zh)
Inventor
陆成湘
Current Assignee
Shanghai Zcan Microelectronics Technology Co ltd
Original Assignee
Shanghai Zcan Microelectronics Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Zcan Microelectronics Technology Co ltd filed Critical Shanghai Zcan Microelectronics Technology Co ltd
Priority to CN202211468553.0A priority Critical patent/CN115835080A/en
Publication of CN115835080A publication Critical patent/CN115835080A/en

Abstract

The invention uses the microphone arrays of a binaural earphone together with complex Gaussian mixture models pre-trained for different directions to capture spatial audio from a specified area direction, uses a neural-network intelligent model to adjust the sound effect of a specified type of sound from that direction, performs sound-effect balance adjustment on the two ear channels, and then passes the specified type of sound from the user-specified direction through to the user. This realizes an intelligent, personalized, ideal listening experience for different users and different application needs in different environments, strengthens and extends the hearing-assistance capability of the earphone in daily life, and genuinely meets people's need to use the earphone as a daily-life helper.

Description

Intelligent transparent transmission system for earphone
[ technical field ]
The invention belongs to the technical field of electronics, and particularly relates to an intelligent transparent transmission system for an earphone.
[ background of the invention ]
As one of the most common wearable electronic products, the earphone has become an indispensable daily article for people and has therefore been given more and more functions. To avoid interference from surrounding noise while listening to music, earphones offer an active noise cancellation (ANC) function; to prevent the speaker's environmental noise from reaching the listener during a call, earphones add an environmental noise cancellation (ENC) function; and to let the wearer hear external sounds clearly without removing the earphone, earphones provide a so-called "pass-through" (transparency) function. However, the noise-reduction and pass-through functions are strongly affected by complex ambient-sound conditions, different wearing styles, and different ear structures, which degrades the listening experience: in many situations the user finds it hard to hear clearly the sound they actually want to hear. For example, while cooking in a kitchen with the range hood running, a user may want to listen to music undisturbed by hood noise while still hearing a child crying or the doorbell ringing in a bedroom outside; someone sitting in a living room may want to hear only the television, not the people talking beside them; at a party, a user may want to hear only the conversation, not the background music. An ordinary earphone pass-through function can hardly meet such requirements.
At present, the earphone microphone picks up the ambient sound; the signal then undergoes gain processing, analog-to-digital conversion, sampling filtering, and pass-through filtering, and after digital-to-analog conversion it is played by the earphone speaker, so that the user ultimately hears a simulation of the natural ambient sound. If a voice-enhanced pass-through function is needed, it can be realized by setting a suitable passband for the band-pass filter.
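The conventional pass-through chain described above (gain, conversion, filtering, playback) can be sketched roughly as follows. This is a minimal illustrative sketch, not the patent's actual DSP: the "band-pass" here is a crude difference of two moving averages, and all function names and window sizes are assumptions.

```python
# Hedged sketch of a conventional pass-through chain: gain -> filtering.
# The filter is an illustrative difference-of-moving-averages band-pass
# stand-in; real earphones use properly designed digital filters.

def analog_gain(samples, gain=2.0):
    # stage 1: microphone gain (assumed fixed factor)
    return [s * gain for s in samples]

def band_pass(samples, short_win=2, long_win=8):
    # crude band-pass: a short moving average keeps low+mid frequencies,
    # a long moving average keeps only low frequencies; their difference
    # suppresses DC and very high frequencies.
    def moving_avg(x, w):
        return [sum(x[max(0, i - w + 1):i + 1]) / min(w, i + 1)
                for i in range(len(x))]
    low_mid = moving_avg(samples, short_win)
    low = moving_avg(samples, long_win)
    return [a - b for a, b in zip(low_mid, low)]

def pass_through(mic_samples):
    # full chain; A/D and D/A conversion are implicit in the float samples
    return band_pass(analog_gain(mic_samples))
```

A constant (DC) input is rejected by the band-pass stage, which is the expected behavior of any pass-through filter chain.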
To further satisfy user demand and improve the user experience, the industry has proposed judging the sound-source direction from the relative magnitudes of the ambient-sound signals picked up by the left and right earphone microphones, matching a target sound-effect signal to the judged direction using a predefined correspondence between source azimuths and sound-effect signals, and then driving the earphone speaker to play sound with the corresponding effect according to the target sound-effect signal.
However, with the pass-through functions described above, entering pass-through mode only lets the user hear the ambient sound clearly, or enhances only the human voice in the environment, without removing the earphone. The user cannot selectively obtain sound from designated directions, nor selectively obtain designated sound types other than the human voice. Such pass-through therefore cannot flexibly meet users' personalized application needs in different environments and cannot extend the hearing-assistance function of the earphone.
[ summary of the invention ]
The invention aims to provide an intelligent pass-through system for an earphone, to solve the problem that the prior art cannot flexibly meet different users' needs for different hearing-assistance effects in different scenarios and cannot give users a personalized, ideal listening experience.
In order to achieve the above object, an intelligent transparent transmission system for earphones according to the present invention includes:
the left and right ear microphone arrays are arranged in bilateral symmetry corresponding to the ears and comprise a plurality of microphones for capturing audio signals;
a left- and right-ear spatial audio capture module, comprising a plurality of complex Gaussian mixture (CGMM) intelligent models, one per spatial direction; the module applies an FFT (fast Fourier transform) to the sound signals collected by the left and right ear microphone arrays and feeds the transformed signals simultaneously to every CGMM model; according to the sound direction selected by the user, each CGMM model analyzes the input sound signals to capture the spatial audio signal in the designated direction, and the likelihood that the direction represented by each model matches the current sound-source direction is computed to obtain the weight coefficient of the sound source in each direction; after an IFFT (inverse fast Fourier transform), the spatial audio signals captured by the models are weighted and summed to obtain the left-ear and right-ear sound signals in the specified area direction;
a left- and right-ear spatial sound-effect adjusting module, comprising a neural-network (NN) intelligent model; for the left-ear and right-ear sound signals output by the spatial audio capture module, the module performs MFCC feature extraction and uses a filter bank to obtain the audio signals of the different frequency bands; the feature parameters extracted by MFCC are input to the NN model for analysis, which enhances the sound of the selected specified type and suppresses sound of non-specified types, yielding a left-ear signal gain G_ALi and a right-ear signal gain G_ARi for the signal on each frequency band; the per-band left-ear signal S_FLi and right-ear signal S_FRi produced by the filter bank are multiplied by the corresponding gains G_ALi and G_ARi, and the module outputs the sound signals of the specified type in the specified area direction after sound-effect adjustment, namely the per-band left-ear sound signal S_OLi and right-ear sound signal S_ORi;
a sound-channel balancing module, which first obtains the volume of each frequency band of each frame of the input left-ear sound signal S_OLi and right-ear sound signal S_ORi, and then performs balance adjustment between the left-ear and right-ear volumes.
According to the above main feature, each microphone array comprises four microphones, wherein the second and third microphones are external microphones, the fourth microphone is a speech microphone for capturing the ambient sound in the external space, and the first microphone is an internal microphone for capturing the sound inside the ear canal.
According to the above main feature, the left- and right-ear spatial audio capture modules each include 14 complex Gaussian mixture models corresponding to the 14 directions of a predefined listening space, wherein the listening space is represented by a sphere centered on the wearer and is divided axisymmetrically into 14 directions: above, below, left, right, front, rear, and the upper-left-front, upper-right-front, upper-left-rear, upper-right-rear, lower-left-front, lower-right-front, lower-left-rear, and lower-right-rear space-center directions.
According to the above main feature, the 14 CGMM intelligent models of the left- and right-ear spatial audio capture modules correspond respectively to the 14 listening-space directions. According to the sound direction selected by the user, each CGMM model analyzes the input sound signals and captures a spatial audio signal, outputting 14 captured sound signals. Meanwhile the likelihood that the direction represented by each model matches the current sound-source direction is computed, and the 14 likelihood values are normalized to obtain the weight of the sound source in each direction; the weight coefficient ω_k of an unselected direction is set to 0, while the weight coefficient ω_k of a selected direction keeps its normalized value weight_k. After IFFT processing, the sound signals captured by the CGMM models are weighted and summed to obtain the left-ear sound signal S_SL and right-ear sound signal S_SR in the selected area direction, according to:

S_SL = Σ_{k=1}^{14} ω_k · S_k  (and likewise S_SR for the right ear),

wherein S_k is the sound signal captured by the k-th CGMM intelligent model.
According to the above main feature, when the left- and right-ear spatial sound-effect adjusting modules extract MFCC features, the features are extracted frame by frame, one frame every 10 ms; each frame is divided into n (n = 20–40) frequency bands, and each frequency band carries 160 sample points.
According to the above main feature, the intelligent pass-through system needs a trained NN model to support the selectable pass-through sound types; the NN model can also be trained according to the user's needs, updating the earphone's NN model parameters so that additional selectable sound types can be provided on demand.
According to the above main feature, the spatial sound-effect adjusting module uses a filter bank to obtain, frame by frame, the left-ear signals S_FLi and right-ear signals S_FRi of the different frequency bands, wherein i = 1–n and n, the number of frequency bands into which each frame signal is divided, is an integer between 20 and 40.
According to the above main feature, the sound-channel balancing module computes the volume of each frequency band of each frame of the input left-ear sound signal S_OLi and right-ear sound signal S_ORi as:

E_Li = Σ_{j=1}^{160} S_OLi(j)²,  E_Ri = Σ_{j=1}^{160} S_ORi(j)²,

where j indexes the sample points of the sound signal on one frequency band of each frame. From the per-frame, per-band volumes E_Li/E_Ri, the left-ear and right-ear volumes are balance-adjusted. The reference standard of the adjustment takes the maximum left/right volume difference BEmax on the passband, obtained by actual testing, as a boundary value: the volume difference between the two ears must not exceed this boundary. If the difference is within the boundary value, expectations are met and no balance adjustment is needed; if it exceeds the boundary value, a balance coefficient G_Li/G_Ri is computed for the corresponding band signal S_OLi/S_ORi and used as the weight coefficient for the balance adjustment of each band; the left-ear sound signal S_OLi and right-ear sound signal S_ORi are weighted accordingly to obtain the final played sound, achieving left/right sound-effect balance. The specific balance adjustment method is:
setting: e diff =|E Li -E Ri |;
When E is diff <BEmax, no adjustment is required;
when E is diff >At BEmax, set G i =(1/2)*(E diff– BEmax)
When E is Li >E Ri Then G Li =1-G i /E Li ;G Ri =1+G i /E Ri
When E is Li <E Ri Then G Li =1+G i /E Li ;G Ri =1-G i /E Ri
Then, the sound signals of each frame and each frequency range are weighted to obtain the target sound signal
Figure BDA0003957494310000061
According to the above main feature, the earphone intelligent pass-through system is used together with an application program, through which the user can select the sound direction and the sound type.
Compared with the prior art, the invention uses the microphone arrays of a binaural earphone and complex Gaussian mixture models pre-trained for different directions to capture spatial audio from the specified area direction, uses a neural-network intelligent model to adjust the sound effect of the specified type of sound from that direction, performs sound-effect balance adjustment on the two ear channels, and then passes the specified type of sound from the user-specified direction through to the user, thereby realizing an intelligent, personalized, ideal listening experience for different users and different application needs in different environments, strengthening and extending the hearing-assistance capability of the earphone in daily life, and genuinely meeting people's need to use the earphone as a daily-life helper.
[ description of the drawings ]
Fig. 1 is a schematic diagram of a composition framework of an intelligent transparent transmission system for an earphone according to the present invention.
Fig. 2A and fig. 2B are distribution diagrams of microphone arrays of the earphone, respectively.
Fig. 3A, 3B, and 3C are schematic diagrams of a defined listening space.
Fig. 4 is a schematic diagram of a setting flow in a specific use.
Fig. 5 is a schematic diagram of the working principle of the spatial audio capturing module.
FIG. 6 is a schematic diagram of the operation principle of the system spatial sound effect adjusting module.
Fig. 7 is a schematic diagram of the operation principle of the left and right ear channel balance module.
[ detailed description ]
Fig. 1 is a schematic diagram of a frame of an intelligent transparent transmission system for an earphone according to the present invention. The earphone intelligent transparent transmission system comprises a left ear microphone array, a right ear microphone array, a spatial audio capture module, a system spatial sound effect adjusting module and a left ear and right ear sound channel balancing module. The composition and function of each module will be described in detail below.
As shown in fig. 2A and 2B, in an implementation, the left and right ear microphone arrays are symmetrically disposed corresponding to two ears, each microphone array includes four microphones, wherein the second microphone M2 and the third microphone M3 are external microphones, the fourth microphone M4 is a speech microphone for capturing ambient sounds in the outside space, and the first microphone M1 is an internal microphone for capturing sounds in the ear canal.
The left- and right-ear microphone arrays respectively acquire the left-ear ambient sound signals S_LM1–S_LM4 and the right-ear ambient sound signals S_RM1–S_RM4, which are input respectively to the left-ear and right-ear spatial audio capture modules.
The left-ear and right-ear spatial audio capture modules each include a plurality of complex Gaussian mixture models (CGMM intelligent models), which are Gaussian mixture models (GMMs) trained with complex model coefficients. Each module performs an FFT (fast Fourier transform) on its input sound signals, and the four transformed signals are input simultaneously to the module's 14 CGMM models. According to the sound direction selected by the user, each CGMM model analyzes the input signals to capture the spatial audio signal in the designated direction, and the likelihood that the direction represented by each model matches the current sound-source direction is computed to obtain the weight coefficient of each direction. After an IFFT (inverse fast Fourier transform), the spatial audio signals captured by the models are weighted and summed to obtain the left-ear sound signal S_SL and right-ear sound signal S_SR in the specified area direction.
In a specific implementation, a listening space is defined, as shown in the schematic views of fig. 3A, 3B, and 3C. Specifically, the user's listening space is represented by a sphere centered on the wearer, divided axisymmetrically into 14 directions: above, below, left, right, front, rear, and the upper-left-front, upper-right-front, upper-left-rear, upper-right-rear, lower-left-front, lower-right-front, lower-left-rear, and lower-right-rear space-center directions. Each direction is assigned a number (as shown in table 1 below) so that the user can select the sound-source direction of the pass-through sound as needed.
Correspondingly, one CGMM intelligent model is trained for each of the 14 directions of the listening space. The numbers of the 14 CGMM models correspond one-to-one with the numbers of the 14 directions and are used to distinguish the sound direction, as shown in table 1 below.
Direction number | Corresponding CGMM | Sound-source direction
1  | CGMM1  | Above
2  | CGMM2  | Below
3  | CGMM3  | Left
4  | CGMM4  | Right
5  | CGMM5  | Front
6  | CGMM6  | Rear
7  | CGMM7  | Upper-left-front space-center direction
8  | CGMM8  | Upper-right-front space-center direction
9  | CGMM9  | Upper-left-rear space-center direction
10 | CGMM10 | Upper-right-rear space-center direction
11 | CGMM11 | Lower-left-front space-center direction
12 | CGMM12 | Lower-right-front space-center direction
13 | CGMM13 | Lower-left-rear space-center direction
14 | CGMM14 | Lower-right-rear space-center direction
Table 1: Spatial direction numbering table
The left-ear sound signal S_SL and right-ear sound signal S_SR in the specified area direction are input respectively to the left-ear and right-ear spatial sound-effect adjusting modules, which are based on a neural-network intelligent model (NN model for short). Each module first performs MFCC (Mel-frequency cepstrum coefficient) feature extraction on the input sound signal while using Fbank (filter bank) processing to obtain the audio signals of the different frequency bands. The feature parameters extracted by MFCC are input to the NN model for analysis, which enhances the sound of the selected specified type and suppresses sound of non-specified types, yielding the per-band signal gains G_ALi (left) and G_ARi (right), where i = 1–n and n, the number of frequency bands per frame signal, is an integer between 20 and 40. The per-band left-ear signals S_FLi and right-ear signals S_FRi obtained from Fbank processing are multiplied by the respective gains G_ALi/G_ARi, and the modules output the sound signals of the specified type in the specified area direction after sound-effect adjustment, namely the per-band left-ear sound signals S_OLi and right-ear sound signals S_ORi.
In a specific implementation, the NN model of the spatial sound-effect adjusting module needs to be trained to support the selectable pass-through sound types; for example, the user can select speech, car horns, doorbells, a child's crying, and so on as the pass-through sound. Besides the sound types provided by the system in advance, the NN model can be trained according to the user's needs, updating the earphone's NN model parameters to provide additional selectable sound types on demand.
The sound-effect-adjusted left-ear sound signals S_OLi and right-ear sound signals S_ORi of the specified type in the specified direction are input to the left/right sound-channel balancing module. The module first obtains the volume of each frequency band of each frame for each channel, then computes from these volumes a balance coefficient for each frame-band signal of the left and right ears to serve as the weight coefficient for sound-signal balance adjustment, and finally weights the per-frame, per-band sound signals of the two ears to obtain the balanced left-ear sound signal S_BL and right-ear sound signal S_BR, which are output respectively to the left and right earphone speakers for playback, realizing the ideal listening experience the user expects in a real environment.
Fig. 4 is a schematic diagram of a specific setup procedure in use. In specific implementation, the earphone intelligent transparent transmission system is used in cooperation with an application program (APP), and a user can select a sound direction (specifically, as shown in table 1, 14 directions are selectable) and a sound type (provided by the system in advance or provided according to user requirements) through the application program (APP), so that transparent transmission function setting is completed.
For the sake of understanding, the following describes the detailed operation of each functional module.
As shown in fig. 5, the spatial audio capture module analyzes the input ambient sound signals and captures spatial audio according to the sound direction chosen by the sound-direction selector, which follows the user's selection in the APP. The ambient sound signals input for the left and right ears each have four channels, S_LM1–S_LM4 and S_RM1–S_RM4. Each input channel is FFT-processed and then fed simultaneously to the 14 CGMM intelligent models, which correspond respectively to the 14 listening-space directions. According to the sound direction selected by the user, each CGMM model analyzes the input sound signal and captures its spatial audio signal, outputting 14 captured sound signals; meanwhile the likelihood that the direction represented by each model matches the current sound-source direction is computed, and the 14 likelihood values are normalized to obtain the weight of the sound source in each direction. The weight coefficient of an unselected direction is ω_k = 0 × weight_k = 0, while the weight coefficient of a selected direction is ω_k = 1 × weight_k. The sound signals captured by the CGMM models are IFFT-processed and then weighted and summed to obtain the left-ear sound signal S_SL and right-ear sound signal S_SR in the selected area direction, according to:

S_SL = Σ_{k=1}^{14} ω_k · S_k  (and likewise S_SR for the right ear),

wherein S_k is the sound signal captured by the k-th CGMM intelligent model.
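The likelihood-weighted combination step can be sketched as follows. This is a minimal sketch under stated assumptions: the CGMM internals are out of scope and stubbed with precomputed per-model signals and likelihoods, and the function is written generically over the number of models (the patent uses 14).

```python
# Hedged sketch of the weighting step: normalize per-model likelihoods
# into weights, zero the weights of unselected directions, and sum the
# per-model captured signals (time-domain, i.e., post-IFFT).

def combine_directions(captured, likelihoods, selected):
    """captured: per-model signals (lists of samples, equal length);
    likelihoods: per-model non-negative likelihood values;
    selected: set of 0-based direction indices chosen by the user."""
    total = sum(likelihoods)
    weights = [lk / total for lk in likelihoods]       # normalize
    weights = [w if k in selected else 0.0             # omega_k = 0 if unselected
               for k, w in enumerate(weights)]
    n = len(captured[0])
    return [sum(weights[k] * captured[k][i] for k in range(len(captured)))
            for i in range(n)]
```

With three stub models where only the third (likelihood 2 of a total 4, so weight 0.5) is selected, a captured signal of all 4.0s combines to all 2.0s.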
As shown in fig. 6, after the sound signals S_SL/S_SR in the specified area direction are input to the left- and right-ear spatial sound-effect adjusting modules, MFCC feature extraction is performed while FBank processing obtains the audio signals of the different frequency bands. Feature extraction proceeds frame by frame, one frame every 10 ms (milliseconds). In human auditory evaluation the detection range is generally 125 Hz–8 kHz, so the passband for audio analysis is generally 0–8 kHz; by the Nyquist sampling theorem a 16 kHz sampling rate is sufficient, so each frame contains 160 sample points. Each frame is divided into n (an integer between 20 and 40) frequency bands, and each frequency band carries 160 sample points.
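The framing arithmetic stated above (16 kHz × 10 ms = 160 samples per frame) can be sketched directly; the function name is an assumption for illustration.

```python
# Hedged sketch of the framing step: at a 16 kHz sampling rate,
# one 10 ms frame holds exactly 160 samples.
SAMPLE_RATE_HZ = 16_000
FRAME_MS = 10

def frame_signal(samples):
    # split a sample stream into non-overlapping 10 ms frames,
    # discarding any incomplete trailing frame
    frame_len = SAMPLE_RATE_HZ * FRAME_MS // 1000  # = 160
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]
```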
After processing, FBank outputs the sound signal S_FLi/S_FRi on each frequency band of each frame. The MFCC parameters produced by feature extraction are input to the NN intelligent model for analysis; the NN model enhances the sound of the selected specified type and suppresses sound of non-specified types, yielding the signal gain G_ALi/G_ARi on each frequency band of each frame.
The per-frame, per-band signal S_FLi/S_FRi output by FBank is multiplied by the corresponding per-frame, per-band gain G_ALi/G_ARi, outputting the per-frame, per-band sound signal S_OLi/S_ORi, i.e., the sound signal of the specified type in the specified area direction after sound-effect adjustment.
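The per-band gain application is a plain elementwise multiply. A minimal sketch (the function name is an assumption; the gains would come from the NN model in the real system):

```python
# Hedged sketch: multiply each filter-bank band signal S_Fi by the
# corresponding NN-produced band gain G_Ai to get the adjusted S_Oi.

def apply_band_gains(band_signals, band_gains):
    """band_signals: n bands, each a list of frame samples;
    band_gains: n per-band gains (one per band, from the NN model)."""
    if len(band_signals) != len(band_gains):
        raise ValueError("one gain per band is required")
    return [[g * s for s in band]
            for band, g in zip(band_signals, band_gains)]
```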
Since the specified-direction ambient sounds at the left and right ears are not sounds from an absolute direction but sounds from a specified direction area, the volumes (amplitudes) of the two ear signals may be inconsistent after spatial audio capture and spatial sound-effect adjustment, so the left- and right-ear volumes need balance adjustment to obtain the user's ideal auditory sensation.
As shown in fig. 7, from the input left- and right-ear sound signals S_OLi/S_ORi, the volume of each frequency band of each frame of each ear is first obtained, calculated as:

E_Li = Σ_{j=1}^{160} S_OLi(j)²,  E_Ri = Σ_{j=1}^{160} S_ORi(j)²,

where j indexes the sample points on one frequency band of each frame of the sound signal.
According to the computed per-frame, per-band volumes E_Li/E_Ri of the left and right ears, the two volumes are balance-adjusted. The reference standard takes the maximum left/right volume difference BEmax on the passband, obtained by actual testing, as a boundary value (the actual test result is that the left/right volume difference is within 6 dB, i.e., BEmax = 6 dB): the volume difference between the two ears must not exceed this boundary. If the difference is within the boundary value, expectations are met and no balance adjustment is needed; if it exceeds the boundary value, a balance coefficient G_Li/G_Ri is computed for the corresponding band signal S_OLi/S_ORi and used as the weight coefficient for the balance adjustment of each band signal; S_OLi/S_ORi are weighted accordingly to obtain the final played sound, achieving left/right sound-effect balance. The specific balance adjustment method is:
setting: e diff =|E Li -E Ri |;
When E is diff <BEmax, no adjustment is required;
when E is diff >At BEmax, set G i =(1/2)*(E diff– BEmax)
When E is Li >E Ri Then G Li =1-G i /E Li ;G Ri =1+G i /E Ri
When E is Li <E Ri Then G Li =1+G i /E Li ;G Ri =1-G i /E Ri
Then, the sound signals of each frame and each frequency range are weighted to obtain a target sound signal
Figure BDA0003957494310000132
The signals S_{BL}/S_{BR} are then output to the speakers of the left and right earphones, respectively, for playback.
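The per-band balance procedure above can be sketched as follows. This is a non-authoritative sketch: the patent's volume-formula image is unavailable, so the dB volume computation, the `1e-12` floor, and all function names are assumptions; only the BEmax threshold, the G_i formula, and the 1 ± G_i/E gain rules come from the text.

```python
import numpy as np

BE_MAX = 6.0  # boundary value: max left/right volume difference on the passband, in dB

def band_volume_db(s):
    """Assumed volume of one frame/band: sum of squared samples, in dB."""
    return 10.0 * np.log10(np.sum(np.square(s)) + 1e-12)

def balance_gains(e_l, e_r):
    """Balance coefficients (G_Li, G_Ri) for one frequency band."""
    e_diff = abs(e_l - e_r)
    if e_diff <= BE_MAX:
        return 1.0, 1.0  # within the boundary value: no adjustment
    g = 0.5 * (e_diff - BE_MAX)
    if e_l > e_r:
        return 1.0 - g / e_l, 1.0 + g / e_r
    return 1.0 + g / e_l, 1.0 - g / e_r

def balance_frame(s_ol, s_or):
    """Weight each band of one frame; s_ol/s_or are lists of per-band arrays."""
    out_l, out_r = [], []
    for band_l, band_r in zip(s_ol, s_or):
        g_l, g_r = balance_gains(band_volume_db(band_l), band_volume_db(band_r))
        out_l.append(g_l * band_l)
        out_r.append(g_r * band_r)
    return out_l, out_r
```

For example, with band volumes of 50 dB (left) and 40 dB (right), E_diff = 10 > BEmax, so G_i = 2 and the gains become G_L = 1 − 2/50 = 0.96 and G_R = 1 + 2/40 = 1.05, pulling the louder ear down and the quieter ear up.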
Compared with the prior art, the invention captures the spatial audio in the specified area direction using the binaural earphone's microphone arrays and complex Gaussian mixture models pre-trained for different directions, adjusts the sound effect of the specified type of sound in that direction with a neural network intelligent model, and performs sound effect balance adjustment on the binaural channels before transmitting the specified type of sound in the user-specified direction to the user. This realizes an intelligent, personalized, ideal sound effect experience for different users and different application requirements in different environments, strengthens and extends the earphone's hearing-assistance capability in daily life, and practically meets people's need to use the earphone as a daily-life helper.
It should be understood that equivalents and modifications of the technical solution and inventive concept thereof may occur to those skilled in the art, and all such modifications and alterations should fall within the scope of the appended claims.

Claims (9)

1. An intelligent transparent transmission system for earphones, comprising:
the left and right ear microphone arrays are arranged in bilateral symmetry corresponding to the ears and comprise a plurality of microphones for capturing audio signals;
the left and right ear spatial audio capturing module, comprising a plurality of complex Gaussian mixture intelligent models corresponding to the spatial directions; the module performs FFT (fast Fourier transform) on the sound signals collected by the left and right ear microphone arrays and inputs them simultaneously to each complex Gaussian mixture intelligent model; each complex Gaussian mixture intelligent model analyzes and processes the input sound signals according to the sound direction selected by the user to capture the spatial audio signals in the specified direction, while the likelihood value that the direction represented by each model is consistent with the current sound source direction is calculated, so as to obtain the weight coefficient of the sound source in each direction; the spatial audio signals captured by each complex Gaussian mixture intelligent model are IFFT (inverse fast Fourier transform) processed and then weighted, to obtain the left ear sound signal and the right ear sound signal in the specified area direction, respectively;
the left and right ear spatial sound effect adjusting module, comprising a neural network intelligent model; the module performs MFCC feature extraction on the left ear sound signal and the right ear sound signal output by the left and right ear spatial audio capturing module, and uses a filter bank to obtain the audio signals of the different frequency bands; the feature parameters extracted by MFCC are input to the neural network intelligent model for analysis and processing, which performs sound effect enhancement on the selected specified type of sound and sound effect suppression on non-specified types of sound, thereby obtaining the left ear signal gain G_{ALi} and the right ear signal gain G_{ARi} of the signals on the different frequency bands; meanwhile, the left ear signals S_{FLi} and right ear signals S_{FRi} on each frequency band obtained after filter bank processing are multiplied by the corresponding left ear signal gains G_{ALi} and right ear signal gains G_{ARi}, and the sound-effect-adjusted sound signals of the specified type in the specified area direction are output, i.e., the left ear sound signals S_{OLi} and the right ear sound signals S_{ORi} on each frequency band;
a sound channel balance module, which first obtains the volume of each frequency band of each frame of the input left ear sound signal S_{OLi} and right ear sound signal S_{ORi}, and then performs balance adjustment on the left- and right-ear volumes.
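The gain-application step in claim 1 (multiplying each band's filter-bank output S_{FLi}/S_{FRi} by its NN-derived gain G_{ALi}/G_{ARi}) reduces to an element-wise broadcast per ear. A minimal sketch under assumed shapes and names (the patent does not specify a data layout):

```python
import numpy as np

def apply_band_gains(s_f, g_a):
    """Multiply each frequency-band signal by its NN-derived gain.

    s_f: (n_bands, samples_per_band) filter-bank outputs for one ear (S_Fi)
    g_a: (n_bands,) per-band gains (G_Ai): enhance the selected sound
         type, suppress the others
    Returns the sound-effect-adjusted per-band signals S_Oi.
    """
    s_f = np.asarray(s_f, dtype=float)
    g_a = np.asarray(g_a, dtype=float)
    return g_a[:, None] * s_f  # broadcast each band's gain across its samples
```

Calling this once for the left ear and once for the right ear yields the S_{OLi}/S_{ORi} signals passed on to the sound channel balance module.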
2. The intelligent transparent transmission system for earphones according to claim 1, wherein: each microphone array comprises four microphones, wherein the second microphone and the third microphone are external microphones for capturing the outside spatial environment sound, the fourth microphone is a talk microphone, and the first microphone is an internal microphone for capturing the sound in the ear canal.
3. The intelligent transparent transmission system for earphones according to claim 2, wherein: the left and right ear spatial audio capturing modules each comprise 14 complex Gaussian mixture models corresponding to the 14 directions of a predefined listening field space, wherein the listening field space is represented as a sphere centered on the person and is divided axisymmetrically into 14 directions: up, down, left, right, front, rear, and the eight diagonal space-center directions (upper-left-front, upper-right-front, upper-left-rear, upper-right-rear, lower-left-front, lower-right-front, lower-left-rear and lower-right-rear).
4. The intelligent transparent transmission system for earphones according to claim 3, wherein: the 14 complex Gaussian mixture intelligent models of the left and right ear spatial audio capturing modules respectively correspond to the 14 auditory space directions; each complex Gaussian mixture intelligent model analyzes and processes the input sound signals according to the sound direction selected by the user and captures spatial audio signals, outputting 14 channels of captured sound signals; at the same time, the likelihood value that the direction represented by each model is consistent with the current sound source direction is calculated, and the likelihood values are normalized to obtain the weight coefficient ω(k) of the sound source in each direction, for both the unselected directions and the selected direction; the sound signals captured by each complex Gaussian mixture intelligent model are IFFT (inverse fast Fourier transform) processed and then weighted to obtain the left ear sound signal S_{SL} and the right ear sound signal S_{SR} in the selected specified area direction, the weighting operation being:

$S_{S} = \sum_{k=1}^{14} \omega(k)\, S_k$

applied separately to the left and right ear to obtain S_{SL} and S_{SR}, where S_k is the sound signal captured by each CGMM intelligent model.
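The direction-weighted combination in claim 4 can be sketched as follows. This is a hedged sketch, not the patent's implementation: the normalization of likelihood values into ω(k), the array shapes, and the function name are assumptions, since the patent's weighting-formula image is unavailable.

```python
import numpy as np

def combine_directions(captured, likelihoods):
    """Weight the per-direction captured signals by normalized likelihoods.

    captured: (n_directions, n_samples) time-domain (post-IFFT) signals S_k,
              one per CGMM model (14 directions in the patent)
    likelihoods: (n_directions,) likelihood that each model's direction
                 matches the current sound source direction
    Returns the combined signal S_S for one ear.
    """
    captured = np.asarray(captured, dtype=float)
    w = np.asarray(likelihoods, dtype=float)
    w = w / np.sum(w)                       # normalize into weight coefficients ω(k)
    return np.sum(w[:, None] * captured, axis=0)
```

Running it once per ear over the 14 captured channels yields S_{SL} and S_{SR}; directions whose models poorly match the current source receive small ω(k) and contribute little to the output.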
5. The intelligent transparent transmission system for earphones according to claim 4, wherein: when the left and right ear spatial sound effect adjusting module extracts MFCC features, feature extraction is processed frame by frame, with one frame every 10 ms; each frame is divided into n (n = 20~40) frequency bands, and each frequency band has 160 sampling points.
6. The intelligent transparent transmission system for earphones according to claim 5, wherein: the earphone intelligent transparent transmission system needs to train the NN model in order to support the selectable transparent-transmission sound types; at the same time, the NN model can be trained according to the user's needs, updating the NN model parameters of the earphone and providing selectable sound types on demand.
7. The intelligent transparent transmission system for earphones according to claim 6, wherein: the spatial sound effect adjusting module uses a filter bank to obtain, frame by frame, the left ear signals S_{FLi} and the right ear signals S_{FRi} of the different frequency bands, where i = 1~n, and n, the number of frequency bands into which each frame signal is divided, is an integer between 20 and 40.
8. The intelligent transparent transmission system for earphones according to claim 7, wherein: the sound channel balance module calculates the volume of each frequency band of each frame of the input left ear sound signal S_{OLi} and right ear sound signal S_{ORi} according to the following formula:
$E_{Li} = 10\log_{10}\sum_{j} S_{OLi}^{2}(j), \qquad E_{Ri} = 10\log_{10}\sum_{j} S_{ORi}^{2}(j)$
where j is the sampling point of each frame of the sound signal on one frequency band. Then, according to the calculated volume E_{Li}/E_{Ri} of each frequency band of each frame for the left and right ears, the left- and right-ear volumes are balance-adjusted; the reference standard for the adjustment is the maximum left/right volume difference BEmax on the passband obtained by actual testing, taken as a boundary value that the binaural volume difference must not exceed. If the binaural volume difference is within the boundary value, it meets expectations and no balance adjustment is needed; if the binaural volume difference exceeds the boundary value, the balance coefficients G_{Li}/G_{Ri} of the corresponding frequency band signals S_{OLi}/S_{ORi} are adjusted accordingly, and the balance coefficients G_{Li}/G_{Ri} are used as weight coefficients for the balance adjustment of the corresponding left ear sound signals S_{OLi} and right ear sound signals S_{ORi} on each frequency band; the left ear sound signals S_{OLi} and right ear sound signals S_{ORi} are weighted to obtain the final playback sound, thereby achieving left- and right-ear sound effect balance. The specific balance adjustment method is:
setting: e diff =|E Li -E Ri |;
When E is diff <BEmax, no adjustment is required;
when E is diff >At BEmax, set G i =(1/2)*(E diff –BEmax)
When E is Li >E Ri Then G Li =1-G i /E Li ;G Ri =1+G i /E Ri
When E is Li <E Ri Then G Li =1+G i /E Li ;G Ri =1-G i /E Ri
Then, the sound signals of each frame and each frequency band are weighted to obtain the target sound signals:

$S_{BL} = \sum_{i} G_{Li}\, S_{OLi}, \qquad S_{BR} = \sum_{i} G_{Ri}\, S_{ORi}$
9. The intelligent transparent transmission system for earphones according to claim 8, wherein: the earphone intelligent transparent transmission system is matched with an application program, and a user can select the sound direction and the sound type through the application program.
CN202211468553.0A 2022-11-22 2022-11-22 Intelligent transparent transmission system for earphone Pending CN115835080A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211468553.0A CN115835080A (en) 2022-11-22 2022-11-22 Intelligent transparent transmission system for earphone


Publications (1)

Publication Number Publication Date
CN115835080A true CN115835080A (en) 2023-03-21

Family

ID=85530286



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination