CN116887160B - Digital hearing aid howling suppression method and system based on neural network - Google Patents

Digital hearing aid howling suppression method and system based on neural network Download PDF

Info

Publication number
CN116887160B
CN116887160B CN202311152648.6A CN202311152648A CN116887160B CN 116887160 B CN116887160 B CN 116887160B CN 202311152648 A CN202311152648 A CN 202311152648A CN 116887160 B CN116887160 B CN 116887160B
Authority
CN
China
Prior art keywords
moment
convergence
time
voice signal
stationary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311152648.6A
Other languages
Chinese (zh)
Other versions
CN116887160A (en
Inventor
章调占
张志平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiuyi Shenzhen Medical Technology Co ltd
Original Assignee
Jiuyi Shenzhen Medical Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiuyi Shenzhen Medical Technology Co ltd filed Critical Jiuyi Shenzhen Medical Technology Co ltd
Priority to CN202311152648.6A priority Critical patent/CN116887160B/en
Publication of CN116887160A publication Critical patent/CN116887160A/en
Application granted granted Critical
Publication of CN116887160B publication Critical patent/CN116887160B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R25/00Deaf-aid sets, i.e. electro-acoustic or electro-mechanical hearing aids; Electric tinnitus maskers providing an auditory perception
    • H04R25/50Customised settings for obtaining desired overall acoustical characteristics
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2225/00Details of deaf aids covered by H04R25/00, not provided for in any of its subgroups
    • H04R2225/43Signal processing in hearing aids to enhance the speech intelligibility
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Abstract

The embodiment of the specification discloses a digital hearing aid howling suppression method and system based on a neural network, and relates to the technical field of hearing aids. Wherein the method comprises the following steps: acquiring a voice signal received by a digital hearing aid; acquiring the state of each moment in the voice signal based on a neural network; determining a convergence stability coefficient corresponding to each moment according to the state of each moment in the voice signal and the distribution characteristic of the amplitude-frequency peak value of each moment; determining a non-stationary moment based on the convergence stationary coefficient; determining a step length adjusting proportion based on the time interval between the non-stable time and the howling time, and acquiring a convergence step length corresponding to each time based on the step length adjusting proportion and the convergence stability coefficient; and carrying out echo cancellation on the voice signal based on an NLMS algorithm and the convergence step length so as to inhibit howling in the digital hearing aid.

Description

Digital hearing aid howling suppression method and system based on neural network
Technical Field
The present application relates to the field of hearing aids, and in particular, to a digital hearing aid howling suppression method and system based on a neural network.
Background
The digital hearing aid can convert the electric signal from the microphone into a digital signal, and convert the digital signal into an analog signal to be sent to the receiver after processing, so that the increasing requirements of hearing impaired people on the quality of the hearing aid can be met, and the quality of life of the hearing impaired people is improved.
Howling is one of the main factors affecting the use experience of digital hearing aids in the use process, and is generated because the amplified sound of the hearing aid is transmitted to a microphone through spatial or structural vibration and becomes an input signal again to be circularly amplified, so that resonance occurs on certain frequencies of the sound signal until the feedback signal reaches a saturated output state, and howling is caused. Howling is classified into internal howling, which is a howling condition generated by structural destruction due to aging or vibration of internal components of the hearing aid, falling or displacement, and external howling, which is an external howling, because amplified sound overflows from the auditory canal to be received by the microphone again.
The howling suppression method at the present stage mainly comprises three suppression directions of frequency shifter suppression, wave trap suppression and self-adaptive feedback suppression, wherein in a scene with high tone quality requirement, the howling frequency points have diffusion characteristics, and the frequency shifter suppression effect is poor; the suppression of the wave trap depends on accurate detection of the howling frequency, but it is difficult to obtain accurate howling frequency in the real-time conversation process; adaptive feedback suppression is more damaging to the voice and if residual suppression is present, it still produces a weaker whistle in the actual use of the hearing aid.
Based on this, it is necessary to study a howling suppression method for a digital hearing aid to effectively solve the howling problem of the digital hearing aid.
Disclosure of Invention
An aspect of embodiments of the present specification provides a digital hearing aid howling suppression method based on a neural network, the method comprising:
acquiring a voice signal received by a digital hearing aid;
acquiring the state of each moment in the voice signal based on a neural network;
determining a convergence stability coefficient corresponding to each moment according to the state of each moment in the voice signal and the distribution characteristic of the amplitude-frequency peak value of each moment, wherein the convergence stability coefficient is used for representing the probability of harmonic peaks at each moment in the voice signal;
determining a non-stationary moment based on the convergence stationary coefficient;
determining a step length adjusting proportion based on the time interval between the non-stable time and the howling time, and acquiring a convergence step length corresponding to each time based on the step length adjusting proportion and the convergence stability coefficient;
and carrying out echo cancellation on the voice signal based on an NLMS algorithm and the convergence step length so as to inhibit howling in the digital hearing aid.
In some embodiments, the acquiring, based on the neural network, a state of each time instant in the speech signal includes:
Denoising the voice signal, and performing time domain transformation on the denoised voice signal to obtain a corresponding time domain diagram;
acquiring short-time energy of each moment in the voice signal based on the time domain diagram, and a short-time autocorrelation coefficient of the voice signal when delay is k;
and taking the short-time energy and the short-time autocorrelation coefficient at each moment as an instantaneous vector at each moment in a time domain diagram corresponding to the voice signal, and inputting the instantaneous vector into a trained state detection model to obtain a state at each moment in the voice signal, wherein the states comprise a convergence state and a non-convergence state.
In some embodiments, the determining the convergence and stability coefficient corresponding to each time according to the state of each time in the voice signal and the distribution characteristic of the amplitude-frequency peak value of each time includes:
acquiring each peak value of the amplitude value from the frequency domain diagram of the voice signal, and taking each unequal peak value as a peak value stage;
for each of said peak levels;
acquiring all corresponding moments of the peak value in the time domain diagram of the voice signal, taking a time difference value between two adjacent moments as interval time, and recording a sequence formed by all interval time as an interval time sequence;
Acquiring all frequencies corresponding to the peak value in the frequency domain diagram of the voice signal, taking a difference value between two adjacent frequencies as an interval frequency, and recording a sequence formed by all interval frequencies as an interval frequency sequence;
determining the peak-to-peak fluctuation degree corresponding to each moment based on the interval time sequence and the interval frequency sequence;
determining a convergence distance corresponding to each moment based on the state;
and obtaining a convergence stability coefficient corresponding to each moment according to the ratio of the peak-to-peak fluctuation degree to the convergence distance.
In some embodiments, the degree of peak-to-peak variability is calculated based on the following formula:
wherein,for the peak-to-peak variability corresponding to the ith moment, m is the statistical number of the peak levels in the speech signal, +.>For the interval time sequence corresponding to the peak level corresponding to the ith moment,/th moment>For the interval time sequence corresponding to the b-th peak level,/th peak level>For the interval frequency sequence corresponding to the peak level corresponding to the ith moment, +.>For the interval frequency sequence corresponding to the b-th peak level,/a>For interval time sequence->And->DTW distance between>For the interval frequency sequence->And->DTW distance between.
In some embodiments, the convergence distance is calculated based on the following formula:
Wherein,for the convergence distance corresponding to the i-th moment, < >>M is the length of the reference sequence corresponding to each convergence time, s is the sequence number corresponding to the convergence time, and +.>For the power value at the i-th instant, +.>Power value at j-th moment in reference sequence for s-th convergence moment,/th>For the ith moment and +.>The variation coefficient of the data set formed by the amplitude differences at the convergence time.
In some embodiments, the determining a non-stationary time based on the convergence stationary coefficient comprises:
forming convergence sequences by using convergence stability coefficients corresponding to all moments in a non-convergence state according to a time sequence, and acquiring mutation points in the convergence sequences by using a BG sequence segmentation algorithm;
and taking the moment corresponding to the abrupt point and all the moments in the convergence state as non-stable moments in the voice signal.
In some embodiments, the determining the step size adjustment ratio based on the time interval between the non-stationary time and the howling time includes:
calculating a linear prediction coefficient of each non-stationary moment, wherein the linear prediction coefficient is used for representing the deviation condition of each non-stationary moment;
Arranging the linear prediction coefficients corresponding to each non-stationary moment according to a time sequence to obtain a corresponding linear prediction sequence;
for each non-stationary moment;
determining a convergence contribution ratio based on the linear prediction sequences corresponding to the front and rear adjacent moments and the linear prediction sequences corresponding to the front and rear adjacent moments;
determining an information effective ratio based on the ratio of the reconstruction variable quantity corresponding to the information effective ratio to the sum of the reconstruction variable quantities corresponding to all non-stationary moments;
and obtaining a step length adjusting ratio corresponding to each non-stationary moment according to the convergence contribution ratio and the information effective ratio.
In some embodiments, the convergence contribution ratio is calculated based on the following formula:
wherein,the convergence contribution ratio corresponding to the y-th non-stationary moment; />、/>、/>The linear prediction sequences are respectively corresponding to the y-1 th, the y-1 th and the y+1 th non-stationary moments; />、/>Respectively the sequences->And->、/>And->、/>And->Pearson correlation coefficient therebetween;
the information effective ratio is calculated based on the following formula:
wherein,for the information effective ratio of the y-th non-stationary moment, M is the number of non-stationary moments,/>The reconstruction variable quantity corresponding to the y-th non-stationary moment;
The step size adjustment ratio is calculated based on the following formula:
wherein,and adjusting the proportion for the step length corresponding to the y-th non-stationary moment.
In some embodiments, the convergence step is calculated based on the following formula:
wherein,for the convergence ratio at the ith moment, M is the number of non-stationary moments, +.>Convergence stability coefficient corresponding to the ith moment and the yth non-stable moment respectively, ++>The scale is adjusted for the step corresponding to the y-th non-stationary moment,for convergence step maximum, +.>Representing a normalization function->For maximum convergence ratio, +.>Is a regulatory factor.
Another aspect of embodiments of the present specification also provides a digital hearing aid howling suppression system based on a neural network, the system comprising:
the acquisition module is used for acquiring the voice signal received by the digital hearing aid;
the state determining module is used for acquiring the state of each moment in the voice signal based on the neural network;
the convergence stability coefficient determining module is used for determining a convergence stability coefficient corresponding to each moment according to the state of each moment in the voice signal and the distribution characteristics of the amplitude-frequency peak value of each moment, and the convergence stability coefficient is used for representing the probability of harmonic peaks existing at each moment in the voice signal;
A non-stationary time determining module, configured to determine a non-stationary time based on the convergence stationary coefficient;
the convergence step length determining module is used for determining a step length adjusting proportion based on the time interval between the non-stable time and the howling time and acquiring a convergence step length corresponding to each time based on the step length adjusting proportion and the convergence stable coefficient;
and the howling suppression module is used for carrying out echo cancellation on the voice signal based on an NLMS algorithm and the convergence step length so as to suppress howling in the digital hearing aid.
The digital hearing aid howling suppression method and system based on the neural network provided in the embodiments of the present disclosure may have at least the following beneficial effects: (1) The state of each moment in the voice signal is detected through the neural network, a convergence stability coefficient is constructed according to the distribution characteristics of the amplitude-frequency peak value of each moment, the amplitude-frequency distribution of the adjacent moment of the peak point is considered, the problem that the effect is not ideal due to evaluation based on single moment in the traditional howling detection method can be avoided, and the detection precision of the howling signal in the digital hearing aid is improved; (2) By constructing the step length adjustment proportion based on the time interval between the non-stationary time and the howling time, the amplitude-frequency characteristic of each time in the voice signal can be fully utilized, meanwhile, the coefficient energy distribution state in the adaptive filter is evaluated according to the approaching degree of each non-stationary time and the howling time, the convergence step length in the NLMS algorithm can be acquired in a self-adaptive manner in the subsequent process, the problem that the convergence speed and the precision are difficult to adjust is solved, and the suppression effect of the suppression system on the howling is improved.
Additional features will be set forth in part in the description which follows. As will become apparent to those skilled in the art upon review of the following and drawings, or may be learned by the production or operation of the examples. The features of the present specification can be implemented and obtained by practicing or using the various aspects of the methods, tools, and combinations set forth in the detailed examples below.
Drawings
The present specification will be further described by way of exemplary embodiments, which will be described in detail by way of the accompanying drawings. The embodiments are not limiting, in which like numerals represent like structures, wherein:
fig. 1 is a schematic diagram of an exemplary application scenario of a digital hearing aid squeal suppression system based on a neural network according to some embodiments of the present description;
fig. 2 is an exemplary block diagram of a digital hearing aid squeal suppression system based on a neural network, according to some embodiments of the present description;
fig. 3 is an exemplary flow chart of a neural network-based digital hearing aid squeal suppression method according to some embodiments of the present description.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present specification, the drawings that are required to be used in the description of the embodiments will be briefly described below. It is apparent that the drawings in the following description are only some examples or embodiments of the present specification, and it is possible for those of ordinary skill in the art to apply the present specification to other similar situations according to the drawings without inventive effort. Unless otherwise apparent from the context of the language or otherwise specified, like reference numerals in the figures refer to like structures or operations.
It should be appreciated that as used in this specification, a "system," "apparatus," "unit" and/or "module" is one method for distinguishing between different components, elements, parts, portions or assemblies at different levels. However, if other words can achieve the same purpose, the words can be replaced by other expressions.
As used in this specification and the claims, the terms "a," "an," "the," and/or "the" are not specific to a singular, but may include a plurality, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the steps and elements are explicitly identified, and they do not constitute an exclusive list, as other steps or elements may be included in a method or apparatus.
A flowchart is used in this specification to describe the operations performed by the system according to embodiments of the present specification. It should be appreciated that the preceding or following operations are not necessarily performed in order precisely. Rather, the steps may be processed in reverse order or simultaneously. Also, other operations may be added to or removed from these processes.
The following describes in detail a digital hearing aid howling suppression method and system based on a neural network according to an embodiment of the present disclosure with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an exemplary application scenario of a digital hearing aid squeal suppression system based on a neural network according to some embodiments of the present description.
Referring to fig. 1, in some embodiments, an application scenario 100 of a neural network-based digital hearing aid howling suppression system may include a speech acquisition device 110, a storage device 120, a processing device 130, a speech output device 140, and a network 150. The various components in the application scenario 100 may be connected in a variety of ways. For example, the voice acquisition device 110 may be connected to the storage device 120 and/or the processing device 130 through the network 150, or may be directly connected to the storage device 120 and/or the processing device 130. As another example, the storage device 120 may be directly connected to the processing device 130 or connected via the network 150. For another example, the speech output device 140 may be connected to the storage device 120 and/or the processing device 130 through the network 150, or may be directly connected to the storage device 120 and/or the processing device 130.
The speech gathering device 110 may be part of a digital hearing aid, for example, in some embodiments the speech gathering device 110 may be a microphone in the digital hearing aid that may be used to receive ambient sound. In some embodiments, the voice capture device 110 may be a single microphone or an array microphone. In some embodiments, the voice capturing device 110 may capture the environmental voice in real time, and send the captured voice signal to the processing device 130 for processing. In some embodiments, the voice capture device 110 may have a separate power source that may send the captured voice signals to other components (e.g., the storage device 120, the processing device 130, the voice output device 140) in the application scenario 100 in a wired or wireless manner. In some embodiments, the application scenario 100 may include a plurality (e.g., two or more) of voice capturing devices 110, where the plurality of voice capturing devices 110 may capture sounds in the environment from different directions, and the processing apparatus 130 may perform noise reduction processing based on the voice signals captured by the different voice capturing devices 110 to remove noise in the voice signals.
In some embodiments, the voice acquisition device 110 may send its acquired voice signals to the storage device 120, the processing device 130, the voice output device 140, etc. over the network 150. In some embodiments, the speech signals acquired by the speech acquisition device 110 may be processed by the processing apparatus 130. For example, processing device 130 may determine a convergence step size for each time instant based on the speech signal and echo cancel the speech signal based on the NLMS algorithm and the convergence step size to suppress howling in the digital hearing aid. In some embodiments, the speech signal and/or the speech signal processed by the processing device 130 may be sent to the storage device 120 for recording or to the speech output means 140 for feedback to the user (e.g. a digital hearing aid user).
Network 150 may facilitate the exchange of information and/or data. The network 150 may include any suitable network capable of facilitating the exchange of information and/or data of the application scenario 100. In some embodiments, at least one component of the application scenario 100 (e.g., the speech acquisition device 110, the storage device 120, the processing device 130, the speech output device 140) may exchange information and/or data with at least one other component in the application scenario 100 over the network 150. For example, the processing device 130 may obtain the speech signal acquired in the current environment from the speech acquisition apparatus 110 and/or the storage device 120 via the network 150. For another example, the processing device 130 may send the processed voice signal to the voice output apparatus 140 through the network 150.
Storage 120 may store data, instructions, and/or any other information. In some embodiments, the storage device 120 may store data obtained from the speech acquisition apparatus 110 and/or the processing device 130. For example, the storage device 120 may store the voice signal collected by the voice collection apparatus 110; for another example, the storage device 120 may store a voice signal obtained by processing the voice signal by the processing device 130. In some embodiments, the storage device 120 may store data and/or instructions that the processing device 130 uses to perform or use to implement the exemplary methods described in this specification. In some embodiments, the storage device 120 may include mass memory, removable memory, volatile read-write memory, read-only memory (ROM), and the like, or any combination thereof. Exemplary mass storage devices may include magnetic disks, optical disks, solid state disks, and the like. In some embodiments, the storage device 120 may be part of a digital hearing aid. In some embodiments, storage device 120 may be implemented on a cloud platform. For example only, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an internal cloud, a multi-layer cloud, or the like, or any combination thereof.
In some embodiments, the storage device 120 may be connected to the network 150 to communicate with at least one other component in the application scenario 100 (e.g., the speech acquisition apparatus 110, the processing device 130, the speech output apparatus 140). At least one component in the application scenario 100 may access data, instructions, or other information stored in the storage device 120 through the network 150. In some embodiments, the storage device 120 may be directly connected or in communication with one or more components (e.g., the speech acquisition apparatus 110, the processing device 130) in the application scenario 100. In some embodiments, the storage device 120 may be part of the speech acquisition apparatus 110 and/or the processing device 130.
The processing device 130 may process data and/or information obtained from the speech acquisition apparatus 110, the storage device 120, and/or other components of the application scenario 100. In some embodiments, the processing device 130 may obtain a voice signal from the voice acquisition device 110 and/or the storage device 120, process the voice signal to determine a convergence step size corresponding to each time, and echo cancel the voice signal based on the NLMS algorithm and the convergence step size, thereby suppressing howling in the digital hearing aid. In some embodiments, processing device 130 may retrieve pre-stored computer instructions from storage device 120 and execute the computer instructions to implement the neural network-based digital hearing aid howling suppression method described in this specification.
In some embodiments, the processing device 130 may be local or remote. For example, in some embodiments, the processing device 130 may be part of a digital hearing aid; as another example, in some embodiments, the processing device 130 may be a single server or a group of servers. The server group may be centralized or distributed, and the voice signals collected by the voice collecting device 110 may be sent to the server for processing. In some embodiments, processing device 130 may access information and/or data from voice acquisition apparatus 110 and/or storage device 120 over network 150. In some embodiments, the processing device 130 may be directly connected to the speech acquisition apparatus 110 and/or the storage device 120 to access information and/or data. In some embodiments, the processing device 130 may be implemented on a cloud platform. For example, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, and the like, or any combination thereof.
The voice output device 140 may receive and/or output a voice signal, which may include a voice signal collected by the voice collection device 110 and/or a voice signal processed by the processing apparatus 130. For example, in some embodiments, the voice signal collected by the voice collection device 110 may be directly sent to the voice output device 140 for output. For another example, in some embodiments, the voice signal collected by the voice collecting apparatus 110 may be sent to the processing device 130 to perform denoising, howling suppression, and other processes, and then the processed voice signal is sent to the voice output apparatus 140 to be output. In some embodiments, the speech output device 140 may be part of a digital hearing aid, for example, the speech output device 140 may be a speaker (also called an earpiece) of the digital hearing aid.
It should be noted that the above description about the application scenario 100 is only for illustration and description, and does not limit the application scope of the present specification. Various modifications and changes to the application scenario 100 may be made by those skilled in the art under the guidance of the present specification. However, such modifications and variations are still within the scope of the present description. For example, the voice capture device 110, the voice output device 140, etc. may include more or fewer functional components.
Fig. 2 is a block diagram of a digital hearing aid squeal suppression system based on a neural network according to some embodiments of the present description. In some embodiments, the neural network-based digital hearing aid howling suppression system 200 shown in fig. 2 may be applied to the application scenario 100 shown in fig. 1 in software and/or hardware, for example, may be configured in software and/or hardware to the processing device 130 for processing the speech signal collected by the speech collection apparatus 110, determining a convergence step size corresponding to each time based on the speech signal, and performing echo cancellation on the speech signal based on the NLMS algorithm and the convergence step size, thereby suppressing howling in the digital hearing aid.
Referring to fig. 2, in some embodiments, the neural network-based digital hearing aid howling suppression system 200 may include an acquisition module 210, a state determination module 220, a convergence stationary coefficient determination module 230, a non-stationary time determination module 240, a convergence step size determination module 250, and a howling suppression module 260.
The acquisition module 210 may be used to acquire the speech signal received by the digital hearing aid.
The state determination module 220 may be configured to obtain a state of each time instant in the speech signal based on a neural network.
The convergence stability coefficient determining module 230 may be configured to determine a convergence stability coefficient corresponding to each time according to a state of each time in the speech signal and a distribution characteristic of a amplitude-frequency peak value of each time, where the convergence stability coefficient is used to characterize a probability that a harmonic peak exists at each time in the speech signal.
The non-stationary moment determination module 240 may be configured to determine a non-stationary moment based on the convergence stationary coefficient.
The convergence step size determining module 250 may be configured to determine a step size adjustment ratio based on the time interval between the non-stationary time and the howling time, and obtain a convergence step size corresponding to each time based on the step size adjustment ratio and the convergence stationary coefficient.
Howling suppression module 260 may be configured to echo cancel the speech signal based on an NLMS algorithm and the convergence step size to suppress howling in the digital hearing aid.
For further details regarding the above-mentioned respective modules, reference may be made to other locations in the present specification (e.g. fig. 3 and related descriptions thereof), and no further description is given here.
It should be appreciated that the neural network based digital hearing aid howling suppression system 200 and its modules shown in fig. 2 may be implemented in a variety of ways. For example, in some embodiments, the system and its modules may be implemented in hardware, software, or a combination of software and hardware. Wherein the hardware portion may be implemented using dedicated logic; the software portions may then be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or special purpose design hardware. Those skilled in the art will appreciate that the methods and systems described above may be implemented using computer executable instructions and/or embodied in processor control code, such as provided on a carrier medium such as a magnetic disk, CD or DVD-ROM, a programmable memory such as read only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The system of the present specification and its modules may be implemented not only with hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, etc., or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., but also with software executed by various types of processors, for example, and with a combination of the above hardware circuits and software (e.g., firmware).
It should be noted that the above description of the digital hearing aid howling suppression system 200 based on a neural network is provided for illustrative purposes only and is not intended to limit the scope of the present description. It will be appreciated by those skilled in the art from this disclosure that various modules may be combined arbitrarily or constituting a subsystem in connection with other modules without departing from this concept. For example, the acquisition module 210, the state determination module 220, the convergence stability factor determination module 230, the non-stationary moment determination module 240, the convergence step size determination module 250, and the howling suppression module 260 described in fig. 2 may be different modules in one system, or may be one module to implement the functions of two or more modules described above. As another example, the digital hearing aid howling suppression system 200 based on the neural network may further include a preprocessing module, which may be used to perform preprocessing such as wiener filtering denoising on the foregoing voice signal. Such variations are within the scope of the present description. In some embodiments, the foregoing modules may be part of the processing device 130.
Fig. 3 is an exemplary flow chart of a digital hearing aid squeal suppression method based on a neural network according to some embodiments of the present description. In some embodiments, method 300 may be performed by processing logic that may comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (instructions run on a processing device to perform hardware simulation), or the like, or any combination thereof. In some embodiments, one or more operations in the flowchart of the neural network-based digital hearing aid howling suppression method 300 shown in fig. 3 may be implemented by the processing device 130 shown in fig. 1. For example, method 300 may be stored in storage device 120 in the form of instructions and invoked and/or executed by processing device 130. The specific implementation of method 300 is described in detail below in conjunction with FIG. 3.
Referring to fig. 3, in some embodiments, a digital hearing aid squeal suppression method 300 based on a neural network may include:
step 310, a speech signal received by a digital hearing aid is acquired. In some embodiments, step 310 may be performed by the acquisition module 210.
The voice signal received by the digital hearing aid may refer to the voice signal acquired by the voice acquisition device 110 in the digital hearing aid for the current environment. In some embodiments, the voice signal acquired by the voice acquisition apparatus 110 may be stored in the storage device 120, and the acquisition module 210 may acquire the voice signal received by the digital hearing aid from the storage device 120. In some embodiments, the acquisition module 210 may be communicatively coupled to the voice capture device 110, and the acquisition module 210 may acquire the voice signal received by the digital hearing aid directly from the voice capture device 110.
In some embodiments, an audio acquisition device or audio recording software may be utilized to obtain a voice signal during use of the digital hearing aid. For example, in some embodiments, both ends of the connection wire or adapter may be connected to the digital hearing aid and the computer, respectively, and then the audio recording software in the computer is used to obtain the speech signal received by the digital hearing aid during use.
In some embodiments, in order to eliminate noise influence of the voice signal in the process of collection and transmission, a wiener filtering algorithm may be used to denoise the collected voice signal. Wiener filtering denoising is a well-known technique, and the detailed process thereof is not described in the specification.
In some embodiments, it has been found through research that, in the frequency domain of the voice signal, the howling frequency point power is high, and is a peak value in the spectrogram of the whole signal, which is larger than the power value of other voice signals or noise signals, that is, the energy of the normal voice signal is mainly concentrated in a lower frequency band, for example, the voice signal of a person is mainly concentrated in the range from 300Hz to 3kHz, and the energy distribution of the howling signal is wider; in the time domain of the voice signal, the power of the howling frequency point has a rapid increasing process, and is always kept after reaching the peak value. In addition, normal speech signals have a relatively clear harmonic component, while howling signals may exhibit nonlinear distortion, noise, etc., and audio signals containing howling do not contain harmonic peaks at the howling frequency.
The adaptive filter NLMS (Normalized LMS) algorithm can obtain a frequency response similar to the feedback channel by adjusting the parameters of the digital filter, and then subtracting the digital filtered output from the acquired voice signal to cancel the echo and suppress howling. However, since the NLMS algorithm is difficult to adjust between the convergence speed and the accuracy in the process of suppressing howling, the present embodiment considers the adaptive acquisition of the convergence step size by using the state of the speech signal.
Step 320, obtaining a state of each moment in the speech signal based on the neural network. In some embodiments, step 320 may be performed by the state determination module 220.
In some embodiments, to reduce subjective impact during use of the digital hearing aid, n speech signals with duration K may be continuously collected, and the speech signal after the t-th denoising may be denoted as x (t). For the denoised speech signal x (t), a time domain transformation may be performed to obtain a corresponding time domain map. Further, the state determination module 220 may obtain short-time energy of each time instant in the speech signal x (t) based on the time domain graph, and a short-time autocorrelation coefficient of the speech signal x (t) when the delay is k. Wherein the magnitude of k takes the checked value 2s. If there is howling at a certain moment, the short-time energy will have a larger change, and the autocorrelation coefficient of the speech signal x (t) will also have a larger change (the short-time energy and the short-time autocorrelation coefficient are known techniques, and detailed processes are not repeated). Based on this, in some embodiments of the present disclosure, the short-time energy and the short-time autocorrelation coefficient at each time may be used as an instantaneous vector at each time in the time domain diagram corresponding to the speech signal x (t), and further, the instantaneous vector may be used as input data to input a trained state detection model, to obtain the state at each time in the speech signal x (t). Wherein the states include a converging state and a non-converging state.
In some embodiments, the aforementioned state detection model may be trained as follows:
taking the difference value between the short-time energy of each moment in the voice signal and the short-time energy of the previous moment as an energy variable of the voice signal at the current moment; then, acquiring a segmentation threshold value of the energy variable by using an Ojin threshold algorithm, regarding the time when the energy variable is larger than or equal to the segmentation threshold value as convergence time, and regarding the time when the energy variable is smaller than the segmentation threshold value as non-convergence time; and finally, marking the instantaneous vector corresponding to the convergence time as 0, marking the instantaneous vector corresponding to the non-convergence time as 1, and training the initial state detection model by taking all the marked instantaneous vectors as training samples until the training is finished when the preset condition is reached, thereby obtaining the trained state detection model.
In some embodiments, the structure of the state detection model may be a convolutional neural network, which may use Adam as an optimization algorithm, use a cross entropy function as a loss function, and output as a speech signal state corresponding to each time. Because the training of neural networks is a well-known technique, the specific training process is not described in this specification.
And 330, determining a convergence stability coefficient corresponding to each moment according to the state of each moment in the voice signal and the distribution characteristics of the amplitude-frequency peak value of each moment. In some embodiments, step 330 may be performed by the convergence stationary coefficient determination module 230.
Amplitude-frequency peak refers to the maximum amplitude of a signal at a certain frequency. In some embodiments, the individual peaks of the amplitude may be obtained from a frequency domain plot of the speech signal x (t), and then each of the unequal peaks may be taken as a peak level, e.g., the peak level may be recorded as 1 to L from low to high, respectively.
Since the amplitude distribution of the normal speech signal has a certain periodicity, and the amplitude of the howling frequency has a gradually increasing aperiodicity, and such aperiodic variations do not occur only once according to the degree of howling. That is, each peak may correspond to more than one time or frequency.
Based on this, for each level of peak value, all the time points corresponding to the peak value in the time domain diagram of the speech signal x (t) can be obtained, the time difference between adjacent two time points is taken as interval time, and the sequence consisting of all the interval time is taken as an interval time sequence. And secondly, acquiring all frequencies corresponding to the peak value in a frequency domain diagram of the voice signal x (t), taking a difference value between two adjacent frequencies as an interval frequency, and recording a sequence formed by all interval frequencies as an interval frequency sequence.
It should be noted that, in the voice signal x (t), if the voice data at a certain time destroys the periodicity of the signal amplitude at a neighboring time and the time belongs to the convergence state, the NLMS algorithm at the time should increase the convergence speed to prevent howling.
Based on the analysis, a convergence and stability coefficient V can be constructed to characterize the probability of harmonic peaks at each instant in the speech signal. In some embodiments, the convergence plateau coefficient for the ith time instant may be calculated based on the following formula:
Wherein,for the peak-to-peak variability corresponding to the ith moment, m is the statistical number of peak levels in the speech signal, +.>For the interval time sequence corresponding to the peak level corresponding to the ith moment,/th moment>For the interval time sequence corresponding to the b-th peak level,/th peak level>For the interval frequency sequence corresponding to the peak level corresponding to the ith moment, +.>For the interval frequency sequence corresponding to the b-th peak level,/a>For interval time sequence->And->DTW (Dynamic Time Warping ) distance between, +.>For the interval frequency sequence->And->DTW distance between. The DTW distance is a known technique, and a specific process is not described herein. In the above formula, the degree of peak-to-peak fluctuation +. >The larger the value of (c) is, the larger the periodic fluctuation between the i-th time and the rest of the peaks in the speech signal x (t) is.
For the convergence distance corresponding to the i-th moment, < >>M is the length of the reference sequence corresponding to each convergence time, s is the sequence number corresponding to the convergence time, and +.>For the power value at the i-th instant,power value at j-th moment in reference sequence for s-th convergence moment,/th>For the ith moment and +.>The variation coefficient of the data set formed by the amplitude differences at the convergence time. In the present embodiment, the reference sequence is a sequence including 7 times, which includes 3 adjacent times and convergence times, about each convergence time, and the empirical value of M is 7. The coefficient of variation is a well-known technique and will not be described in detail in this specification. It can be appreciated that convergence distance ∈ ->The smaller the value of (c) indicates that the closer the i-th moment is to the convergence state, the more likely the NLMS algorithm is to be in the convergence state at the i-th moment.
Specifically, in some embodiments of the present disclosure, the convergence stationary coefficient determination module 230 may determine the peak-to-peak variability corresponding to each time based on the foregoing interval time sequence and interval frequency sequence The method comprises the steps of carrying out a first treatment on the surface of the Then, the convergence distance ++corresponding to each time is determined based on the state of each time acquired based on the neural network in the foregoing process>The method comprises the steps of carrying out a first treatment on the surface of the Finally, convergence stationary coefficient determination module 230 can be determined according to the ratio of the peak-to-peak variability to the convergence distance (+)>) Obtaining convergence stability coefficient corresponding to each moment>
In this specification, the convergence stationary factor may be used to reflect the probability of harmonic peaks being present at each instant in the speech signal. The larger the periodic fluctuation between the i-th time and the rest of the peak values in the speech signal x (t), the larger the difference in the interval time between the peak level corresponding to the i-th time and the rest of the peak levels,the greater the value of (2), the greater the difference in frequency between the peak level corresponding to the ith moment and the remaining peak level, the +.>The greater the value of (2); the closer the i-th moment is to the convergence state, the more unstable the difference in amplitude between the i-th moment and the convergence moment is, +.>The larger the value of (2), the smaller the difference between the power value at the i-th moment and the power value at the convergence moment, the +.>The smaller the value of (i) the more likely it is that the i-th instant is close to the highest value in the amplitude-frequency distribution of the speech signal x (t), i.e.>The larger the value of (c), the worse the periodicity of the speech signal at the i-th moment, the less stable the short-term energy of the NLMS algorithm at the i-th moment, the more likely the i-th moment is in a converged state, and the more likely the i-th moment is to contain howling.
It should be noted that, in the embodiment of the present disclosure, the convergence stability coefficient is calculated in the above manner, and the amplitude-frequency distribution at the time adjacent to the peak point is considered, so that the evaluation error based on a single time in the conventional howling detection method can be avoided, and therefore, the detection accuracy of the howling signal in the digital hearing aid can be improved to a certain extent.
Step 340, determining a non-stationary moment based on the convergence stationary coefficient. In some embodiments, step 340 may be performed by the non-stationary moment determination module 240.
In some embodiments, the convergence stability coefficients corresponding to all the moments in the non-convergence state determined in the foregoing process may be formed into a convergence sequence according to a time sequence, and a BG (Bernaola Galvan) sequence segmentation algorithm is used to obtain a mutation point in the convergence sequence, and then the moment corresponding to the mutation point and all the moments in the convergence state are used together as non-stable moments when howling may exist in the speech signal x (t). The BG sequence segmentation algorithm is a well-known technique, and specific processes thereof are not described in the specification.
Step 350, determining a step adjustment ratio based on the time interval between the non-stationary time and the howling time, and obtaining a convergence step corresponding to each time based on the step adjustment ratio and the convergence stationary coefficient. In some embodiments, step 350 may be performed by convergence step determination module 250.
If the non-stationary moment y is caused by howling, according to the characteristic that the power of the howling frequency point in the time domain is rapidly increased and is always maintained after the maximum value is reached, the y+1st non-stationary moment is caused by the disappearance of howling and the recovery of the voice signal to a stable state. Based on this, it is possible to consider the specific deviation situation that characterizes each non-stationary moment with a linear prediction coefficient.
Specifically, in some embodiments, the convergence step size determining module 250 may calculate the linear prediction coefficient of each non-stationary time, and then arrange the linear prediction coefficients corresponding to each non-stationary time according to a time sequence, to obtain a corresponding linear prediction sequence.
When the digital hearing aid is in an initial stage or the external acoustic environment suddenly changes, the coefficient of the adaptive filter is changed more severely, the short-time average value also has obvious fluctuation, but the change of the long-time average value is slower, namely the short-time average value of coefficient energy in the adaptive filter is more different from the long-time average value, noise or howling possibly exists in a voice signal x (t) input into the adaptive filter, and the adaptive filter is in a convergence state. In this case, the convergence speed can be increased by selecting a larger step size in the NLMS algorithm, and howling is prevented; otherwise, if each moment in the voice signal is in a stable state, the coefficient energy distribution of the adaptive filter is stable, and at the moment, the convergence speed can be reduced by selecting a smaller step length in the NLMS algorithm, so that the error between the output signal and the expected signal is smaller. In other words, the closer to the non-stationary instants in the speech signal x (t), the more the step size should be amplified.
Based on the analysis, a step adjustment ratio U can be constructed for characterizing the degree of change in the state of the speech signal at each non-stationary instant of the speech signal. Specifically, in some embodiments, the step size at the y-th non-stationary time adjusts the ratioThe calculation can be performed as follows:
;/>
wherein,the convergence contribution ratio corresponding to the y-th non-stationary moment; />、/>、/>The linear prediction sequences are respectively corresponding to the y-1 th, the y-1 th and the y+1 th non-stationary moments; />、/>Respectively the sequences->And->、/>And->、/>And->Pearson correlation coefficient therebetween.The larger the value of (c) is, the greater the contribution to the convergence state at the y-th non-stationary instant is.
For the information effective ratio of the y-th non-stationary time, M is the number of non-stationary times of the speech signal x (t), and +.>The reconstruction variable quantity corresponding to the y-th non-stationary moment. In the present description embodiment, the reconstructionThe acquisition process of the variation is as follows: deleting the y-th non-stationary moment from the voice signal x (t), reconstructing by using the signals at the rest moments to obtain a reconstructed signal, and taking the French Distance (Frechet Distance) between the amplitude-frequency characteristic curve of the reconstructed signal and the amplitude-frequency characteristic curve of the voice signal x (t) as a reconstruction variable quantity. / >The larger the value of (c) is, the smaller the effective information in the speech signal contained in the y-th non-stationary time is, and the more likely it is to be in a converged state.
And adjusting the proportion for the step length corresponding to the y-th non-stationary moment. In some embodiments, the convergence contribution ratio corresponding to the y-th non-stationary moment calculated by the above formula may be +.>Information effective ratio with the y-th non-stationary timeMultiplying to obtain the step length adjusting proportion corresponding to the y-th non-stable moment>
In the present embodiment, the step size is adjusted in proportion toThe degree of change in the speech state at each non-stationary instant in the speech signal may be reflected. The closer the y-th non-stationary time is to the howling time, the smaller the association between the y-th non-stationary time and the adjacent time is, +.>、/>The smaller the value of (2), the closer the y-th non-stationary moment is to the howling moment, its neighborsThe greater the probability that two non-stationary moments are caused by howling, the +.>The greater the value of (2); the smaller the effective information contained in the voice signal at the y-th non-stationary moment is, the larger the maximum difference between amplitude-frequency characteristic curves before and after signal reconstruction is, +.>The greater the value of +.>The greater the value of (2); i.e. < ->The larger the value of (c), the more likely the y-th non-stationary time is to be near the howling time, and the closer the coefficient energy distribution in the adaptive filter is to the convergence state.
In the embodiment of the specification, the amplitude-frequency characteristic of each moment in the voice signal is utilized to calculate the step length adjustment proportion, and meanwhile, the coefficient energy distribution state in the adaptive filter is evaluated according to the approaching degree of each non-stable moment and the howling moment, so that the convergence step length in the NLMS algorithm can be acquired in a self-adaptive manner in the subsequent process, and the problem that the convergence speed and the precision are difficult to adjust is solved.
Further, after calculating the step adjustment ratio corresponding to each time, the convergence step determination module 250 may obtain the convergence step at each time according to the convergence stability coefficient and the step adjustment ratio corresponding to each time. Specifically, in some embodiments, the convergence step size corresponding to the i-th time instantThe calculation can be performed as follows:
wherein,for the convergence ratio at the ith moment, M is the number of non-stationary moments, +.>Convergence stability coefficient corresponding to the ith moment and the yth non-stable moment respectively, ++>The scale is adjusted for the step corresponding to the y-th non-stationary moment,for convergence step maximum, +.>Representing a normalization function->For maximum convergence ratio, +.>Is a regulatory factor. Wherein (1) >The magnitude of (2) may be taken as a checked value; />The magnitude of (2) can be taken as a checked value of 5, which has the effect of avoiding that the data range after normalization is overlarge due to overlarge parameters.
In the present specification, the above-mentioned factors are usedMaximum value of empirical value convergence stepAnd the value of the length M of the reference sequence is only exemplary. In some embodiments, the above listThe lifting value can be adjusted according to actual needs under the allowable condition.
And step 360, performing echo cancellation on the voice signal based on the NLMS algorithm and the convergence step length to suppress howling in the digital hearing aid. In some embodiments, step 360 may be performed by howling suppression module 260.
Through the steps, the adaptive convergence step length corresponding to each moment in the voice signal can be determined. Further, the howling suppression module 260 may perform echo cancellation on the voice signal received by the digital hearing aid based on the NLMS algorithm and the adaptive convergence step size corresponding to each time instant to suppress howling in the digital hearing aid.
Specifically, in some embodiments, the step adjustment ratio of each time may be obtained according to the above steps, and then the non-stationary time corresponding to the maximum value of the step adjustment ratio is used as the howling start time, and the improved NLMS algorithm is used to perform howling suppression on the voice signal in the digital hearing aid.
The howling suppression process of the suppression system is as follows: and taking the voice signal in the digital hearing aid and the feedback signal of the adaptive filter as the digital hearing aid input signal of the suppression system, and operating the improved NLMS algorithm to obtain a feedback estimation signal. Further, taking the difference result of the digital hearing aid input signal and the feedback estimation signal as a real input signal of the suppression system, enabling the real input signal to reach a howling detection module through a gain module in the suppression system, detecting whether the howling initial moment exists through the step length adjusting proportion, stopping the operation of the NLSM algorithm by the suppression system if the howling initial moment exists, starting a trap to suppress howling of the real input signal until the howling initial moment does not exist, closing the trap, and operating the improved NLMS algorithm again to eliminate echoes to obtain the feedback estimation signal.
Through the above process, noise and howling in the voice signal received by the digital hearing aid can be removed, and the processed voice signal is obtained. Further, the processed voice signal may be transmitted to the voice output device 140 for feedback to the user. It should be noted that in the embodiments of the present disclosure, by suppressing noise and howling in a speech signal received by a digital hearing aid, the sound output quality of the digital hearing aid may be improved, thereby improving the user experience.
In summary, the possible benefits of the embodiments of the present disclosure include, but are not limited to: (1) In the digital hearing aid howling suppression method and system based on the neural network provided by some embodiments of the present disclosure, the state of each moment in a voice signal is detected through the neural network, and a convergence stability coefficient is constructed according to the distribution characteristics of the amplitude-frequency peak value at each moment, so that the amplitude-frequency distribution at the moment adjacent to the peak point is considered, the problem of undesirable effect caused by evaluation based on a single moment in the traditional howling detection method can be avoided, and the detection precision of the howling signal in the digital hearing aid is improved; (2) According to the digital hearing aid howling suppression method and system based on the neural network, provided by some embodiments of the present disclosure, the amplitude-frequency characteristic of each time in a voice signal can be fully utilized by constructing a step length adjustment ratio based on the time interval between the non-stationary time and the howling time, and meanwhile, the coefficient energy distribution state in the adaptive filter is evaluated according to the approach degree of each non-stationary time and the howling time, so that the convergence step length in the NLMS algorithm can be obtained in a subsequent process in a self-adaptive manner, the problem that adjustment between the convergence speed and the precision is difficult is solved, and the suppression effect of the suppression system on the howling is improved.
It should be noted that, the benefits that may be generated by different embodiments may be different, and in different embodiments, the benefits that may be generated may be any one or a combination of several of the above, or any other benefits that may be obtained.
While the basic concepts have been described above, it will be apparent to those skilled in the art that the foregoing detailed disclosure is by way of example only and is not intended to be limiting. Although not explicitly described herein, various modifications, improvements, and adaptations to the present disclosure may occur to one skilled in the art. Such modifications, improvements, and modifications are intended to be suggested within this specification, and therefore, such modifications, improvements, and modifications are intended to be included within the spirit and scope of the exemplary embodiments of the present invention.
Meanwhile, the specification uses specific words to describe the embodiments of the specification. Reference to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic is associated with at least one embodiment of the present description. Thus, it should be emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various positions in this specification are not necessarily referring to the same embodiment. Furthermore, certain features, structures, or characteristics of one or more embodiments of the present description may be combined as suitable.
Furthermore, those skilled in the art will appreciate that the various aspects of the specification can be illustrated and described in terms of several patentable categories or circumstances, including any novel and useful procedures, machines, products, or materials, or any novel and useful modifications thereof. Accordingly, aspects of the present description may be performed entirely by hardware, entirely by software (including firmware, resident software, micro-code, etc.), or by a combination of hardware and software. The above hardware or software may be referred to as a "data block," module, "" engine, "" unit, "" component, "or" system. Furthermore, aspects of the specification may take the form of a computer product, comprising computer-readable program code, embodied in one or more computer-readable media.
The computer storage medium may contain a propagated data signal with the computer program code embodied therein, for example, on a baseband or as part of a carrier wave. The propagated signal may take on a variety of forms, including electro-magnetic, optical, etc., or any suitable combination thereof. A computer storage medium may be any computer readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code located on a computer storage medium may be propagated through any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or a combination of any of the foregoing.
The computer program code necessary for operation of portions of the present description may be written in any one or more programming languages, including an object oriented programming language such as Java, scala, smalltalk, eiffel, JADE, emerald, C ++, c#, vb net, python and the like, a conventional programming language such as C language, visual Basic, fortran2003, perl, COBOL2002, PHP, ABAP, a dynamic programming language such as Python, ruby and Groovy, or other programming languages and the like. The program code may execute entirely on the user's computer or as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or processing device. In the latter scenario, the remote computer may be connected to the user's computer through any form of network, such as a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet), or the use of services such as software as a service (SaaS) in a cloud computing environment.
Furthermore, the order in which the elements and sequences are processed, the use of numerical letters, or other designations in the description are not intended to limit the order in which the processes and methods of the description are performed unless explicitly recited in the claims. While certain presently useful inventive embodiments have been discussed in the foregoing disclosure, by way of various examples, it is to be understood that such details are merely illustrative and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements included within the spirit and scope of the embodiments of the present disclosure. For example, while the system components described above may be implemented by hardware devices, they may also be implemented solely by software solutions, such as installing the described system on an existing processing device or mobile device.
Likewise, it should be noted that in order to simplify the presentation disclosed in this specification and thereby aid in understanding one or more inventive embodiments, various features are sometimes grouped together in a single embodiment, figure, or description thereof. This method of disclosure, however, is not intended to imply that more features than are presented in the claims are required for the present description. Indeed, less than all of the features of a single embodiment disclosed above.
In some embodiments, numbers describing the components, number of attributes are used, it being understood that such numbers being used in the description of embodiments are modified in some examples by the modifier "about," approximately, "or" substantially. Unless otherwise indicated, "about," "approximately," or "substantially" indicate that the number allows for a 20% variation. Accordingly, in some embodiments, numerical parameters set forth in the specification and claims are approximations that may vary depending upon the desired properties sought to be obtained by the individual embodiments. In some embodiments, the numerical parameters should take into account the specified significant digits and employ a method for preserving the general number of digits. Although the numerical ranges and parameters set forth herein are approximations that may be employed in some embodiments to confirm the breadth of the range, in particular embodiments, the setting of such numerical values is as precise as possible.
Each patent, patent application publication, and other material, such as articles, books, specifications, publications, documents, etc., referred to in this specification is incorporated herein by reference in its entirety. Except for application history documents that are inconsistent or conflicting with the content of this specification, documents that are currently or later attached to this specification in which the broadest scope of the claims to this specification is limited are also. It is noted that, if the description, definition, and/or use of a term in an attached material in this specification does not conform to or conflict with what is described in this specification, the description, definition, and/or use of the term in this specification controls.
Finally, it should be understood that the embodiments described in this specification are merely illustrative of the principles of the embodiments of this specification. Other variations are possible within the scope of this description. Thus, by way of example, and not limitation, alternative configurations of embodiments of the present specification may be considered as consistent with the teachings of the present specification. Accordingly, the embodiments of the present specification are not limited to only the embodiments explicitly described and depicted in the present specification.

Claims (9)

1. The digital hearing aid howling suppression method based on the neural network is characterized by comprising the following steps of:
Acquiring a voice signal received by a digital hearing aid;
acquiring the state of each moment in the voice signal based on a neural network;
determining a convergence stability coefficient corresponding to each moment according to the state of each moment in the voice signal and the distribution characteristic of the amplitude-frequency peak value of each moment, wherein the convergence stability coefficient is used for representing the probability of harmonic peaks at each moment in the voice signal;
determining a non-stationary moment based on the convergence stationary coefficient;
determining a step length adjusting proportion based on the time interval between the non-stable time and the howling time, and acquiring a convergence step length corresponding to each time based on the step length adjusting proportion and the convergence stability coefficient;
echo cancellation is carried out on the voice signal based on an NLMS algorithm and the convergence step length so as to inhibit howling in the digital hearing aid;
the acquiring the state of each moment in the voice signal based on the neural network comprises the following steps:
denoising the voice signal, and performing time domain transformation on the denoised voice signal to obtain a corresponding time domain diagram;
acquiring short-time energy of each moment in the voice signal based on the time domain diagram, and a short-time autocorrelation coefficient of the voice signal when delay is k;
And taking the short-time energy and the short-time autocorrelation coefficient at each moment as an instantaneous vector at each moment in a time domain diagram corresponding to the voice signal, and inputting the instantaneous vector into a trained state detection model to obtain a state at each moment in the voice signal, wherein the states comprise a convergence state and a non-convergence state.
2. The method for suppressing howling in a digital hearing aid based on a neural network according to claim 1, wherein said determining the convergence and stability coefficient corresponding to each time based on the state of each time in the speech signal and the distribution characteristics of the amplitude-frequency peak at each time comprises:
acquiring each peak value of the amplitude value from the frequency domain diagram of the voice signal, and taking each unequal peak value as a peak value stage;
for each of said peak levels;
acquiring all corresponding moments of the peak value in the time domain diagram of the voice signal, taking a time difference value between two adjacent moments as interval time, and recording a sequence formed by all interval time as an interval time sequence;
acquiring all frequencies corresponding to the peak value in the frequency domain diagram of the voice signal, taking a difference value between two adjacent frequencies as an interval frequency, and recording a sequence formed by all interval frequencies as an interval frequency sequence;
Determining the peak-to-peak fluctuation degree corresponding to each moment based on the interval time sequence and the interval frequency sequence;
determining a convergence distance corresponding to each moment based on the state;
and obtaining a convergence stability coefficient corresponding to each moment according to the ratio of the peak-to-peak fluctuation degree to the convergence distance.
3. A digital hearing aid squeal suppression method based on neural network according to claim 2, characterized in that the peak-to-peak variability is calculated based on the following formula:
wherein,for the peak-to-peak variability corresponding to the ith moment, m is the statistical number of the peak levels in the speech signal, +.>For the interval time sequence corresponding to the peak level corresponding to the ith moment,/th moment>For the interval time sequence corresponding to the b-th peak level,/th peak level>For the interval frequency sequence corresponding to the peak level corresponding to the ith moment, +.>For the interval frequency sequence corresponding to the b-th peak level,/a>For interval time sequence->And->DTW distance between>For the interval frequency sequence->And->DTW distance between.
4. A digital hearing aid howling suppression method based on a neural network according to claim 2, characterized in that the convergence distance is calculated based on the following formula:
Wherein,for the convergence distance corresponding to the i-th moment, < >>M is the length of the reference sequence corresponding to each convergence time, s is the sequence number corresponding to the convergence time, and +.>For the power value at the i-th instant, +.>Power value at j-th moment in reference sequence for s-th convergence moment,/th>For the ith moment and +.>The variation coefficient of the data set formed by the amplitude differences at the convergence time.
5. The digital hearing aid howling suppression method based on a neural network according to claim 2, wherein said determining a non-stationary moment based on the convergence stationary coefficient comprises:
forming convergence sequences by using convergence stability coefficients corresponding to all moments in a non-convergence state according to a time sequence, and acquiring mutation points in the convergence sequences by using a BG sequence segmentation algorithm;
and taking the moment corresponding to the abrupt point and all the moments in the convergence state as non-stable moments in the voice signal.
6. The digital hearing aid howling suppression method based on a neural network according to claim 5, wherein said determining a step size adjustment ratio based on the time interval of the non-stationary time instant and the howling time instant comprises:
Calculating a linear prediction coefficient of each non-stationary moment, wherein the linear prediction coefficient is used for representing the deviation condition of each non-stationary moment;
arranging the linear prediction coefficients corresponding to each non-stationary moment according to a time sequence to obtain a corresponding linear prediction sequence;
deleting each voice signal at a non-stationary moment in the voice signals to obtain voice signals at the remaining moment, and reconstructing the voice signals at the remaining moment to obtain reconstructed signals; taking the Fregming distance between the amplitude-frequency characteristic curve of the reconstructed signal and the amplitude-frequency characteristic curve of the voice signal as the reconstruction variation of each non-stationary moment;
for each non-stationary moment;
determining a convergence contribution ratio based on the linear prediction sequences corresponding to the front and rear adjacent moments and the linear prediction sequences corresponding to the front and rear adjacent moments; determining an information effective ratio based on the ratio of the reconstruction variable quantity corresponding to the information effective ratio to the sum of the reconstruction variable quantities corresponding to all non-stationary moments;
and obtaining a step length adjusting ratio corresponding to each non-stationary moment according to the convergence contribution ratio and the information effective ratio.
7. The digital hearing aid howling suppression method based on a neural network according to claim 6, wherein the convergence contribution ratio is calculated based on the following formula:
Wherein,the convergence contribution ratio corresponding to the y-th non-stationary moment; />、/>、/>The linear prediction sequences are respectively corresponding to the y-1 th, the y-1 th and the y+1 th non-stationary moments; />、/>、/>Respectively the sequences->And->、/>And->、/>And->Pearson correlation coefficient therebetween;
the information effective ratio is calculated based on the following formula:
wherein,for the information effective ratio of the y-th non-stationary moment, M is the number of non-stationary moments,/>The reconstruction variable quantity corresponding to the y-th non-stationary moment;
the step size adjustment ratio is calculated based on the following formula:
wherein,and adjusting the proportion for the step length corresponding to the y-th non-stationary moment.
8. The digital hearing aid howling suppression method based on a neural network according to claim 6, wherein the convergence step size is calculated based on the following formula:
wherein,for the convergence ratio at the ith moment, M is the number of non-stationary moments, +.>Convergence stability coefficient corresponding to the ith moment and the yth non-stable moment respectively, ++>Adjusting the ratio for the step corresponding to the y-th non-stationary moment,/->For convergence step maximum, +.>Representing a normalization function->For maximum convergence ratio, +.>Is a regulatory factor.
9. A digital hearing aid squeal suppression system based on a neural network comprising:
The acquisition module is used for acquiring the voice signal received by the digital hearing aid;
the state determining module is used for acquiring the state of each moment in the voice signal based on the neural network;
the convergence stability coefficient determining module is used for determining a convergence stability coefficient corresponding to each moment according to the state of each moment in the voice signal and the distribution characteristics of the amplitude-frequency peak value of each moment, and the convergence stability coefficient is used for representing the probability of harmonic peaks existing at each moment in the voice signal;
a non-stationary time determining module, configured to determine a non-stationary time based on the convergence stationary coefficient;
the convergence step length determining module is used for determining a step length adjusting proportion based on the time interval between the non-stable time and the howling time and acquiring a convergence step length corresponding to each time based on the step length adjusting proportion and the convergence stable coefficient;
the howling suppression module is used for performing echo cancellation on the voice signal based on an NLMS algorithm and the convergence step length so as to suppress howling in the digital hearing aid;
the acquiring the state of each moment in the voice signal based on the neural network comprises the following steps:
denoising the voice signal, and performing time domain transformation on the denoised voice signal to obtain a corresponding time domain diagram;
Acquiring short-time energy of each moment in the voice signal based on the time domain diagram, and a short-time autocorrelation coefficient of the voice signal when delay is k;
and taking the short-time energy and the short-time autocorrelation coefficient at each moment as an instantaneous vector at each moment in a time domain diagram corresponding to the voice signal, and inputting the instantaneous vector into a trained state detection model to obtain a state at each moment in the voice signal, wherein the states comprise a convergence state and a non-convergence state.
CN202311152648.6A 2023-09-08 2023-09-08 Digital hearing aid howling suppression method and system based on neural network Active CN116887160B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311152648.6A CN116887160B (en) 2023-09-08 2023-09-08 Digital hearing aid howling suppression method and system based on neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311152648.6A CN116887160B (en) 2023-09-08 2023-09-08 Digital hearing aid howling suppression method and system based on neural network

Publications (2)

Publication Number Publication Date
CN116887160A CN116887160A (en) 2023-10-13
CN116887160B true CN116887160B (en) 2024-01-12

Family

ID=88257224

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311152648.6A Active CN116887160B (en) 2023-09-08 2023-09-08 Digital hearing aid howling suppression method and system based on neural network

Country Status (1)

Country Link
CN (1) CN116887160B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1262012A (en) * 1997-05-07 2000-08-02 艾利森公司 Improved acoustic echo canceler for telecommunications system
JP2006005619A (en) * 2004-06-17 2006-01-05 Matsushita Electric Ind Co Ltd Adaptive equalizer, adaptive equalization method, and recording medium
CN102056068A (en) * 2009-08-03 2011-05-11 伯纳方股份公司 A method for monitoring the influence of ambient noise on stochastic gradient algorithms during identification of linear time-invariant systems
KR101389804B1 (en) * 2012-11-16 2014-04-29 (주) 로임시스템 Howling cancelation apparatus with high-speed modeling of feed back channel
CN109788400A (en) * 2019-03-06 2019-05-21 哈尔滨工业大学(深圳) A kind of neural network chauvent's criterion method, system and storage medium for digital deaf-aid
KR102130505B1 (en) * 2019-05-02 2020-07-06 남서울대학교 산학협력단 Apparatus and method for removing feedback signal of hearing aid through deep learning
CN112689056A (en) * 2021-03-12 2021-04-20 浙江芯昇电子技术有限公司 Echo cancellation method and echo cancellation device using same

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170311095A1 (en) * 2016-04-20 2017-10-26 Starkey Laboratories, Inc. Neural network-driven feedback cancellation
EP4133751A1 (en) * 2020-04-09 2023-02-15 Starkey Laboratories, Inc. Hearing device with feedback instability detector that changes an adaptive filter

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1262012A (en) * 1997-05-07 2000-08-02 艾利森公司 Improved acoustic echo canceler for telecommunications system
JP2006005619A (en) * 2004-06-17 2006-01-05 Matsushita Electric Ind Co Ltd Adaptive equalizer, adaptive equalization method, and recording medium
CN102056068A (en) * 2009-08-03 2011-05-11 伯纳方股份公司 A method for monitoring the influence of ambient noise on stochastic gradient algorithms during identification of linear time-invariant systems
KR101389804B1 (en) * 2012-11-16 2014-04-29 (주) 로임시스템 Howling cancelation apparatus with high-speed modeling of feed back channel
CN109788400A (en) * 2019-03-06 2019-05-21 哈尔滨工业大学(深圳) A kind of neural network chauvent's criterion method, system and storage medium for digital deaf-aid
KR102130505B1 (en) * 2019-05-02 2020-07-06 남서울대학교 산학협력단 Apparatus and method for removing feedback signal of hearing aid through deep learning
CN112689056A (en) * 2021-03-12 2021-04-20 浙江芯昇电子技术有限公司 Echo cancellation method and echo cancellation device using same

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
一种自适应数字助听器声反馈消除算法;唐燕;张玲华;;计算机技术与发展(第05期);全文 *
唐燕 ; 张玲华 ; .一种自适应数字助听器声反馈消除算法.计算机技术与发展.2013,(第05期),全文. *
啸叫快速抑制的助听器回声抵消算法;梁瑞宇;王侠;王青云;赵力;邹采荣;;声学学报(第02期);全文 *
梁瑞宇 ; 王侠 ; 王青云 ; 赵力 ; 邹采荣 ; .啸叫快速抑制的助听器回声抵消算法.声学学报.2016,(第02期),全文. *

Also Published As

Publication number Publication date
CN116887160A (en) 2023-10-13

Similar Documents

Publication Publication Date Title
CN109686381B (en) Signal processor for signal enhancement and related method
US10622009B1 (en) Methods for detecting double-talk
JP5666444B2 (en) Apparatus and method for processing an audio signal for speech enhancement using feature extraction
US5757937A (en) Acoustic noise suppressor
CN103874002B (en) Apparatus for processing audio including tone artifacts reduction
JP6169849B2 (en) Sound processor
JP4764995B2 (en) Improve the quality of acoustic signals including noise
EP2643834B1 (en) Device and method for producing an audio signal
EP2643981B1 (en) A device comprising a plurality of audio sensors and a method of operating the same
CN111161752A (en) Echo cancellation method and device
MX2014011556A (en) Apparatus and method for improving the perceived quality of sound reproduction by combining active noise cancellation and perceptual noise compensation.
US11961504B2 (en) System and method for data augmentation of feature-based voice data
CN112242147A (en) Voice gain control method and computer storage medium
CN106782586B (en) Audio signal processing method and device
CN114664321A (en) Audio signal processing method, audio signal processing device, electronic equipment and storage medium
CN112530451A (en) Speech enhancement method based on denoising autoencoder
US11380312B1 (en) Residual echo suppression for keyword detection
CN111883154A (en) Echo cancellation method and apparatus, computer-readable storage medium, and electronic apparatus
CN117321681A (en) Speech optimization in noisy environments
Chang Warped discrete cosine transform-based noisy speech enhancement
Shankar et al. Influence of MVDR beamformer on a speech enhancement based smartphone application for hearing aids
CN116030823B (en) Voice signal processing method and device, computer equipment and storage medium
CN116887160B (en) Digital hearing aid howling suppression method and system based on neural network
CN110808058B (en) Voice enhancement method, device, equipment and readable storage medium
EP2660814B1 (en) Adaptive equalization system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant