EP4134954B1 - Method and device for enhancing an audio signal - Google Patents

Method and device for enhancing an audio signal

Info

Publication number
EP4134954B1
Authority
EP
European Patent Office
Prior art keywords
audio signal
values
spectral
level
filter
Prior art date
Legal status
Active
Application number
EP21190351.3A
Other languages
German (de)
English (en)
Other versions
EP4134954C0 (fr)
EP4134954A1 (fr)
Inventor
Markus Vieweg
Dr. Bernd Dominik Schäfer
Current Assignee
Optimic GmbH
Original Assignee
Optimic GmbH
Priority date
Filing date
Publication date
Application filed by Optimic GmbH
Priority to EP21190351.3A
Publication of EP4134954A1
Application granted
Publication of EP4134954C0
Publication of EP4134954B1
Legal status: Active
Anticipated expiration

Classifications

    • H04R 27/00 Public address systems
    • G10L 21/0364 Speech enhancement, e.g. noise reduction or echo cancellation, by changing the amplitude for improving intelligibility
    • G10L 25/78 Detection of presence or absence of voice signals
    • H04R 2225/43 Signal processing in hearing aids to enhance the speech intelligibility
    • H04R 2227/007 Electronic adaptation of audio signals to reverberation of the listening space for PA
    • H04R 2227/009 Signal processing in [PA] systems to enhance the speech intelligibility
    • H04R 2430/01 Aspects of volume control, not necessarily automatic, in sound systems

Definitions

  • the present invention relates to a method for enhancing an audio signal.
  • the method is preferably carried out in real time, so that it is suitable for essentially simultaneous recording and playback of audio signals.
  • audio signals are often recorded under unfavorable acoustic conditions using microphones.
  • a desired speech signal component is overlaid by unwanted background noise during the recording, which impairs the quality of the audio signal, particularly with regard to speech intelligibility.
  • the audio signal can be reverberated due to the spatial conditions or as a result of a large distance between the speaker and the microphone, so that the speech component of the audio signal is difficult to understand when played back over loudspeakers despite amplification. For this reason, the actual advantage of acoustic amplification of the audio signal is often not sufficient to ensure satisfactory speech signal quality and speech intelligibility.
  • the problems mentioned above are particularly relevant in the field of mobile audio technology used, for example, for trade fairs, because it has to be compatible with a wide variety of acoustic environments and, as a rule, little time is available to optimally adjust the audio processing devices.
  • there is often no way at all to optimize the audio devices for a particular speaker, for example with regard to the appropriate distance between the speaker and the microphone.
  • differences between different speakers cause problems. For example, different speakers, who have different voice characteristics (e.g. different speaker volume and frequency composition), in particular due to age and gender differences, cannot be treated with the same audio devices with a constant configuration in such a way that a high voice signal quality is reliably achieved.
  • Methods for improving the voice signal quality are known from the documents US 2016/0019905 A1, US 2017/0047080 A1, US 6,295,364 B1, US 2010/0121634 A1, US 2006/0247922 A1 and Schepker et al., "Improving speech intelligibility in noise by SII-dependent preprocessing using frequency-dependent amplification and dynamic range compression", Interspeech 2013.
  • the object is achieved by a method having the features of claim 1.
  • a method for improving an audio signal has at least the following steps: receiving an audio signal with a plurality of amplitude values, the audio signal having speech at least in sections; detecting speech portions of the audio signal; filtering the audio signal with at least one level filter to reduce signal level variations of the audio signal in the detected speech sections; and filtering the audio signal with at least one equalization filter to reduce spectral variations of the audio signal in the detected speech segments.
  • the method also includes the following steps: determining a feedback frequency, which represents a feedback of the audio signal; and filtering the audio signal with a feedback filter on the basis of the determined feedback frequency in order to reduce spectral components of the audio signal that represent feedback.
  • Filtering with the at least one equalization filter includes a step of determining coarse spectral values on the basis of fine spectral values of the audio signal, the coarse spectral values representing the fine spectral values with a lower spectral resolution than the fine spectral values. Furthermore, first equalization weights are determined, which represent a deviation of the coarse spectral values from predetermined reference spectral values. The audio signal is also weighted with the first equalization weights to bring the spectral values into agreement with the reference spectral values.
  • Determining the feedback frequency includes the steps of: determining a subset of spectral values of the audio signal that violate a predetermined spectral threshold; determining a plurality of first spectral parameter values on the basis of the subset, each of the first spectral parameter values representing a predetermined relation between an associated spectral value of the subset and at least one temporally and/or spectrally adjacent spectral value; and determining the feedback frequency based on the plurality of first spectral parameter values.
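The claimed sequence of steps can be sketched in Python. The functions below are illustrative placeholders only (an energy gate for speech detection and an RMS normalizer for the level filter), not the patent's implementation, and the equalization and feedback stages described later are elided for brevity:

```python
def detect_speech(frame):
    # placeholder speech detector: mean energy above a small threshold
    return sum(x * x for x in frame) / len(frame) > 1e-4

def level_filter(frame, is_speech, target_rms=0.1):
    # placeholder level filter: scale speech frames towards a uniform RMS level
    if not is_speech:
        return frame
    rms = (sum(x * x for x in frame) / len(frame)) ** 0.5
    gain = target_rms / rms if rms > 0 else 1.0
    return [gain * x for x in frame]

def enhance(frames):
    # top-level pipeline: detect speech sections, then reduce level
    # variations in them (equalization/feedback filtering would follow)
    return [level_filter(f, detect_speech(f)) for f in frames]
```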
  • time sections of the audio signal are detected which contain speech and can be referred to as speech sections.
  • the audio signal is then processed with a level filter and an equalization filter to reduce certain variations in the audio signal.
  • variations can be treated both within an audio signal and between different audio signals.
  • the level filter is used to reduce signal level variations in order to standardize the level of the audio signal. For example, very loud and very quiet parts of the speech signal are attenuated or amplified in sections, so that a uniform signal level is established overall.
  • different signal levels result, for example, from variable distances between a speaker and the recording microphone and from the acoustic properties of the surrounding room.
  • the resulting level variations are compensated by the level filter, so that the subjective signal quality improves.
  • an equalization filter is used to reduce spectral variations in the audio signal.
  • spectral variations occur due to different speakers, who impress their own spectral characteristics on the audio signal with their voices.
  • there is a spectral coloring due to the acoustic environment during the recording and possibly due to the sound equipment used, in particular the microphone and its alignment relative to the speaker.
  • for good speech intelligibility, it is important that the spectral components in certain frequency ranges that are relevant for speech are not masked, or only to a small extent, by other spectral components.
  • the acoustic environmental conditions often lead to the speech-relevant parts being variably superimposed by other signal parts in the same or in neighboring frequency ranges, so that the speech-relevant parts cannot always be perceived equally well.
  • Such changes in the signal can be determined from the spectral variations over time and can therefore be treated by a suitable filter.
  • the equalization filter is used to reduce spectral variations in the detected speech sections. In this way, the audio signal can be standardized in spectral terms in order to increase signal quality, especially with regard to good speech intelligibility.
  • the method allows a fully automatic signal improvement to take place.
  • a prior or in-operation manual setting or readjustment of filter parameters is therefore not necessary, i.e. the parameters of the level and/or equalization filter can be permanently set when the method is carried out as intended, or are set automatically by a computing unit.
  • the process ensures excellent signal improvement for a wide variety of audio signals, even in particularly difficult acoustic environments.
  • the method is particularly robust against acoustic variations of any kind and is therefore particularly suitable for professional use in practice.
  • the method can be used in real time, i.e. with a latency of less than 20 ms, preferably less than 10 ms, in particular 6 ms.
  • the filtering of the audio signal does not necessarily have to be restricted to the detected speech sections.
  • an equalization filter can also be effective outside of speech sections with regard to special spectral components that are caused, for example, by feedback.
  • the audio signal is filtered at least in the speech sections because these are particularly important for speech intelligibility.
  • certain aspects of the filtering can be restricted to the speech sections.
  • the method includes a step of determining a plurality of spectral values on the basis of the amplitude values, the amplitude values representing the audio signal in a time domain and the spectral values representing the audio signal in a frequency domain.
  • the detection of the speech sections, the filtering with the at least one level filter and/or the filtering with the at least one equalization filter takes place on the basis of the amplitude values and/or the spectral values.
  • the filtering is thus performed on the basis of two different representations of the audio signal, namely time-domain and frequency-domain values of the audio signal. The efficiency and reliability of the method are increased in this way.
  • the spectral values can be determined on the basis of the time-domain amplitude values using known frequency-space transforms, such as the Fast Fourier Transform (FFT).
  • the spectral values are preferably formed by the absolute value of the frequency coefficients (spectral amplitude values), which can be determined particularly efficiently by FFT on the basis of the time domain amplitude values.
  • the advantageous use of the amplitude values and the spectral values thus requires comparatively few computer resources.
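A minimal sketch of this step, assuming the spectral values are the magnitudes of the frequency coefficients of a block of time-domain amplitude values; a plain DFT is used here for readability, an FFT computes the same coefficients in O(N log N):

```python
import cmath

def spectral_values(amplitudes):
    # one-sided magnitude spectrum (spectral amplitude values) of a
    # block of time-domain amplitude values, bins 0 .. N/2
    n = len(amplitudes)
    return [abs(sum(amplitudes[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]
```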
  • the detection of the speech sections comprises at least the following steps: determining at least a first energy parameter value on the basis of the amplitude values, wherein the first energy parameter value represents an average energy of a portion of the speech signal; determining at least one second spectral parameter value based on spectral values of the audio signal, the at least one second spectral parameter value representing a harmonic spectral structure of the portion; and detecting the segment as a speech segment if the at least one first energy parameter value violates a first energy parameter threshold and/or the at least one second spectral parameter value violates a spectral parameter threshold.
  • the detection of speech sections based on time domain and spectral parameters has proven to be particularly useful for reliably detecting both noise-like sections (e.g. consonants) and tonal sections (e.g. vowels) and evaluating them by threshold comparison to distinguish between speech sections and noise sections.
  • the threshold values mentioned can in principle be permanently set. However, the reliability of the detection of speech sections can be improved in a particular way by adapting the first energy parameter threshold value and/or the first spectral parameter threshold value as a function of time. For example, the signal level of the audio signal can be used to set the thresholds to ensure that the thresholds are aligned with the current energy level.
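The two-criterion detection described above might look as follows. The harmonicity measure used here (peak-to-mean ratio of the magnitude spectrum) is an illustrative stand-in for the patent's unspecified "harmonic spectral structure" parameter:

```python
def detect_speech_segment(amplitudes, spectrum, energy_threshold,
                          harmonicity_threshold):
    # first energy parameter: mean energy of the section (time domain)
    mean_energy = sum(x * x for x in amplitudes) / len(amplitudes)
    # second spectral parameter: tonal/harmonic content (frequency domain),
    # here approximated by the peak-to-mean ratio of the spectrum
    mean_mag = sum(spectrum) / len(spectrum)
    harmonicity = max(spectrum) / mean_mag if mean_mag > 0 else 0.0
    # a section counts as speech if either criterion fires, covering
    # noise-like sections (consonants) and tonal sections (vowels)
    return mean_energy > energy_threshold or harmonicity > harmonicity_threshold
```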
  • the filtering of the audio signal with the at least one level filter comprises at least the following: determining at least one level parameter value on the basis of the amplitude values, the level parameter value representing a mean level of the audio signal for a detected speech segment; determining at least one compensation weight based on the at least one level parameter value; and weighting the audio signal with the at least one compensation weight to reduce the signal level variations of the audio signal.
  • the at least one level parameter value can generally comprise a plurality of level parameter values which indicate the level for detected speech sections of different lengths.
  • first and second level parameter values can be determined, the first level parameter values representing the mean level of the audio signal with a first time resolution and the second level parameter values representing the mean level of the audio signal with a second time resolution.
  • the first and second time resolution differ from each other.
  • short-term and long-term effects of human auditory perception can be advantageously taken into account.
  • brief level peaks (clipping) can be detected by level parameter values with a short time resolution and used for filtering.
  • moderate level variations, which only become perceptible after a minimum duration, can be captured using level parameter values with a greater time resolution.
  • the compensation weight for the level filter is then determined based on the first and second level parameter values.
  • the first level parameter values are preferably formed on the basis of a plurality of consecutive energy averages. These can be smoothed to obtain first loudness values that form the first level parameter values.
  • the second level parameter values are preferably formed by second loudness values. These can in turn be formed on the basis of a plurality of consecutive energy averages, with a larger number of energy averages being smoothed than for the first loudness values, so that the second level parameter values each indicate the level for a longer period of time than the first level parameter values.
  • the second time resolution is therefore preferably greater than the first time resolution.
  • At least some of the amplitude values are preferably grouped into time segments of the audio signal. Then the loudness values for at least some of the time periods are determined based on the grouped amplitude values, each of the loudness values representing the loudness of one of the time periods of the audio signal.
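The grouping and smoothing described above can be sketched as follows; the block size, the RMS loudness measure, and the window lengths are illustrative assumptions:

```python
def loudness_values(amplitudes, block_size):
    # group amplitude values into time segments and take the RMS of each
    # segment as its loudness value
    blocks = [amplitudes[i:i + block_size]
              for i in range(0, len(amplitudes) - block_size + 1, block_size)]
    return [(sum(x * x for x in b) / block_size) ** 0.5 for b in blocks]

def smoothed_levels(loudness, n):
    # moving average over n consecutive loudness values; a small n yields
    # the short-term (first) level parameters, a larger n the long-term
    # (second) ones
    return [sum(loudness[i:i + n]) / n for i in range(len(loudness) - n + 1)]
```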
  • the terms “energy” and “level” each represent an intensity or magnitude of the amplitude values.
  • Level values can therefore basically be viewed as energy values of the audio signal and vice versa, with a different unit for both values being possible but not mandatory (e.g. the normalized logarithmic unit dB can be provided for the level in contrast to the energy).
  • the term “level” creates a functional reference to the level filter in particular.
  • the term “loudness” represents the intensity of the amplitude values considering the auditory perceptibility.
  • first compensation weights and second compensation weights are determined, with the first compensation weights being determined in order to reduce signal level variations with at least one level that is greater than a predetermined level threshold value, and the second compensation weights being determined in order to adjust the signal level of the audio signal to a predetermined value.
  • the audio signal in the detected speech sections is set to a basic level so that moderate volume fluctuations from the listener's point of view can be compensated.
  • the first compensation weights are preferably determined on the basis of the first level parameter values and the second compensation weights are determined on the basis of the second level parameter values. In this way, the filtering can be carried out in a particularly hearing-friendly manner.
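One possible reading of the two kinds of compensation weights, with `peak_threshold` and `base_level` as hypothetical parameters:

```python
def compensation_weights(short_levels, long_levels, peak_threshold, base_level):
    # first weights: attenuate short-term peaks above peak_threshold
    # (guards against clipping), never amplify
    first = [min(1.0, peak_threshold / l) if l > 0 else 1.0
             for l in short_levels]
    # second weights: pull the long-term level towards the basic level
    second = [base_level / l if l > 0 else 1.0 for l in long_levels]
    return first, second
```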
  • the filtering with the at least one equalization filter includes a step of determining coarse spectral values on the basis of fine spectral values of the audio signal, the coarse spectral values representing the fine spectral values with a lower spectral resolution than the fine spectral values. Furthermore, first equalization weights are determined, which represent a deviation of the coarse spectral values from predetermined reference spectral values. The audio signal is also weighted with the first equalization weights to bring the spectral values into agreement with the reference spectral values.
  • the fine spectral values are preferably formed by the spectral values mentioned above, which can be efficiently determined in particular by FFT.
  • the spectral resolution of these spectral values is significantly higher than that of the human ear.
  • the frequency resolution of the coarse spectral values preferably corresponds to the resolution of human hearing, so that on this basis an aurally appropriate equalization is made possible.
  • the reference spectral values used for this purpose represent a reference spectrum for achieving a high voice quality of audio signals.
  • the coarse spectral values can be obtained, for example, by octave band filtering of the fine spectral values.
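Assuming the bands are given as index ranges into the fine spectrum, the band averaging and the first equalization weights could be sketched as:

```python
def coarse_spectral_values(fine, band_edges):
    # average the fine spectral values within each band (e.g. octave
    # bands), yielding a coarse spectrum at roughly the ear's resolution
    return [sum(fine[lo:hi]) / (hi - lo) for lo, hi in band_edges]

def equalization_weights(coarse, reference):
    # first equalization weights: per-band gains that map the coarse
    # spectrum onto the predetermined reference spectrum
    return [r / c if c > 0 else 1.0 for c, r in zip(coarse, reference)]
```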
  • the filtering with the at least one equalization filter includes a weighting of the audio signal with second equalization weights, the second equalization weights being predetermined.
  • second equalization weights can be provided which, in contrast to the first equalization weights, are not determined dynamically but are fixed in advance.
  • the second equalization weights can be used, for example, to attenuate spectral components which are always detrimental to high voice quality and can therefore be given a negative gain factor.
  • the method includes filtering the audio signal with at least one compressor in order to reduce a dynamic range of the audio signal.
  • a plurality of different sets of parameters can be provided, which are selected as a function of an amount of the audio signal and are used as a basis for filtering with the at least one compressor.
  • the multiple parameter sets can advantageously differ from one another in terms of a degree of compression.
  • the multiple sets of parameters may include a first set of parameters to reduce the dynamic range of the audio signal with a first degree of compression, and a second set of parameters to reduce the dynamic range of the audio signal with a second degree of compression that is greater than the first degree of compression.
  • the plurality of parameter sets preferably have a third parameter set in order to reduce the dynamic range of the audio signal with a third degree of compression that is less than the first degree of compression.
  • the compressor can be thought of as a special level filter because a reduction in dynamic range is accompanied by a reduction in level and level variations.
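A sketch of magnitude-dependent parameter-set selection and a static compression curve; all thresholds and ratios here are illustrative, not values from the patent:

```python
def select_parameter_set(magnitude):
    # choose a compression parameter set as a function of signal magnitude
    if magnitude > 0.8:                           # loud: strongest (second set)
        return {"threshold": 0.3, "ratio": 8.0}
    if magnitude > 0.4:                           # medium: first set
        return {"threshold": 0.4, "ratio": 4.0}
    return {"threshold": 0.5, "ratio": 2.0}       # quiet: mildest (third set)

def compress(sample, params):
    # static compression: above the threshold, level growth is divided
    # by the ratio, reducing the dynamic range
    a = abs(sample)
    t, r = params["threshold"], params["ratio"]
    out = a if a <= t else t + (a - t) / r
    return out if sample >= 0 else -out
```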
  • the method further comprises the following steps: determining a feedback frequency, which represents a feedback of the audio signal; and filtering the audio signal with a feedback filter on the basis of the determined feedback frequency in order to reduce spectral components of the audio signal that represent feedback.
  • the spectral values already present are preferably used to determine the feedback frequency, so that they do not have to be determined again for this purpose.
  • Feedback occurs when reproduced signal components are recorded again by the microphone and amplified, resulting in an unstable system state that is acoustically perceptible through strong resonance, e.g. through humming or a shrill whistling sound.
  • the feedback filter counteracts such coupling effects so that the signal quality is not affected.
  • the feedback filter can be viewed as a special equalization filter.
  • determining the feedback frequency preferably comprises the following steps: determining a subset of spectral values of the audio signal that violate a predetermined spectral threshold value; determining a plurality of first spectral parameter values based on the subset, each of the first spectral parameter values representing a predetermined relation between an associated spectral value of the subset and at least one temporally and/or spectrally adjacent spectral value; and determining the feedback frequency based on the plurality of first spectral parameter values.
  • the computing effort for determining the feedback frequency can be greatly reduced by the threshold-based preselection of spectral values, so that the real-time capability of the method is enhanced.
  • a predetermined relation between spectral values can be formed, in particular, by mathematically linking the spectral values, for example by using mathematical operators such as division or addition. In this way, certain properties of the spectrum that are typical of a feedback frequency can be efficiently detected.
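The preselection and relation test might be sketched as follows; the neighbour ratio used here is just one possible "predetermined relation", chosen because a feedback tone dominates its spectral neighbourhood:

```python
def feedback_candidates(spectrum, threshold):
    # cheap preselection: only bins violating the spectral threshold
    return [k for k, v in enumerate(spectrum) if v > threshold]

def find_feedback_bin(spectrum, threshold, ratio_min=10.0):
    # a candidate bin counts as feedback if it exceeds the mean of its
    # spectral neighbours by at least ratio_min; return the strongest
    best = None
    for k in feedback_candidates(spectrum, threshold):
        neighbours = [spectrum[j] for j in (k - 1, k + 1)
                      if 0 <= j < len(spectrum)]
        mean_n = sum(neighbours) / len(neighbours)
        ratio = spectrum[k] / mean_n if mean_n > 0 else float("inf")
        if ratio >= ratio_min and (best is None or ratio > best[1]):
            best = (k, ratio)
    return None if best is None else best[0]
```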
  • the step-by-step reduction of the feedback filter preferably takes place according to the scheme of a finite-state machine.
  • a pause filter is provided for filtering the audio signal in order to reduce the audio signal in areas outside of the detected speech sections. In this way, for example, temporal masking effects caused by background noise can be weakened.
  • the audio signal can be filtered with a noise filter in order to reduce the audio signal in areas with amplitude values that violate a predetermined noise threshold value.
  • a noise filter is preferably used.
  • the audio signal is filtered with a bandpass filter.
  • a lower limit frequency of the bandpass filter is preferably in a range from 50 to 100 Hz.
  • An upper limit frequency of the bandpass filter is preferably in a range from 8000 to 10000 Hz.
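As a sketch, a frequency-domain bandpass can be expressed as zero/one weights over the one-sided spectrum; the corner frequencies of 80 Hz and 9 kHz are illustrative picks from the stated preferred ranges:

```python
def bandpass_mask(num_bins, sample_rate, f_lo=80.0, f_hi=9000.0):
    # zero/one weights keeping bins between f_lo and f_hi; for a one-sided
    # spectrum with num_bins bins, bin k maps to k * sample_rate / N with
    # N = 2 * (num_bins - 1)
    df = sample_rate / (2.0 * (num_bins - 1))
    return [1.0 if f_lo <= k * df <= f_hi else 0.0 for k in range(num_bins)]
```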
  • the method aspects described above can be stored as instructions in a non-volatile memory. If the instructions are executed by a computing unit, the instructions cause the computing unit to execute the method described according to one embodiment. In general, the method can thus be partially or fully implemented by a computer.
  • the object of the invention is achieved by a device having the features of the independent device claim.
  • the input interface has a connector for a microphone to capture the audio signal.
  • at least one output interface is provided for outputting the audio signal.
  • the output interface has a connection for an audio playback device, e.g. a public address system with one or more sound transducers.
  • the device also has a computing unit for executing a method for improving the audio signal. The method is designed according to one of the preceding embodiments.
  • the device is preferably designed as a compact audio device, so that it is particularly suitable for mobile use.
  • the device preferably has a non-volatile memory in which commands for executing the method are stored.
  • the memory can be coupled to the processing unit.
  • the computing unit preferably includes an analog-to-digital converter and a digital-to-analog converter.
  • the enhancement of the audio signal can thus be based at least in part on a digital version of the audio signal.
  • on the one hand, the method can thus be carried out particularly efficiently; on the other hand, a high filtering quality can be ensured.
  • the input or output interface can be implemented as a wired interface in order to ensure compatibility with other professional audio devices and to minimize transmission losses.
  • the interfaces can be wireless in each case, in which case the interfaces can also be combined to form a common wireless interface for this purpose.
  • the device also includes a preamplifier for the audio signal, which can be coupled to the input interface.
  • the audio signal can advantageously be amplified to a predetermined level range before sampling.
  • a plurality of predetermined amplification values can be provided for the preamplifier, with one of the amplification values preferably being selected automatically or by an operator of the device and used as a basis for the amplification.
  • the device preferably has an electrical supply for the input interface. This enables electrical supply of a connected sound transducer, e.g. a microphone, via the input interface in the sense of a so-called phantom power supply.
  • the device also has a switching device which can be coupled to the input interface, the output interface and/or the processing unit in order to optionally connect the input interface to the output interface via the processing unit.
  • the computing unit can be bypassed. In this way, an output of the audio signal can also be guaranteed in the event of a malfunction of the processing unit.
  • the device is preferably provided with a cooling device. All components of the device, including the processing unit, can thus be accommodated in a compact housing.
  • the computing unit can advantageously have a single-board computer, so that the device can be made particularly compact overall.
  • the device can also have a housing, in which in particular all electrical components of the device can be accommodated in order to be protected from external influences in this way.
  • the computing unit can have one or more processors and a memory in which instructions for executing the method can be stored.
  • the device preferably has at least one external communication interface.
  • the device can be equipped with a network interface, e.g. an Ethernet interface, or a bus interface, in order to be connected via a network or directly to a user terminal, for example a PC or a mobile terminal such as a laptop.
  • a connection to wireless end devices can also take place via the Internet in order to enable a connection to a central server (cloud).
  • the control interface can also be in the form of a wireless interface, so that the device can be connected directly to a mobile end device (e.g. via Bluetooth or a local wireless network). Communication with the device, for example for the purpose of configuration, can thus be done particularly comfortably.
  • control data, for example filter parameters for executing the described method for improving an audio signal, can be transmitted via the communication interface.
  • the communication interface can be designed to transmit the audio signal to a mobile terminal device or a central server.
  • the audio signal can be stored in the end device or in a cloud, for example for documentation purposes.
  • the communication interface is preferably designed as an Ethernet interface, which also enables transmission of audio signals (e.g. using Dante, Milan or AES).
  • a firmware of the device can be updated via a communication interface of the device.
  • a communication interface is preferably provided in the form of a separate bus interface, which is used in particular to connect a storage medium, e.g. a mass storage device in the form of a USB stick or the like.
  • configuration and/or update data can be stored on the storage medium, which are transmitted to the device in order to update the locally stored data.
  • the audio signal can be output to the storage medium for recording purposes and stored in the storage medium.
  • the device is preferably equipped with an operating interface in order to be able to control the recording of the audio signal directly on the device.
  • a non-claimed aspect of the disclosure relates to a method for selectively enhancing a first audio signal using an audio processing means, wherein the first audio signal comprises at least portions of speech and the method comprises at least the following steps: determining whether the audio processing means has a predetermined health status; if the audio processing means has the predetermined health status, performing a method of enhancing the first audio signal using the audio processing means to provide a second audio signal; and if the audio processing means does not have the predetermined health status, providing the first audio signal.
  • the method thus enables the audio processing means to be used selectively depending on its health status. Malfunctions of the audio processing means therefore do not result in no audio signal being output, which would impair user satisfaction.
  • the method can be implemented in particular by a switching device, which can be implemented in a device, for example as a switchable relay.
  • the switching functionality can also be implemented by the computing unit itself.
  • a separate switching device has the advantage of protection against a complete failure of the processing unit, in which no transmission of the signal can take place.
  • the methods disclosed herein can preferably be carried out with the device described. However, it is also possible to carry out the method in whole or in part on any computer, in particular a central server.
  • the audio signal can be captured locally and transmitted to a server where signal enhancement is performed.
  • the enhanced signal can then be sent to a local receiver for playback with a sound transducer.
  • An analog audio signal is captured with a microphone (not shown) (step 10), the audio signal having a plurality of speech sections and a plurality of noise sections.
  • the speech sections have speech and form a speech signal component.
  • the noise sections are formed by all other sections that do not have speech, especially in pauses in speaking.
  • the audio signal is pre-amplified, ie electronically amplified as an analog signal with an amplification factor.
  • in a preamplifier (not shown in Fig. 1), a fixed gain can be set.
  • a user can select one of a number of preset gain values as a function of a recording-related basic level in order to relieve a subsequent level filter for reducing level variations.
  • the pre-amplified audio signal is converted in step 14 from an analogue signal to a digital signal. This is preferably done using an analog-to-digital converter which samples the analog signal at a predetermined sampling rate, e.g., 48,000 Hz. Alternatively, step 14 can also take place after step 16, which is explained below.
  • the audio signal is processed with a level filter in step 16 in order to compensate for variations in the signal level.
  • the level filter is operated as a function of first filter data 44, which are determined in step 18 on the basis of the audio signal at the output of the level filter. They include first volume values, detected speech sections and detected level peaks. Level peaks are detected signal levels that are greater than a predetermined level threshold value, at which the signal overdrives (clipping).
  • the volume values are determined for individual blocks of the audio signal, which preferably each have a length of 64 sample values.
  • a first loudness value is determined by summing the squared sample values of the block and then taking the square root of the sum. So-called RMS values (Root Mean Square) are formed in this way, each of which represents an average energy of the underlying block of sampled values.
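The block-wise loudness computation can be sketched as follows (an illustrative Python/NumPy sketch, not part of the claimed disclosure; note that the conventional RMS implied by the name "Root Mean Square" averages the squared samples before taking the root, which is the form used here):

```python
import numpy as np

def block_rms(samples, block_len=64):
    """One RMS value per complete block of block_len sample values,
    each representing the average energy of the underlying block."""
    n_blocks = len(samples) // block_len
    blocks = np.reshape(samples[:n_blocks * block_len], (n_blocks, block_len))
    # Square the samples, average per block, take the square root.
    return np.sqrt(np.mean(blocks ** 2, axis=1))
```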
  • the RMS values of several blocks are preferably used for the level filter.
  • the RMS values of the current block and the previous block are evaluated together, with a level peak being detected if at least one of the two RMS values exceeds a predetermined threshold value, for example -3 dB. If a level peak is detected, this information is taken into account as part of the first filter data 44 in step 16 .
  • the gain of the level filter is reduced sharply and rapidly in step 16, for example at a rate of -3 dB within 200 ms. This effectively removes level peaks.
  • Level peaks are preferably filtered regardless of whether the relevant section of the audio signal is a speech section or not.
  • the level filter of step 16 is further configured to adjust the level of the audio signal to a predetermined value.
  • the RMS values of the current block and a large number of several previous blocks, for example 30 previous blocks, are used.
  • the RMS values are smoothed across the blocks considered, removing short-term fluctuations that are irrelevant to human perception (except for the level peaks, which are treated separately).
  • the median of the RMS values under consideration is preferably formed for smoothing in order to obtain second volume values which indicate the current signal level in an aurally correct manner.
  • a compensation weight is then determined, which represents the difference between a predetermined reference value and the current second volume value. For example, the current volume value can be subtracted from a reference volume of -20 dB to form a compensation weight.
  • the compensation weight is then weighted, e.g. multiplied, with the audio signal to bring the loudness in line with the reference loudness.
  • the maximum change in the compensation weight over time is preferably limited, for example to 5 dB per second. This avoids unnatural fluctuations in the volume of the audio signal.
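The median smoothing and slew-limited compensation gain described above could be sketched as follows (Python/NumPy; the per-block limit of 5/750 dB is an assumption derived from 5 dB per second at 48 kHz with 64-sample blocks, i.e. 750 blocks per second; names are illustrative):

```python
import numpy as np

def level_compensation_gains(rms_db, ref_db=-20.0, n_prev=30,
                             max_step_db=5.0 / 750.0):
    """Per-block compensation gain in dB: median-smooth the block levels
    over the current and n_prev previous blocks, aim at the reference
    volume ref_db, and limit the gain change per block."""
    gains = np.zeros(len(rms_db))
    gain = 0.0
    for i in range(len(rms_db)):
        # Median over the considered blocks removes short-term fluctuations.
        smoothed = np.median(rms_db[max(0, i - n_prev):i + 1])
        target = ref_db - smoothed            # compensation weight in dB
        # Limit the change to avoid unnatural volume fluctuations.
        gain += np.clip(target - gain, -max_step_db, max_step_db)
        gains[i] = gain
    return gains
```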
  • the adjustment of the signal level with reference to the reference volume is preferably only carried out in those sections of the audio signal which have been detected as speech sections.
  • the information as to which sections have been detected as speech sections is made known to the level filter of step 16 as part of the filter data 44 .
  • the detection of speech segments takes place in step 18 and is explained below with reference to Fig. 2.
  • Speech sections are detected on the basis of amplitude values 54 and spectral values 56, with the amplitude values 54 representing the audio signal in the time domain and the spectral values 56 representing the audio signal in the frequency domain.
  • the amplitude values 54 are formed by the sample values of the digital audio signal after step 14.
  • the spectral values 56 are determined block by block using fast Fourier transformations (FFT) on the basis of the amplitude values 54 . In principle, however, other frequency transformations can also be used.
  • the block length for determining the spectral values 56 is preferably 1024 amplitude values (sampling values), with adjacent blocks preferably overlapping by half and the relevant amplitude values of each block being weighted with a Hann window before the transformation, in order to reduce unwanted spectral components caused by the block boundaries. Furthermore, the spectral values 56 are weighted with a predetermined factor, so that the spectral values 56 are normalized to a range between 0 and 1. The factor depends in particular on the window used. In the case of the preferred Hann window, a factor of 0.00391 can advantageously be used.
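The block-wise transformation described above might look like this (Python/NumPy sketch; `np.hanning` serves as the Hann window and `rfft` yields the one-sided spectrum):

```python
import numpy as np

def spectral_blocks(samples, block_len=1024, factor=0.00391):
    """Magnitude spectra of 1024-sample blocks with 50 % overlap.
    Each block is weighted with a Hann window before the FFT to reduce
    spectral components caused by the block boundaries; the magnitudes
    are scaled with the stated factor to normalize them to roughly 0..1."""
    hop = block_len // 2              # adjacent blocks overlap by half
    window = np.hanning(block_len)
    spectra = [factor * np.abs(np.fft.rfft(samples[i:i + block_len] * window))
               for i in range(0, len(samples) - block_len + 1, hop)]
    return np.array(spectra)
```

For a full-scale sine that falls exactly on a bin, the normalized peak magnitude comes out close to 1, which illustrates the role of the 0.00391 factor for the Hann window.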
  • a first parameter value is formed by the RMS value described above based on the amplitude values 54 .
  • the first parameter value may also be referred to as Short Time Energy (STE) because it represents the average energy over a relatively short length block of 64 amplitude values. If the first parameter value exceeds an associated threshold (step 62), the first parameter value indicates a speech portion, otherwise a noise (non-speech) portion. High RMS values can be caused in particular by consonants and thus indicate speech.
  • a second parameter value is determined on the basis of the spectral values 56 and indicates the form of a harmonic overtone structure of the frequency spectrum.
  • the second parameter value represents a measure of the spectral flatness of the frequency spectrum represented by the spectral values 56 (Spectral Flatness, SF).
  • the second parameter value is preferably determined by dividing the geometric mean of the spectral values 56 by their arithmetic mean.
  • the second parameter value is then compared to an associated threshold (step 62). If the value falls below the threshold, the second parameter value indicates a speech section, otherwise a noise section. High values of the second parameter indicate noise-like blocks that are atypical for speech.
  • due to the underlying spectral values, the second parameter relates to a significantly longer block length of 1024, so that the usually much shorter consonants carry little weight compared to an otherwise tonal characteristic.
  • a third parameter value is also determined, which indicates whether a maximum of the spectral values 56 lies in a predetermined frequency range. For this purpose, it is preferably determined whether the spectral value, the magnitude of which forms a maximum compared to the other spectral values 56 of a block (step 58), is in a frequency range between 70 and 250 Hz, i.e. it is checked whether the maximum spectral value represents a frequency that is greater than a lower frequency threshold and less than an upper frequency threshold (step 62). If true, the third parameter value indicates a speech portion, otherwise a noise portion.
  • the fundamental frequency of speech is generally in the range between 70 and 250 Hz, so that a maximum of the spectral values 56 in this range indicates speech.
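Taken together, the three per-block parameter values could be sketched as follows (Python/NumPy; the small floor against log(0) and the bin-to-frequency conversion are implementation assumptions):

```python
import numpy as np

def speech_parameters(amplitudes, spectrum, fs=48000, n_fft=1024):
    """The three per-block parameter values used for speech detection.

    amplitudes: 64 sample values of the block (time domain);
    spectrum:   magnitude spectrum of the surrounding 1024-sample block."""
    # 1) Short Time Energy: RMS over the short amplitude block.
    ste = np.sqrt(np.mean(np.asarray(amplitudes, float) ** 2))

    # 2) Spectral flatness: geometric mean divided by arithmetic mean
    # (close to 1 for noise-like spectra, small for tonal spectra).
    mags = np.asarray(spectrum, float) + 1e-12      # avoid log(0)
    flatness = np.exp(np.mean(np.log(mags))) / np.mean(mags)

    # 3) Does the dominant bin fall into the 70..250 Hz range, where the
    # fundamental frequency of speech generally lies?
    peak_hz = np.argmax(spectrum) * fs / n_fft
    return ste, flatness, 70.0 < peak_hz < 250.0
```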
  • Adaptive threshold values are preferably provided for the first and second parameter values in order to compensate for variable distances between a respective speaker and the recording microphone.
  • the threshold value is determined adaptively for a block of interest based on the parameter values of several previous blocks (step 60), the previous blocks preferably comprising detected speech sections and noise sections. For example, the first parameter values of 30 previous blocks classified as speech section and the first parameter values of thirty previous blocks classified as noise section are used to determine the threshold value for the first parameter value.
  • the first parameter values are summed up for each section type and the sums obtained are subtracted from each other.
  • the result is weighted with a weighting factor to get the associated threshold for the first parameter value of the current block. This ensures that the threshold value is adjusted to the current level of the first parameter value in order to avoid incorrect classifications.
  • the weighting factor is preferably set between 0 and 1 and controls the sensitivity of the detection.
  • the threshold value for the second parameter is preferably also determined according to the principle of the threshold value for the first parameter. In this case, however, the calculation rule is inverted, since the second parameter indicates speech with decreasing magnitude and is therefore inversely correlated with speech compared to the first parameter. Consequently, the sum of the second parameter values for speech sections is subtracted from the sum of the second parameter values for noise sections and weighted with a weighting factor, preferably between 0 and 1, which controls the sensitivity of the detection.
  • In step 64 the three parameters are evaluated together and it is determined whether or not the parameter values meet the associated threshold criterion. If two of the three parameter values indicate a speech section, i.e. meet the associated threshold criterion, the block in question is provisionally detected as a speech section.
  • a change between a speech section and a noise section and vice versa is only permitted if a predetermined number of consecutive blocks have been classified as speech section or noise section (step 66 and 68). For example, after a block detected as a noise section, five immediately consecutive blocks must be provisionally detected as a speech section in order to finally detect these blocks as a speech section (step 70). Otherwise, the blocks are still detected as noise sections (step 72). Conversely, after a block detected as a speech section, for example eight immediately consecutive blocks must be provisionally detected as a noise section in order to finally detect these blocks as noise sections (step 72). Otherwise, the blocks are still detected as sections of speech (step 70).
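The change-over rule can be sketched as a small hysteresis (Python; the counts 5 and 8 are the example values from the text, and the retroactive relabeling of the run mirrors the phrase "finally detect these blocks"):

```python
def finalize_detection(provisional, to_speech=5, to_noise=8):
    """Final per-block speech/noise decision: a change of section type is
    only accepted after enough consecutive provisional blocks of the other
    type; the blocks of that run are then relabeled accordingly.

    provisional: iterable of booleans (True = provisionally speech)."""
    final, state, run = [], False, 0
    for is_speech in provisional:
        if is_speech != state:
            run += 1
            needed = to_speech if is_speech else to_noise
            if run >= needed:
                state = is_speech
                # Relabel the preceding blocks of the completed run.
                final[len(final) - (run - 1):] = [state] * (run - 1)
                run = 0
        else:
            run = 0
        final.append(state)
    return final
```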
  • In step 20 the audio signal is weighted with a fixed amplification factor in order to compensate in advance for level losses caused by subsequent filters.
  • the signal can be amplified by 3 to 6 dB.
  • In step 22 the audio signal is filtered with a noise filter adapted to reduce very quiet portions of the audio signal.
  • very quiet signal sections do not contain any relevant information and in this respect can at most negatively affect the perceived voice quality.
  • the risk of feedback is reduced by reducing the signal level in very quiet signal sections.
  • a so-called noise gate can be used as a noise filter, which is adapted to suppress quiet signal sections.
  • a threshold value, which is compared with the current signal level, is used as the criterion for recognizing quiet signal sections. If the current signal level falls below the threshold, the noise filter is activated.
  • the threshold value is preferably well below the reference volume set in step 16 .
  • the threshold can be -55 dB. If the threshold value is not reached, the audio signal is reduced with a ratio in the range of 5 to 10. Values in the range of 10 ms and 100 ms are preferably used as the rise time (attack time) and decay time (release time), respectively.
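As one plausible reading of these numbers (the exact gate law is not spelled out in the text, and the attack/release smoothing is omitted here), the static curve of such a noise gate could be:

```python
def gate_gain_db(level_db, threshold_db=-55.0, ratio=8.0):
    """Static gain of a downward expander ('noise gate'): at or above the
    threshold the signal passes unchanged; below it, the shortfall is
    expanded by the ratio, pushing very quiet sections further down."""
    if level_db >= threshold_db:
        return 0.0
    # Output level = threshold + ratio * (input level - threshold).
    return (ratio - 1.0) * (level_db - threshold_db)
```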
  • second filter parameters 46 are determined, which are used for the subsequent steps 32, 34 and 36.
  • the second filter parameters 46 include, on the one hand, the speech sections 52 already detected in step 18.
  • Octave spectral values 48 are also determined, which in comparison to the spectral values 56 have a coarser spectral resolution that is modeled on human auditory perception.
  • the spectral values 56 determined, for example, by means of FFT are filtered with an octave filter bank.
  • the octave filter bank comprises a total of eight filters that overlap in the spectral range and are represented by way of example in Fig. 3 by magnitude frequency responses 37 over the frequency F and the magnitude G.
  • the frequency responses 37 have their respective maximum at a filter-specific center frequency fc and fall off towards smaller and larger frequency values.
  • the center frequencies fc are preferably 63, 125, 250, 500, 1000, 2000, 4000 and 8000 Hz.
  • the cut-off frequencies (magnitude frequency response of -3 dB) can be calculated generically on the basis of the respective center frequency fc.
  • the lower cutoff frequency is 32fc/45 and the upper cutoff frequency is 45fc/32.
  • the weighted spectral values falling into a respective filter are summed up, with the weights each representing the magnitude frequency response at the frequency of the spectral value in question.
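In Python, the band edges follow directly from the center frequencies (the 32/45 ratio approximates 1/√2 for an octave band):

```python
# The eight preferred center frequencies of the octave filter bank.
CENTER_FREQS_HZ = [63, 125, 250, 500, 1000, 2000, 4000, 8000]

def octave_band_edges(fc):
    """Lower and upper -3 dB cut-off frequency of the octave band with
    center frequency fc, using the generic 32/45 and 45/32 ratios."""
    return 32.0 * fc / 45.0, 45.0 * fc / 32.0
```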
  • In step 24, feedback frequencies 50 are also determined, which are used as part of the filter data 46 for a feedback filter applied in step 34.
  • the determination of the feedback frequencies is explained in more detail with reference to Fig. 4.
  • a maximum value analysis is used to select a number of candidates from the spectral values 56, which represent possible feedback frequencies. For example, those spectral values can be sought out as candidates from the spectral values 56 which in each case have the highest absolute value of all spectral values of a block and are adjacent to spectral values with a similar absolute value. The candidates thus represent the maxima of pronounced extrema of the spectrum.
  • three parameter values are determined (step 74) and compared to a respective threshold (step 78).
  • the threshold values are preferably permanently set for each parameter because the parameters are generally insensitive to a voice signal volume that is low compared to the background noise.
  • a first parameter represents the ratio between the magnitude of the candidate and the associated harmonics (Peak-to-Harmonic Ratio, PHPR).
  • the first two harmonics are used, i.e. the spectral values that represent double and triple the frequency compared to the candidate.
  • High PHPR values indicate a feedback frequency, because speech usually has a clear overtone structure with harmonics.
  • a second parameter represents the ratio between the magnitude of the candidate and the magnitude of immediately neighboring spectral values (Peak-to-Neighboring Ratio, PNPR).
  • the first three adjacent spectral values in each frequency direction are preferably used.
  • High PNPR values indicate a feedback frequency because speech tends to have less steep frequency maxima.
  • a third parameter represents the course of the absolute value of the candidate over time (Interframe Magnitude Slope Deviation, IMSD).
  • the mean increase in the absolute value of the candidate and a number of adjacent spectral values is preferably determined over five previous blocks.
  • Positive IMSD values of, for example, 0.5 dB typically indicate a feedback frequency, because the magnitude at the fundamental frequency of speech does not usually increase over several blocks.
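The first two parameters can be sketched as follows (Python/NumPy; taking the ratio against the strongest harmonic or neighbor is an assumption, and the IMSD parameter is omitted because it additionally needs the magnitude history of previous blocks):

```python
import numpy as np

def feedback_parameters(spectrum, k, n_harmonics=2, n_neighbors=3):
    """PHPR and PNPR in dB for a candidate at bin k of a magnitude
    spectrum. High values of either ratio point to a feedback frequency."""
    mags = np.asarray(spectrum, float) + 1e-12
    # Peak-to-Harmonic Ratio: candidate vs. the bins at double and triple
    # its frequency (the first two harmonics).
    harmonics = [mags[h * k] for h in range(2, 2 + n_harmonics)
                 if h * k < len(mags)]
    phpr = 20.0 * np.log10(mags[k] / max(harmonics))
    # Peak-to-Neighboring Ratio: candidate vs. the nearest bins each side.
    neighbors = np.concatenate((mags[max(0, k - n_neighbors):k],
                                mags[k + 1:k + 1 + n_neighbors]))
    pnpr = 20.0 * np.log10(mags[k] / np.max(neighbors))
    return phpr, pnpr
```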
  • the feedback frequency is preferably determined as a maximum of the spectrum in the region of the candidate in question.
  • the spectrum is interpolated on the basis of the candidate and the adjacent spectral values with an interpolation function (e.g. by parabolic interpolation) and then the maximum of the interpolation function is formed.
  • this maximum can lie between two spectral values, so that the interpolated maximum is more precise.
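Parabolic interpolation through the candidate and its two neighbours yields such a between-bin maximum (sketch; `bin_hz` is the bin spacing, i.e. the sampling rate divided by the FFT length):

```python
def interpolated_peak_hz(m_prev, m_peak, m_next, k, bin_hz):
    """Frequency of the maximum of a parabola fitted through the magnitudes
    of bins k-1, k and k+1; the result may lie between two spectral values."""
    denom = m_prev - 2.0 * m_peak + m_next
    offset = 0.0 if denom == 0.0 else 0.5 * (m_prev - m_next) / denom
    return (k + offset) * bin_hz
```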
  • the feedback frequency determined in this way is used as part of the filter data 50 for the feedback filter (step 34).
  • In order to conserve computing resources, it is preferred not to subject the underlying candidate to the parameter analysis again for a predetermined period of time after a successfully determined feedback frequency, if the candidate is identified as such again. For example, the same candidates are not checked again within a 1 second time window to determine whether or not they represent a feedback frequency. Instead, the feedback frequency determined for the temporally previous candidate is adopted for the subsequent, identical candidate, because the same feedback frequency would most likely be determined for it as well. Only after the predetermined time has elapsed is a relevant candidate checked again.
  • a so-called bell filter is provided in the feedback filter, the center frequency of which is set to the specific feedback frequency.
  • the Q value of the filters is preferably set to a fixed value.
  • the gain of the filter is preferably adjusted adaptively, as explained below with reference to Fig. 5.
  • the algorithm illustrated implements a finite state machine (FSM) which is initially in an inactive state 90, ie the bell filter has a gain of 0 dB and does not affect the audio signal.
  • when a feedback frequency is determined, an active state 92 is entered, in which the bell filter is operated with full (negative) gain.
  • after a first predetermined time X has elapsed, a change is made to a first reduction state 94, provided the feedback frequency has not been determined again by then; otherwise the active state is retained (feedback 96).
  • the bell filter has a reduced gain, for example 2/3 of full gain.
  • the feedback filter is thus operated with reduced effectiveness.
  • after a second predetermined time Y has elapsed, a change is made to a second reduction state 98, provided the feedback frequency has not been determined again by then; otherwise the active state is resumed (feedback 96).
  • the time-dependent adaptation of the feedback filter is advantageous for several reasons. On the one hand, it ensures that a specific feedback frequency is filtered for a sufficiently long time. Feedback typically lasts for at least a few 100 ms, so long enough filtering is required to effectively suppress the feedback. In addition, due to the gradual reduction of the feedback filter, audible distortion of the audio signal is reduced.
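The state machine of Fig. 5 could be sketched like this (Python; the times X and Y, the second reduction gain of 1/3, and the final deactivation step are assumptions, since the text only gives the 2/3 example for the first reduction state):

```python
class FeedbackFilterGain:
    """Gain control of one bell filter as a finite state machine: a
    detected feedback frequency switches to full (negative) gain; without
    re-detection the gain is reduced stepwise after the times X and Y and
    finally switched off. A re-detection returns to the active state."""

    def __init__(self, time_x=1.0, time_y=1.0):
        self.time_x, self.time_y = time_x, time_y
        self.gain_fraction = 0.0     # inactive state: filter has no effect
        self.elapsed = None

    def on_detection(self):
        """Feedback frequency determined (again): (re-)enter active state."""
        self.gain_fraction, self.elapsed = 1.0, 0.0

    def on_time(self, dt):
        """Advance time by dt seconds without a re-detection."""
        if self.elapsed is None:
            return
        self.elapsed += dt
        if self.gain_fraction == 1.0 and self.elapsed >= self.time_x:
            self.gain_fraction, self.elapsed = 2.0 / 3.0, 0.0   # reduction 1
        elif self.gain_fraction == 2.0 / 3.0 and self.elapsed >= self.time_y:
            self.gain_fraction, self.elapsed = 1.0 / 3.0, 0.0   # reduction 2
        elif self.gain_fraction == 1.0 / 3.0 and self.elapsed >= self.time_y:
            self.gain_fraction, self.elapsed = 0.0, None        # inactive
```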
  • In step 28 the audio signal is filtered with a two-stage compressor to remove peak levels that can lead to audible distortion.
  • a first compressor stage is activated at a signal level above a first threshold and filters the audio signal with a first filter that reduces moderate level peaks with a low degree of compression (e.g. ratio 20, rise time 10 ms, decay time 100 ms).
  • the second compressor stage is activated at a signal level above a second threshold, which is greater than the first threshold.
  • the audio signal is then filtered with a second filter to remove extreme peaks particularly effectively. For this purpose, a stronger degree of compression is selected (e.g. ratio 1000, rise time 0.1 ms, decay time 5 ms).
  • the second compressor stage provides an emergency filter to ensure that all amplitude values are below a critical maximum value.
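A static sketch of the two stages applied in series (Python; the thresholds are illustrative assumptions, only the ratios 20 and 1000 are from the text, and the attack/release behavior is omitted):

```python
def compressor_gain_db(level_db, threshold1_db=-10.0, ratio1=20.0,
                       threshold2_db=-3.0, ratio2=1000.0):
    """Gain (in dB, <= 0) of the two-stage compressor for a given input
    level: the first stage reduces moderate peaks above threshold1, the
    second acts as an emergency limiter above the higher threshold2."""
    out = level_db
    if out > threshold1_db:
        out = threshold1_db + (out - threshold1_db) / ratio1
    if out > threshold2_db:
        out = threshold2_db + (out - threshold2_db) / ratio2
    return out - level_db
```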
  • the audio signal is bandpass filtered to remove potential spurious signals.
  • speech signal components are predominantly limited to the frequency range between 70 and 8000 Hz, so that spectral components outside this frequency range can be filtered out.
  • a double-cascaded second-order high-pass filter is preferably combined with a double-cascaded second-order low-pass filter as the band-pass filter.
  • the high-pass filter and the low-pass filter preferably each have an edge steepness of 24 dB per octave.
  • the limit frequencies are preferably in the range between 60 and 80 Hz (lower limit frequency) and between 8000 and 10000 Hz (upper limit frequency).
  • the Q values of the filters should extend over an octave and have values in the range of 1.4, for example.
  • In step 30 the audio signal is filtered with a second compressor in order to reduce the dynamic range of the audio signal.
  • a filter with a relatively mild degree of compression, which is in particular lower than the degree of compression of the first compressor from step 28, is used as the compressor.
  • a low ratio can be selected, which should not exceed the value of three.
  • longer rise and decay times in the range of 0.5 to 1 second are preferably provided.
  • the audio signal is filtered with an equalizer to reduce spectral variations.
  • the equalizer is operated with eight bell filters whose center frequencies correspond to those of the octave band filters from Fig. 3, which are used to determine the octave spectral values.
  • the Q values of the bell filters are preferably set to cover about an octave each.
  • a separate amplification factor is provided for each bell filter, which is determined as a function of the octave spectral values 48 and predefined reference spectral values.
  • the reference spectral values correspond in their spectral resolution to the octave spectral values, so that each octave spectral value is assigned a reference spectral value.
  • the reference spectral values together form a reference spectral curve, the shape of which is correlated with a high level of speech intelligibility and can be determined, for example, by spectral evaluation of a large number of undisturbed speech signals, e.g. on the basis of a mean value of the octave-filtered spectrum.
  • Each octave spectral value is compared to an associated reference spectral value in order to determine an amplification factor which represents the deviation between the octave spectral value and the associated reference spectral value.
  • if a relevant octave spectral value is below the associated reference spectral value, for example, an amplification factor for the bell filter of this spectral range is determined such that weighting the octave spectral value with the amplification factor at least approximately results in the reference spectral value.
  • the gain factors are adjusted in this way to bring the frequency spectrum of the audio signal into agreement with the reference spectral curve and thus reduce spectral variations within the audio signal and between different audio signals. For example, characteristics of different speakers or spectral influences are compensated for by different microphone positions in favor of high speech intelligibility.
  • the amplification factors are preferably limited above and below.
  • the change in the amplification factors over time is also limited.
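The per-band gain determination can be sketched as follows (Python/NumPy; the symmetric ±12 dB limit is an assumed example value, since the text only states that the factors are limited above and below):

```python
import numpy as np

def equalizer_gains_db(octave_db, reference_db, limit_db=12.0):
    """Gain of each of the eight bell filters in dB: the deviation of the
    measured octave spectral value from its reference value, so that
    weighting the band with this gain approximately restores the
    reference spectral curve; the gains are limited above and below."""
    gains = np.asarray(reference_db, float) - np.asarray(octave_db, float)
    return np.clip(gains, -limit_db, limit_db)
```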
  • the bell filters for filtering the audio signal in step 32 are preferably used only for filtering blocks that have been detected as speech sections. Thus, the fitting of the spectrum to the reference spectral curve is limited to speech sections. Distortions and inefficient use of computing resources are thus avoided.
  • the filtering with the equalizer or bell filters in step 32 can cause undesired variations in the signal level.
  • the audio signal is preferably weighted with a correction factor which is determined as the mean value of the sign-inverted amplification factors.
  • In step 36 the audio signal is filtered with a pause filter in order to reduce the signal level in areas outside the detected speech sections, i.e. in speech pauses, and in this way to reduce background noise.
  • the speech sections detected in step 18 or 24 are used as filter data 52 .
  • Those sections of the audio signal which have not been detected as speech sections form noise sections which are filtered by the pause filter.
  • the audio signal is preferably weighted in the detected noise sections with a fixed negative gain factor of, for example -3 dB.
  • In step 38 the audio signal is filtered with a further equalizer in order to compensate for the effects of the various filtering steps.
  • a filter bank consisting of 23 bell filters between 50 Hz and 10 kHz is preferably used for this purpose.
  • the filters preferably each extend over a third of an octave, with the Q value being adjustable to 4.3.
  • a fixed negative gain factor is preferably provided for each bell filter.
  • In step 40 the audio signal can be analyzed for test purposes during a development phase. This step is purely optional and not necessary for a later application of the method in practice.
  • In step 42 the now improved audio signal is first transformed into an analog signal by means of a digital-to-analog converter and then made available via an output interface. From there, the audio signal can be picked up for playback via a sound reinforcement system. It is also conceivable for the digital audio signal to be output instead of an analog version, provided the sound reinforcement system has a digital signal input for the audio signal.
  • the audio device 102 has a housing 104 indicated schematically.
  • the external dimensions of the housing 104 are preferably no larger than a few centimeters, for example a maximum of 10 centimeters, so that the housing 104 is compact overall and is also particularly suitable for mobile applications.
  • the audio device 102 has an input interface 112 for receiving an analog audio signal and an output interface 106 for outputting the enhanced audio signal.
  • the device also has a USB-C interface 110 and an Ethernet interface 108 .
  • the USB-C interface 110 can generally be embodied as a power supply interface for connecting to an external power supply. It does not necessarily have to be designed according to the USB-C standard.
  • one or more wireless interfaces can be provided in order to wirelessly receive audio signals and/or control signals and/or electrical energy from outside and/or transmit them to a receiver (not shown).
  • the input interface 112 and the output interface 106 are preferably each designed as XLR interfaces, so that conventional sound transducers can be connected directly to the audio device 102 via XLR connectors.
  • the audio device 102 can thus be used in particular in an arrangement shown in Fig. 7, in which the input interface 112 is connected to a microphone 134 for detecting an audio signal from a speaker (not shown). Furthermore, the output interface 106 is connected via an amplifier 130 to a loudspeaker 132 or a public address system with multiple loudspeakers in order to reproduce the audio signal enhanced by the audio device 102 . Speaker 132 and microphone 134 are in the same room, such as a conference room or the like. The signal improvement takes place in real time, so that the audio signal recorded with the microphone 134 can be played back essentially simultaneously via the loudspeaker 132 and thus ensures an acoustically advantageous amplification of the audio signal.
  • the audio device 102 also includes a manual interface 128, which is indicated only schematically in Fig. 6 and is generally set up to receive control data for the audio device 102 directly at the audio device 102 by manual input from a user.
  • the audio signal is first recorded with the microphone 134 and fed to a preamplifier 116 via the input interface 112 .
  • the audio signal then reaches the output interface 106 either via a processing unit 114 or directly.
  • the processing unit 114 can receive a specification from the outside via the interfaces 108, 110 and/or 128 that defines whether the audio signal is to be routed through the processing unit 114 and improved by it or not.
  • the computing unit 114 can determine its functionality for executing the method for improving the audio signal by means of a self-diagnosis and set the switch position of the switching device 118 depending on the test result.
  • switching device 118 can connect input interface 112 directly to output interface 106 via preamplifier 116, with switching device 118 only being switched over to connect input interface 112 to processing unit 114 if processing unit 114 is fully functional, including the necessary power supply.
  • This ensures that the audio signal can be tapped from the output interface 106 independently of any malfunction of the computing unit 114 and a failure of the energy supply.
  • the audio device 102 is therefore particularly well suited for professional use.
  • the preamplifier 116 can be operated with variable gain.
  • a respective amplification value can be set by the computing unit 114 . The value can, for example, be selected directly on the device 102 by means of the interface 128 from a predetermined number of different amplification values, for example three amplification values.
  • the selection of the amplification value can be conveyed visually to the operator by means of an illuminated display, e.g. by means of a number of LEDs, on the audio device 102 .
  • By setting the preamplification appropriately, large level variations can preferably already be compensated for in the analog signal, so that digital noise due to high amplification of the digital signal can be avoided.
  • the interface 110 is provided for the energy supply of the audio device 102 and can be connected to a mains source by means of an associated supply cable in order to operate the audio device 102 in mains operation.
  • the audio device 102 can be supplied via an energy store integrated in the housing 104, for example an electric battery 126.
  • the rechargeable battery 126 is coupled to the interface 110 and can be charged via it.
  • another type of interface can also be provided for the power supply.
  • the device 102 is preferably equipped with an electrical protective device 120, which protects the electrical consumers of the audio device 102 from voltage damage.
  • These include in particular the computing unit 114, a fan 124 for cooling the computing unit 114 and a phantom power supply device 122 which is coupled to the input interface 112.
  • the phantom power device 122 is used to supply the microphone 134 connected to the input interface 112 with electricity, for example with a microphone supply voltage of 48 volts.
  • the phantom power device 122 has a voltage converter, not shown in detail, in order to convert the supply voltage of the audio device 102, which is provided via the USB-C interface 110, for example 5 volts, into the microphone supply voltage.
  • the computing unit 114 is preferably designed as a single-board computer, so that the audio device 102 can be designed to be compact from this point of view and can also be produced inexpensively.
  • the computing unit 114 is configured in particular via a bus interface 107, which is preferably of the USB-A type.
  • the interface 107 can be connected to a server or directly to a mobile terminal device (not shown) in order to access the computing unit 114 from the outside and optionally set one or more configuration parameters for the method of FIG. 1 (e.g. threshold values, rise and decay times).
  • a configuration via the USB-C interface 110 is also conceivable.
  • it is also possible to connect a USB stick or the like to the interface 107, with the desired configuration data or new firmware being stored on the USB stick.
  • the data are then transmitted to the processing unit 114 automatically or after initiation by an operator via the interface 107 in order to update the configuration parameters or the firmware accordingly. This process can be performed by an end user of the device.
  • a detailed configuration of filter parameters by the end user is preferably not required. All the necessary configuration parameter values are already stored in an internal memory of the computing unit (not shown), so that the method ensures good results fully automatically under almost all usual acoustic environmental conditions.
  • the configuration parameter set can be adjusted remotely or locally via the interface 107 by a trained expert, for example. This means that there is no setup effort for the end user.
  • For commissioning in the application of FIG. 7, it is only necessary to connect the audio device 102 to the microphone 134 and the loudspeaker 132 via the interfaces 112 and 106 provided. The audio device 102 can then be used directly in terms of a plug-and-play functionality. If battery operation is not desired, the audio device 102 is connected via the USB-C interface 110 to a power source (not shown) in order to power the audio device 102 electrically.
  • the audio device 102 also has a manual operating interface 113 (eg with a manually operable button) and a visual display device 109 (eg an LED).
  • a user of the audio device 102 can control a recording of the audio signal provided at the output interface 106 via the operating interface 113 .
  • the user first connects a USB stick or the like to the interface 107.
  • the USB stick is detected by the computing unit 114 and the user is shown on the display device 109 by activating a first display mode that the audio device 102 is ready to record.
  • the user interface 113 is then actuated in order to store the audio signal (in its digital form) in the USB stick.
  • the display device 109 indicates the successful start of the recording by activating a second display mode (eg flashing LED).
  • the audio signal is then stored continuously in a file on the USB stick.
  • the recording will stop automatically. This is indicated to the user by activating a third display mode on the display device 109 .
  • the recording can optionally be ended prematurely by actuating the operating interface 113 again.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Tone Control, Compression And Expansion, Limiting Amplitude (AREA)
  • Circuit For Audible Band Transducer (AREA)

Claims (12)

  1. Method for improving an audio signal, in particular in real time, the method comprising at least the following steps:
    - receiving an audio signal having a plurality of amplitude values, the audio signal comprising at least speech portions;
    - detecting speech portions of the audio signal (18, 24);
    - filtering the audio signal with at least one level filter (16) in order to reduce signal level variations of the audio signal in the detected speech portions;
    - determining a feedback frequency (50) representing a feedback of the audio signal;
    - filtering the audio signal with a feedback filter (34) on the basis of the determined feedback frequency (50) in order to reduce spectral components of the audio signal representing the feedback; and
    - filtering the audio signal with at least one equalization filter (32) in order to reduce spectral variations of the audio signal in the detected speech portions, the filtering with the at least one equalization filter (32) comprising:
    - determining coarse spectral values (48) on the basis of fine spectral values (56) of the audio signal, the coarse spectral values (48) representing the fine spectral values (56) with a lower spectral resolution than the fine spectral values (56);
    - determining first equalization weights representing a deviation of the coarse spectral values (48) from predetermined reference spectral values;
    - weighting the audio signal with the first equalization weights in order to bring the spectral values of the audio signal into conformity with the reference spectral values;
    wherein determining the feedback frequency (50) comprises:
    - determining a subset of spectral values of the audio signal that violate a predetermined spectral threshold (74);
    - determining a plurality of first spectral parameter values on the basis of the subset, each of the first spectral parameter values representing a predetermined relationship between an associated spectral value of the subset and at least one temporally and/or spectrally adjacent spectral value (76); and
    - determining the feedback frequency (50) on the basis of the plurality of first spectral parameter values (78, 80, 82, 84).
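The feedback-frequency determination recited above can be illustrated with a short sketch. This is not the patented implementation; the frame layout, the mean-spectrum threshold, and the peak-to-neighbour ratio criterion are assumptions chosen purely for illustration of the claimed steps.

```python
import numpy as np

def detect_feedback_frequency(frames, fs, n_fft=1024,
                              mag_threshold=0.1, ratio_threshold=10.0):
    """Illustrative feedback (Larsen) frequency detector.

    frames: array of shape (n_frames, n_fft) holding successive
    time-domain frames. Returns a candidate frequency in Hz, or None.
    """
    # Fine spectral values: magnitude spectra of successive frames.
    spectra = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    mean_spectrum = spectra.mean(axis=0)

    # Subset of spectral values violating the predetermined threshold.
    candidates = np.flatnonzero(mean_spectrum > mag_threshold)

    best_bin, best_ratio = None, 0.0
    for k in candidates:
        # Spectral parameter value: relation between the candidate bin
        # and its spectrally adjacent bins. A narrow peak that persists
        # over frames and towers over its surroundings suggests acoustic
        # feedback rather than speech.
        lo, hi = max(k - 3, 0), min(k + 4, mean_spectrum.size)
        neighbours = np.delete(mean_spectrum[lo:hi], k - lo)
        ratio = mean_spectrum[k] / (neighbours.mean() + 1e-12)
        if ratio > ratio_threshold and ratio > best_ratio:
            best_bin, best_ratio = k, ratio

    return None if best_bin is None else best_bin * fs / n_fft
```

A 1 kHz tone dominating four consecutive 1024-sample frames at 16 kHz is reported at 1000 Hz, while low-level broadband noise yields no candidate.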
  2. Method according to claim 1,
    further comprising determining a plurality of spectral values (56) on the basis of the amplitude values (54), the amplitude values (54) representing the audio signal in a time domain and the spectral values (56) representing the audio signal in a frequency domain, wherein the detection of the speech portions (18, 24), the filtering with the at least one level filter (16) and/or the filtering with the at least one equalization filter (32) are carried out on the basis of the amplitude values (54) and/or the spectral values (56).
  3. Method according to claim 1 or 2,
    wherein detecting the speech portions (18, 24) comprises:
    - determining at least one first energy parameter value on the basis of the amplitude values (54), the first energy parameter value representing an average energy of the audio signal for a plurality of the amplitude values (54);
    - determining at least one second spectral parameter value on the basis of spectral values (56) of the audio signal, the at least one second spectral parameter value representing a harmonic spectral structure of the audio signal for a plurality of said spectral values (56); and
    - detecting a portion of the audio signal as a speech portion when the at least one first energy parameter value violates a first energy parameter threshold value and/or the at least one second spectral parameter value violates a spectral parameter threshold value (62, 64), in particular wherein the energy parameter threshold value and/or the spectral parameter threshold value are adapted as a function of time.
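The two detection criteria of claim 3 can be sketched as follows. The dB floor, the "top bins" harmonicity proxy, and both thresholds are assumptions for illustration, not the values or measures used by the patent.

```python
import numpy as np

def detect_speech(frames, energy_floor_db=-50.0, harmonicity_min=0.2):
    """Illustrative speech detector combining an energy criterion with a
    crude spectral-structure criterion; both thresholds are assumed."""
    flags = []
    for frame in frames:
        # First energy parameter value: average energy over the frame.
        energy_db = 10.0 * np.log10(np.mean(np.square(frame)) + 1e-12)

        # Second spectral parameter value: share of spectral power held
        # by the strongest bins -- a rough proxy for the peaky, harmonic
        # structure of voiced speech.
        power = np.abs(np.fft.rfft(frame)) ** 2
        harmonicity = np.sort(power)[-8:].sum() / (power.sum() + 1e-12)

        flags.append(bool(energy_db > energy_floor_db
                          and harmonicity > harmonicity_min))
    return flags
```

A loud tonal frame passes both tests, a near-silent frame fails the energy test, and loud white noise fails the harmonicity test.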
  4. Method according to any one of the preceding claims,
    wherein filtering the audio signal with the at least one level filter (16) comprises:
    - determining at least one level parameter value on the basis of the amplitude values (54), the level parameter value representing an average level of the audio signal for a detected speech section;
    - determining at least one compensation weight on the basis of the at least one level parameter value;
    - weighting the audio signal with the at least one compensation weight in order to reduce signal level variations of the audio signal.
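The three steps of claim 4 map onto a very small sketch: measure a level per speech section, turn it into a weight, apply the weight. Using RMS as the level parameter and a single fixed target are assumptions for illustration.

```python
import numpy as np

def level_filter(signal, sections, target_rms=0.1):
    """Illustrative level filter: for each detected speech section, a
    level parameter value (here the section RMS) yields a compensation
    weight that pulls the section toward a common target level."""
    out = np.asarray(signal, dtype=float).copy()
    for start, stop in sections:
        seg = out[start:stop]
        rms = np.sqrt(np.mean(seg ** 2)) + 1e-12     # level parameter value
        out[start:stop] = seg * (target_rms / rms)   # compensation weight
    return out
```

After filtering, a quiet section and a loud section of the same recording end up at the same RMS, which is exactly the "reduce signal level variations" effect of the claim.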
  5. Method according to claim 4,
    wherein the at least one level parameter value comprises first and second level parameter values for a plurality of detected speech sections, wherein the first level parameter values represent the average level of the audio signal with a first temporal resolution, wherein the second level parameter values represent the average level of the audio signal with a second temporal resolution, wherein the second temporal resolution is higher than the first temporal resolution, and wherein the at least one compensation weight is determined on the basis of the first and second level parameter values,
    in particular wherein the first level parameter values are formed by average energy values and/or first volume values and the second level parameter values are formed by second volume values.
  6. Method according to claim 4 or 5,
    wherein the at least one compensation weight comprises first compensation weights and second compensation weights, the first compensation weights being determined in order to reduce signal level variations having at least one level above a predetermined level threshold,
    wherein the second compensation weights are determined in order to adjust the signal level of the audio signal to a predetermined value.
  7. Method according to any one of the preceding claims,
    wherein the filtering with the at least one equalization filter (32) comprises: weighting the audio signal with second equalization weights (38), the second equalization weights being predetermined.
  8. Method according to any one of the preceding claims, further comprising:
    filtering the audio signal with at least one compressor (28, 30) in order to reduce a dynamic range of the audio signal,
    in particular wherein a plurality of mutually different parameter sets are provided for the at least one compressor (28, 30), which are selected as a function of an amplitude of the audio signal and serve as the basis for the filtering with the at least one compressor, the plurality of parameter sets differing from one another in a degree of compression.
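The amplitude-dependent choice among compressor parameter sets in claim 8 can be sketched like this. The concrete thresholds, ratios, and selection breakpoints are invented for the example; only the structure (several sets, selected by amplitude, differing in compression degree) follows the claim.

```python
import numpy as np

# Hypothetical parameter sets differing in their degree of compression:
# (threshold, ratio). Louder input selects stronger compression.
PARAM_SETS = [(0.1, 2.0), (0.1, 4.0), (0.1, 8.0)]

def compress_frame(frame, sets=PARAM_SETS):
    """Select a compressor parameter set as a function of the frame
    amplitude, then apply a static compression curve above threshold."""
    out = np.asarray(frame, dtype=float).copy()
    peak = np.max(np.abs(out))
    if peak < 0.3:
        threshold, ratio = sets[0]   # gentle
    elif peak < 0.7:
        threshold, ratio = sets[1]   # medium
    else:
        threshold, ratio = sets[2]   # strong

    above = np.abs(out) > threshold
    # Static gain curve: samples above threshold are scaled by 1/ratio.
    out[above] = np.sign(out[above]) * (
        threshold + (np.abs(out[above]) - threshold) / ratio
    )
    return out
```

A frame peaking at 0.9 selects the strong set (ratio 8) and comes out peaking at 0.1 + 0.8/8 = 0.2, while samples below the threshold pass unchanged.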
  9. Method according to any one of the preceding claims,
    wherein, when the determined feedback frequency (50) disappears between successive time periods of the audio signal, the effectiveness of the feedback filter (34) is reduced progressively (94, 98, 100) over a plurality of time periods.
  10. Method according to any one of the preceding claims, further comprising:
    - filtering the audio signal with a pause filter (36) in order to reduce the audio signal in regions outside the detected speech sections; and/or
    - filtering the audio signal with a noise filter (22) in order to reduce the audio signal in regions having amplitude values that violate a predetermined noise threshold; and/or
    - filtering the audio signal with a bandpass filter (26), a lower cutoff frequency of the bandpass filter preferably being in a range from 50 to 100 Hz, and an upper cutoff frequency of the bandpass filter preferably being in a range from 8000 to 10,000 Hz.
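The band-pass limits of claim 10 can be demonstrated with a minimal frequency-domain sketch. The concrete cutoffs (80 Hz and 9 kHz) are picked from within the claimed ranges for illustration; a real-time implementation would normally use an IIR/FIR design rather than a whole-signal FFT.

```python
import numpy as np

def bandpass(signal, fs, f_lo=80.0, f_hi=9000.0):
    """Illustrative FFT-domain band-pass with cutoffs inside the claimed
    ranges (lower 50-100 Hz, upper 8000-10000 Hz)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    # Zero all bins outside the pass band, then transform back.
    spectrum[(freqs < f_lo) | (freqs > f_hi)] = 0.0
    return np.fft.irfft(spectrum, n=signal.size)
```

A 30 Hz rumble component is removed while a 1 kHz speech-band component passes through unchanged.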
  11. Device for improving an audio signal, in particular in real time, the audio signal comprising speech,
    wherein the device (102) comprises:
    - at least one input interface (112) for capturing an audio signal, the input interface (112) comprising a connection for a microphone (134);
    - at least one output interface (106) for outputting the audio signal, the output interface (106) having a connection for an audio reproduction device (130, 132); and
    - a computing unit (114) for carrying out a method for improving the audio signal according to any one of the preceding claims.
  12. Device according to claim 11,
    further comprising:
    - a preamplifier (116) for the audio signal, the preamplifier (116) being couplable to the input interface (112); and/or
    - a power supply (122) for the input interface (112); and/or
    - a switching device (118) that can be coupled to the input interface (112), the output interface (106) and the computing unit (114); and/or
    - a cooling device (124);
    and/or wherein the computing unit (114) comprises a single-board computer; and/or
    wherein the device (102) comprises a housing (104) and/or
    at least one external communication interface (108, 110).
EP21190351.3A 2021-08-09 2021-08-09 Method and device for audio signal improvement Active EP4134954B1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP21190351.3A EP4134954B1 (fr) 2021-08-09 2021-08-09 Method and device for audio signal improvement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP21190351.3A EP4134954B1 (fr) 2021-08-09 2021-08-09 Method and device for audio signal improvement

Publications (3)

Publication Number Publication Date
EP4134954A1 EP4134954A1 (fr) 2023-02-15
EP4134954C0 EP4134954C0 (fr) 2023-08-02
EP4134954B1 true EP4134954B1 (fr) 2023-08-02

Family

ID=77264991

Family Applications (1)

Application Number Title Priority Date Filing Date
EP21190351.3A Active EP4134954B1 (fr) 2021-08-09 2021-08-09 Method and device for audio signal improvement

Country Status (1)

Country Link
EP (1) EP4134954B1 (fr)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6295364B1 (en) * 1998-03-30 2001-09-25 Digisonix, Llc Simplified communication system
US7813931B2 (en) * 2005-04-20 2010-10-12 QNX Software Systems, Co. System for improving speech quality and intelligibility with bandwidth compression/expansion
GB2432091B (en) * 2005-10-20 2009-06-17 Protec Fire Detection Plc Improvements to a public address system having zone isolator circuits
EP2118885B1 (fr) * 2007-02-26 2012-07-11 Dolby Laboratories Licensing Corporation Speech enhancement in entertainment audio
GB2520048B (en) * 2013-11-07 2018-07-11 Toshiba Res Europe Limited Speech processing system
JP6386237B2 (ja) * 2014-02-28 2018-09-05 国立研究開発法人情報通信研究機構 Speech clarification device and computer program therefor

Also Published As

Publication number Publication date
EP4134954C0 (fr) 2023-08-02
EP4134954A1 (fr) 2023-02-15

Similar Documents

Publication Publication Date Title
DE69933141T2 Sound processor for adaptive dynamic range enhancement
DE60222813T2 Hearing aid and method for increasing speech intelligibility
DE60120949T2 Hearing prosthesis with automatic acoustic environment classification
DE60027438T2 Enhancement of a noisy acoustic signal
DE102006051071B4 Level-dependent noise reduction
DE102006047965A1 Hearing aid with an occlusion reduction device and method for occlusion reduction
DE10017646A1 Noise suppression in the time domain
DE112011105908B4 Method and device for adaptive control of the sound effect
EP2595414B1 Hearing device with a system for reducing microphone noise and method for reducing microphone noise
EP1369994A2 Method for hearing-adapted boosting of low frequencies and corresponding reproduction system
EP2080197B1 Device for removing noise from an audio signal
DE102006019694B3 Method for adjusting a hearing aid with high-frequency amplification
WO2001047335A2 Method for eliminating interference signal components from an input signal of a hearing system, use of the method, and hearing aid
EP3373599A1 Method for frequency restriction of an audio signal and hearing device operating according to said method
EP4134954B1 Method and device for audio signal improvement
DE60303278T2 Device for improving speech recognition
EP1416764B1 Method for determining the parameters of a hearing aid and device for carrying out the method
DE102020107620B3 System and method for compensating the occlusion effect in headphones or hearing aids with improved perception of one's own voice
EP1351550A1 Method for adapting a signal amplification in a hearing aid, and hearing aid
EP2190218B1 Filter bank system with specific stopband attenuation for a hearing device
DE102012204193B4 Audio processor and method for amplifying or attenuating a received audio signal
EP3048813B1 Method and device for noise suppression based on subband cross-correlation
DE102004044565B4 Method for limiting the dynamic range of audio signals, and circuit arrangement therefor
EP2648424B1 Method for output level limitation for hearing devices
DE10159928A1 Method for avoiding feedback-induced oscillations in a hearing aid, and hearing aid

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

REG Reference to a national code

Ref country code: DE

Ref legal event code: R079

Ref document number: 502021001146

Country of ref document: DE

Free format text: PREVIOUS MAIN CLASS: G10L0021031600

Ipc: H04R0027000000


17P Request for examination filed

Effective date: 20220719

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: GRANT OF PATENT IS INTENDED

RIC1 Information provided on ipc code assigned before grant

Ipc: G10L 25/78 20130101ALI20230210BHEP

Ipc: G10L 21/0364 20130101ALI20230210BHEP

Ipc: H04R 27/00 20060101AFI20230210BHEP

INTG Intention to grant announced

Effective date: 20230313

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE PATENT HAS BEEN GRANTED

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

Free format text: NOT ENGLISH

REG Reference to a national code

Ref country code: CH

Ref legal event code: EP

REG Reference to a national code

Ref country code: DE

Ref legal event code: R096

Ref document number: 502021001146

Country of ref document: DE

REG Reference to a national code

Ref country code: IE

Ref legal event code: FG4D

Free format text: LANGUAGE OF EP DOCUMENT: GERMAN

RAP4 Party data changed (patent owner data changed or rights of a patent transferred)

Owner name: OPTIMIC GMBH

U01 Request for unitary effect filed

Effective date: 20230901

U07 Unitary effect registered

Designated state(s): AT BE BG DE DK EE FI FR IT LT LU LV MT NL PT SE SI

Effective date: 20230907

U20 Renewal fee paid [unitary effect]

Year of fee payment: 3

Effective date: 20231005

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20231103

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20231202

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: RS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20230802

Ref country code: NO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20231102


Ref country code: HR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20230802


PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: PL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20230802

U1N Appointed representative for the unitary patent procedure changed [after the registration of the unitary effect]

Representative=s name: PAVANT PATENTANWAELTE PARTGMBB; DE

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: ES

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20230802

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SM

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20230802

Ref country code: RO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20230802


Ref country code: CZ

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20230802

Ref country code: SK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20230802

REG Reference to a national code

Ref country code: IE

Ref legal event code: MM4A