CN108028048B - Method and apparatus for correlating noise and for analysis - Google Patents

Method and apparatus for correlating noise and for analysis

Info

Publication number
CN108028048B
CN108028048B (application CN201680048720.2A)
Authority
CN
China
Prior art keywords
location
ambient noise
noise
determining
parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201680048720.2A
Other languages
Chinese (zh)
Other versions
CN108028048A (en)
Inventor
Thomas Sporer
Tobias Clauss
Judith Liebetrau
Sara Kepplinger
Hanna Lukashevich
Dietmar Kepplinger
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Original Assignee
Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Publication of CN108028048A
Application granted
Publication of CN108028048B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/61Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63Querying
    • G06F16/632Query formulation
    • G06F16/634Query by example, e.g. query by humming
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/018Audio watermarking, i.e. embedding inaudible data in the audio signal
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • G10L21/0232Processing in the frequency domain
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/08Feature extraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12Classification; Matching

Abstract

Embodiments of the present invention provide a method for associating noise with at least one signal class (e.g., interference noise) of a plurality of signal classes (e.g., interference noise and non-interference noise). The method comprises the following steps: receiving the ambient noise, and establishing whether the ambient noise, or a set of parameters derived from it, satisfies a predefined rule describing one signal class of the plurality of signal classes. Starting from this, one of the following steps is carried out: logging that the predefined rule has been met, recording the ambient noise received during the sliding time window, deriving a parameter set from the ambient noise of the sliding time window and storing it, or issuing an activation signal that causes another device to associate the noise.

Description

Method and apparatus for correlating noise and for analysis
Technical Field
Embodiments of the present invention relate to a method and an apparatus for associating noise with at least one signal class of a plurality of signal classes, and to a method and an apparatus for analyzing noise of at least one signal class of a plurality of signal classes.
Background
Noise may, for example, be subdivided into signal classes, such as interference noise and non-interference noise. A finer subdivision into more interfering and less interfering noise is also conceivable.
Classifying interference noise is not always easy, because different factors influence whether a noise is perceived as interfering. Birdsong, for example, is not subjectively perceived as interfering even when it is loud (objectively measurable parameter: sound pressure level) and stands out clearly from other environmental noise (objectively measurable parameter: dynamics). A significantly quieter aircraft flyover, in contrast, is perceived as interfering by more test subjects than the birdsong just mentioned.
The consequence is that, with current methods, a predictive noise assessment in environments where interference noise is to be investigated, such as the wellness area of a hotel or a workplace, must be left to test subjects.
A purely automated evaluation, for example with respect to absolute loudness or an increase in level, can serve as a first cue, but it is not sufficient for a final assessment. Improved methods are therefore needed.
Disclosure of Invention
Main aspects of the invention
It is an object of another aspect to provide a concept that allows signal classes of noise, such as interference noise and subjectively perceived interference noise, to be identified and analyzed.
This object is achieved by the independent claims.
Embodiments of the present invention provide a method for associating noise with at least one signal class (e.g., interference noise) of a plurality of signal classes (e.g., interference noise and non-interference noise). The method comprises the steps of receiving ambient noise and establishing whether the ambient noise, or a set of parameters derived from it, satisfies a predefined rule describing one signal class of the plurality of signal classes. Thereupon, one of the following steps is performed: logging that the predefined rule has been met, recording the ambient noise received during the sliding time window, deriving a set of parameters from the ambient noise of the sliding time window and storing it, or issuing an activation signal that causes another device to associate noise of at least one signal class.
Embodiments of this aspect are based on the finding that, starting from a database as determined for the corresponding other aspect (see below), it is possible to identify the presence of subjectively perceived interference noise, or to associate a noise with a class, for example by comparing the current noise environment with noise from the database or with parameters obtained from it, such as audio fingerprints. The method can be performed automatically, and the stored database alone allows a predictive assessment of noise situations (birdsong versus air conditioning) without any subjective human evaluation.
Identifying a rule match may, for example, be done by comparing the ambient noise with previously buffered ambient noise, by comparing a currently derived parameter set (audio fingerprint) with previously determined parameter sets, or by deriving a psychoacoustic parameter and comparing it with a predetermined threshold.
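The rule-matching step can be illustrated with a minimal sketch. This is not part of the patent; the vector representation, the function name, and the threshold value are assumptions. A stored class is modeled as a reference fingerprint plus a distance threshold, and a match is declared when the current fingerprint is close enough:

```python
import numpy as np

# Hypothetical sketch of the rule check: a signal class is represented by a
# stored reference fingerprint (a plain feature vector here) and a threshold.
def matches_rule(fingerprint, reference, threshold):
    """Return True if the Euclidean distance between the current fingerprint
    and the stored reference falls below the threshold."""
    distance = np.linalg.norm(np.asarray(fingerprint) - np.asarray(reference))
    return bool(distance < threshold)

# A current fingerprint close to a stored reference matches the rule:
print(matches_rule([0.9, 0.1, 0.4], [1.0, 0.0, 0.5], threshold=0.5))  # True
# A distant one does not:
print(matches_rule([5.0, 0.0, 0.0], [1.0, 0.0, 0.5], threshold=0.5))  # False
```

The same structure extends to threshold checks on individual psychoacoustic parameters, where the "fingerprint" degenerates to a single value.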
Another embodiment relates to an apparatus for associating noise with at least one of a plurality of signal classes. The apparatus comprises a microphone for continuously listening to the current ambient noise, a processor for comparing the current ambient noise with data stored in a database (recordings of interference noise, or parameters describing the interference noise), and an interface for outputting information once interference noise has been identified in the current environment. The data, such as previously determined recordings, previously determined audio fingerprints, or previously established thresholds of psychoacoustic parameters, may be stored internally or, according to a further embodiment, read from an external source, for example a database that has been determined according to the further aspects.
Starting from the identified objective interference noise or signal class, the information can be further processed individually or in combination with an indication of time, an indication of location, or a classification of the interference noise into one of the classes (corresponding interference groups: slightly interfering, highly interfering). According to a preferred embodiment, this information is output to an external database.
While this embodiment provides an evaluation at a single location, according to further embodiments the evaluation may be extended to several locations, such as several points in a room or several neighboring outdoor locations (e.g., distributed over a city). A further embodiment therefore provides a method wherein the steps of recording, comparing, and outputting are performed for two adjacent locations. With information from two adjacent locations, a relation between the recordings of the first and second location can be determined, for example in order to establish the movement, spatial extension, or direction of the subjectively perceived interference noise.
According to another embodiment, similarly to identifying interference noise, it is also conceivable to identify different sequences (e.g., control instructions) that are used to output a corresponding control signal. The recording associated with the control signal may be a voice command or, as described above, a sound signal classified as interference noise. The control signal may be processed by the device performing the method itself, for example to start a recording, or by an external device, such as another device at a different location that is switched into recording mode by the control signal.
According to further embodiments, the device outlined above may further comprise a communication interface for communicating with a database to read previously determined interference noise or parameters, or for outputting information about interference noise. According to a further embodiment, the device may also use the communication interface to communicate with another device, so that interference noise can be obtained and/or analyzed for two adjacent locations.
Embodiments of a sub-aspect provide a method for analyzing noise of a signal class. The method comprises continuously recording the current ambient noise at a first location and a second location. Recording here again means either recording the ambient noise directly or deriving from it a set of parameters, such as an audio fingerprint or psychoacoustic parameters. For each recording, a comparison with previously obtained recordings of subjectively perceived interference noise, or with parameters describing such noise, is performed in order to identify the interference noise at each location (first and second). From the two recordings (first and second) containing the same interference noise at different locations, a relation between the recordings can be determined, so that the interference noise can be analyzed more precisely, for example with respect to its position, extension, or movement.
Embodiments of this aspect are based on the finding that, using the relation between two recordings of the same interference noise at two different locations, extended information about the interference noise itself can be obtained. The interference noise is first identified in the respective environments (i.e., at the first and second locations) and, once identified, the recordings are correlated with each other. Advantageously, information about the movement, spread, or propagation direction of the interference noise can be obtained in this way. It also becomes possible to distinguish between local interference noise (occurring at only one location) and global events (occurring at several locations). Using this method, the propagation of characteristic noise events and their movements can be identified.
According to an embodiment, the step of determining the relation between the first and second recordings is done by analyzing the level difference between them. Alternatively or additionally, a time offset may be established in this step, i.e., a delay or run-time offset between the events in the two recordings made at the two locations. The two recordings can also be evaluated with respect to differences in frequency content and reverberation. Using these analysis parameters, the distance between the noise source and each recording location can be estimated, since the level generally decreases with increasing distance and higher frequencies are attenuated more strongly.
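As a minimal illustration (not from the patent; the signal shapes, the function name, and the use of cross-correlation for the run-time offset are assumptions), the level difference and time offset between two recordings of the same event might be estimated as follows:

```python
import numpy as np

def relate_recordings(rec_a, rec_b, sample_rate):
    """Estimate the level difference (dB) and the time offset (seconds)
    between two recordings of the same noise event. A positive offset
    means the event arrives later in rec_a than in rec_b."""
    rms_a = np.sqrt(np.mean(rec_a ** 2))
    rms_b = np.sqrt(np.mean(rec_b ** 2))
    level_diff_db = 20 * np.log10(rms_a / rms_b)
    corr = np.correlate(rec_a, rec_b, mode="full")   # cross-correlation
    lag = int(np.argmax(corr)) - (len(rec_b) - 1)    # peak position -> lag
    return level_diff_db, lag / sample_rate

# Synthetic example: the same impulse, quieter and 20 ms later at location A.
fs = 1000
rec_a = np.zeros(100); rec_a[30] = 1.0
rec_b = np.zeros(100); rec_b[10] = 2.0
level, offset = relate_recordings(rec_a, rec_b, fs)
print(round(level, 1), offset)   # -6.0 0.02
```

A level deficit together with a positive run-time offset at one location suggests that the source is closer to the other location; tracking these quantities over time reveals movement.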
According to a further embodiment, the method comprises analyzing the audio event or its source with respect to its distance from the first and second locations, analyzing the movement of the subjectively interfering noise source, and/or comparing the number of subjectively interfering noise sources. These analyses are based on evaluating the relation between the first and second recordings, i.e., comparing the factors mentioned above.
It should be mentioned in this respect that a sliding time window is preferably used for the continuous recording. Further, as in the aspect above, the noise to be compared may also be read from an external source.
It is noted here that the method can, of course, be extended to a third location.
In an embodiment according to this aspect, once an interfering signal has been determined at the first location, recording may be started at the second location to allow a temporal analysis of the propagating interfering signal.
Another embodiment relates to a system for analyzing signals of a signal class. The system comprises two units, each with a microphone for continuously recording the current ambient noise. The two units may be positioned at different, for example adjacent, locations. Here, "recording" again means either directly recording the ambient noise or deriving parameters from it (e.g., an audio fingerprint). Furthermore, the system comprises at least one processor, which may be integrated in the first or the second unit and which is configured to identify noise by comparing the first and second recordings of the two units with previously obtained recordings of signals of at least one signal class, or with parameters describing signals of a signal class. Additionally, the processor is configured to establish a relation between the first and second recordings.
According to an embodiment, the two units may be connected to each other via a communication interface, such as a radio interface.
According to a further embodiment, a computer program for performing one of the methods described above is provided.
Further developments are defined in the dependent claims.
Additional aspects
Embodiments of the present invention provide a method for building a database from buffered recordings. The method comprises the steps of receiving ambient noise, which may include interference noise, and buffering the ambient noise for a sliding time window of, for example, 30 or 60 seconds, preferably more than 5 seconds. Alternatively, the method may comprise the steps of deriving a set of parameters relating to the ambient noise and buffering that parameter set for the sliding time window. The buffered ambient noise or the buffered parameter set is generally referred to as a recording. In addition, the method comprises the step of obtaining a signal that identifies, in the ambient noise, a signal class (such as interference noise) of a plurality of signal classes (interference noise and non-interference noise). The third basic step is storing the buffered recording, in response to the signal, in a memory such as an internal or external memory. The steps of obtaining and storing are repeated in order to build a database comprising a plurality of buffered recordings of the same signal class.
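The buffer-then-store flow above can be sketched in a few lines. This is a hypothetical illustration, not the patent's implementation; the class name, the sample-based window, and the in-memory "database" are assumptions:

```python
from collections import deque

class NoiseLogger:
    """Ambient noise is kept only for a sliding window (a ring buffer);
    when the user's signal arrives, the buffered window is copied into
    the permanent database together with its signal class."""

    def __init__(self, window_samples):
        self.buffer = deque(maxlen=window_samples)  # old samples fall out
        self.database = []

    def receive(self, sample):
        self.buffer.append(sample)                  # continuous buffering

    def on_signal(self, signal_class):
        # store the buffered window in response to the signal
        self.database.append((signal_class, list(self.buffer)))

logger = NoiseLogger(window_samples=5)
for sample in range(10):                 # ten incoming "samples"
    logger.receive(sample)
logger.on_signal("interference")
print(logger.database)                   # [('interference', [5, 6, 7, 8, 9])]
```

The `deque` with `maxlen` implements the overwrite behavior of the sliding window: only the most recent samples survive, which also matches the privacy argument made for the ring buffer below.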
Embodiments of the present invention are based on the finding that, using a device that continuously records at the relevant locations in the environment, a database can be built in which recordings, or characteristics of recordings (such as audio fingerprints or psychoacoustic parameters), are stored, so that such sound sequences can be identified at a later point in time. The concept assumes, for example, that the step of identifying subjective interference noise or a noise class is performed by a person who uses a button, key, or other input interface to mark the interference noise or signal class. This signal serves as an indicator to cut the sequence out of the current continuous recording, or to extract features from it, and to store the result in the memory holding the database being formed. In this way, a library of interference noises, or a classifier that unambiguously associates sound-describing parameters, can easily be built, which then allows the subjective noise perception to be predicted.
According to an embodiment, subjectively interfering noise may be described by parameters including individual parameters such as volume, dynamics, dynamic range, level increase, spectrum, or a monotonous or repetitive character (e.g., audio fingerprints), or by psychoacoustic parameters such as sharpness, impulsiveness, roughness, tonality, fluctuation strength, or loudness. Thus, according to a further embodiment, the method comprises the step of determining an audio fingerprint of the buffered recording or determining psychoacoustic parameters. Usually it is sufficient to store the recordings or the audio fingerprints in the database, while the psychoacoustic parameters represent additional information. An advantage of audio fingerprints is that the recording is stored in an anonymized form.
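As a very rough illustration of such parameters (not the patent's definitions; real psychoacoustic models such as Zwicker loudness are far more involved), the RMS level can stand in for volume and the spectral centroid for sharpness:

```python
import numpy as np

def crude_features(signal, sample_rate):
    """Crude stand-ins for two of the parameters named above:
    RMS level as a proxy for volume, spectral centroid (Hz) as a
    proxy for sharpness. Purely illustrative."""
    rms = float(np.sqrt(np.mean(signal ** 2)))
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    centroid = float(np.sum(freqs * spectrum) / np.sum(spectrum))
    return rms, centroid

fs = 8000
t = np.arange(fs) / fs                       # one second of audio
tone = 0.5 * np.sin(2 * np.pi * 440 * t)     # a 440 Hz test tone
rms, centroid = crude_features(tone, fs)
print(round(rms, 3), round(centroid))        # 0.354 440
```

Genuine psychoacoustic metrics additionally model the ear's frequency weighting and masking; the point here is only that such values are small numbers derived from a recording, which can be stored instead of the audio itself.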
In the step of obtaining a signal from a user interface, such as a key press, an alternative or additional signal may also be obtained that subjectively evaluates the currently identified noise. This subjective evaluation assigns the audio signal to a signal class (e.g., less interfering or more interfering). The subjective assessment is stored in association with the corresponding recording or parameters.
According to further embodiments, a timestamp may also be stored together with the recording or parameters. It is also conceivable to store the current position, determined for example by means of a GPS receiver. The data to be buffered may furthermore be stored in a data-reduced form so that the database does not become too large.
It is noted here that, according to an embodiment, the memory or database is contained directly in the respective device performing the method or, according to another embodiment, may be provided externally.
Further embodiments relate to a corresponding apparatus. The apparatus comprises a microphone for continuous recording, a buffer for buffering, an interface for receiving the signal, and a further memory for storing recordings belonging to a signal class (audio file, audio fingerprint, or psychoacoustic parameters) associated with the identified interference noise. According to further embodiments, the apparatus may comprise an input interface (e.g., a key) with which the presence of subjective interference noise can be confirmed, or with which noise can generally be assigned to a signal class. The input means may, for example, be extended by a classification into one of several signal classes (i.e., by an evaluation). According to still further embodiments, the apparatus may comprise a communication interface through which an external memory (external database) is connected.
According to a further embodiment, a computer program for performing one of the described methods is provided.
Drawings
Embodiments of the invention will be described in detail below with reference to the attached drawing figures, wherein:
FIG. 1a is a flow chart of the method of aspect 1, "building a database", in a basic variant;
FIG. 1b is a flow chart of an extended method according to aspect 1;
FIGS. 1c to 1f show variants of the apparatus of aspect 1;
FIG. 2a is a flow chart of a basic variant of the method of aspect 2, "identifying noise of a signal class";
FIG. 2b is a flow chart of an extended embodiment of aspect 2;
FIG. 2c is a schematic block diagram of the apparatus of aspect 2;
FIG. 3a is a flow chart of the method of aspect 3, "analyzing noise of various signal classes";
FIG. 3b is a schematic block diagram of the apparatus of aspect 3.
Detailed Description
Before embodiments of the present aspect are discussed in more detail below, it is pointed out that elements and structures with the same effect bear the same reference numerals, so that their descriptions are mutually applicable and interchangeable.
Fig. 1a shows a method 100 for building a database, comprising the steps of receiving and recording 110, using a microphone 11, and receiving a signal 120. When the signal 120 has been received (see decision 125), the recording of step 110 is stored in a database, shown as step 130. Step 130 essentially represents the end of the method 100 (see end point 135).
With respect to the recording step 110, it should be noted that recording typically involves an encoding sub-step. The encoding may also be implemented such that a so-called audio fingerprint is obtained, i.e., characteristic parameters are derived from the recording. Compared with the recording itself, the audio fingerprint is strongly compressed and thereby anonymized, while still allowing comparable noises, i.e., noises of the same class, to be identified. In general, an audio fingerprint is a representation of an audio signal that captures all of its essential features and enables subsequent classification. An audio fingerprint is typically not sufficient to reconstruct the original audio signal, so privacy is protected. Similarly to, or in parallel with, the encoding, there may be a sub-step of deriving parameters describing the recording, such as psychoacoustic parameters.
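One hypothetical way to realize such a compact, non-invertible fingerprint (the band split and log-energy representation are assumptions, not the patent's scheme) is to keep only coarse band energies of a frame:

```python
import numpy as np

def band_fingerprint(frame, n_bands=8):
    """Illustrative fingerprint: log energies in a few frequency bands.
    Coarse enough that the audio cannot be reconstructed, yet usable
    for comparing noises of the same class."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2          # power spectrum
    bands = np.array_split(spectrum, n_bands)           # coarse band split
    energies = np.array([band.sum() for band in bands])
    return np.log10(energies + 1e-12)                   # compress dynamics

fs = 8000
t = np.arange(fs) / fs
frame = np.sin(2 * np.pi * 300 * t)      # stand-in for one buffered frame
fp = band_fingerprint(frame)
print(fp.shape)                          # (8,): 8 numbers instead of 8000 samples
```

Because thousands of samples collapse into a handful of band energies, the original waveform, and any speech it might contain, cannot be recovered, which is the anonymization property described above.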
The recording process 110 may also be described as a ring buffer, since recordings are typically overwritten repeatedly and therefore only buffered for a predetermined period, such as 120 seconds, 60 seconds, or 30 seconds, or generally more than 5 seconds. The ring buffer also has the advantage of meeting privacy requirements. When the signal 120 is obtained, the time window of ambient noise buffered in the preceding cycle is stored in another memory (e.g., a database) using step 130, so that it is available later. To build the database efficiently, the method 100 is performed repeatedly for several signals of one or of different signal classes.
The method 100 is used to build a database in which subjective interference noise received (i.e., recorded) by the microphone 11 is registered. When the user has identified interfering noise in the environment, he or she performs the step of outputting the signal 120, for example by using the key 12 (or, generally, the user input interface 12). Since the microphone listens to the ambient noise and buffers it in step 110, the interference noise is also recorded, so that the buffered recording, or a part of it, can be stored in permanent memory in order to build the database (see step 130). If the user does not identify any interfering noise, the method repeats, which is illustrated by the arrow from the subjective evaluation (decision element 125) back to the starting point 101.
The advantage of this approach is that a sufficiently broad database can be built that comprises a plurality of recordings, or parameters such as audio fingerprints, associated with subjectively perceived interference noise.
It is noted here that this procedure implies a dependency between the time of the signal and the time window. Illustratively, the start of the time window lies a fixed distance, such as 30 or 60 seconds, before the time of the signal. The end of the time window may also depend on the time of the signal, such that it coincides with the time of the signal or lies, for example, 5 seconds before it. Typically, the relation is chosen such that the recorded time window always precedes the time of the signal, which may itself lie within the window.
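In other words, the stored window is derived from the signal time by simple arithmetic, for example (hypothetical helper, with an assumed default window length of 60 seconds):

```python
def buffer_window(signal_time, window_len=60.0, end_gap=0.0):
    """Return (start, end) of the buffered window in seconds, on the same
    clock as signal_time. end_gap shifts the window end before the signal
    time (0 means the window ends exactly at the signal)."""
    end = signal_time - end_gap
    return end - window_len, end

print(buffer_window(100.0))                                # (40.0, 100.0)
print(buffer_window(100.0, window_len=60.0, end_gap=5.0))  # (35.0, 95.0)
```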
FIG. 1b shows an extended method 100' that also allows a database to be built, but with extended information. The method 100' is generally based on the method 100 and is delimited by a start 101 and an end 135. The method 100' comprises the following basic steps: recording 110'; receiving 120' a subjective noise assessment, or generally an assignment of the received signal to one of a plurality of signal classes (e.g., non-interfering noise, slightly interfering noise, and highly interfering noise); and storing 130 the buffered recording in the database. Steps 130 and 120' are connected through decision point 125.
In this embodiment, the recording step 110' is subdivided into two sub-steps, 110a' and 110b'. Step 110a' involves calculating psychoacoustic parameters such as roughness, sharpness, loudness, tonality, impulsiveness, and/or fluctuation strength. Step 110b' determines an audio fingerprint describing the recording, so that its characteristic features can be recognized again later.
Different input means may be used to perform the subjective noise assessment step 120': evaluation using keys or buttons on the device performing the method 100' (reference numeral 12a'); correlating subjective noise assessments collected via questionnaires (reference numeral 12b'); or evaluation using smart devices (reference numeral 12c'). The three assessment variants 12a', 12b', and 12c' may be used individually or in combination. Once an assessment is present (see decision point 125), the psychoacoustic parameters (reference numeral 110a') and/or the audio fingerprint (reference numeral 110b') are stored in memory, as shown in step 130.
According to further embodiments, time and/or location information may be added to the pure parameters, the fingerprint, or the part of the audio recording. These are also stored in step 130 and originate from a further step 132, which comprises determining the current location and/or the current time.
When the database has been built and has reached a corresponding size (see step 130), it may be evaluated by correlation or statistical analysis, as shown in step 132.
A typical application of the methods 100 and 100' described above is, for example, a situation where the device is located in a hotel room and monitors the current ambient noise. A hotel guest who wants quiet and rest in the room, but is prevented by interfering noises, can mark those noises. The room may not be loud overall, yet some noise, such as the air conditioning, may keep the guest from falling asleep. Using the device, he or she performs a subjective evaluation, i.e., a classification into signal classes such as "interfering", "very interfering", or "highly interfering". The evaluation characterizes the assessed noise situation in terms of the different parameters. Finally, the audio fingerprint, the psychoacoustic parameters, or generally the recording associated with one of the signal classes is stored in the database.
Three variants of the device are discussed below with reference to figs. 1c, 1d and 1e.
Fig. 1c shows a first device variant, namely a device 20 which is connected via an interface or radio interface to the actual signal processing unit (not shown) and is essentially configured to emit a signal for identifying interfering signals or specific signal classes. In the present embodiment, the device 20 comprises two keys 24a and 24b via which a subjective evaluation can be performed. These keys 24a and 24b are associated with different signal classes.
The device 20 may illustratively be a smart device (e.g., tablet, smart watch, smart phone) that includes virtual keys 24a and 24b integrated into an application. The application may also illustratively contain a questionnaire by which other information of a general quality may be collected from the user (e.g., hotel guest).
When the key 24a or 24b is operated, the actual data collection apparatus performs the method of buffering the ambient noise or deriving the parameters and then actually storing them. This external apparatus may be, for example, a server having microphones at the respective monitoring locations.
Fig. 1d shows another variant, in which an internal microphone 26 for receiving ambient noise is integrated in the device 20' comprising the buttons 24a and 24 b. Additionally or alternatively, the external microphone 26e may be connected to the device 20' via an interface.
Fig. 1e shows another variant of the device 20'', which no longer comprises keys as input means, but only an internal microphone 26 or, optionally or alternatively, an external microphone 26e, and which can be controlled using voice commands that can be associated with ambient noise of a signal class.
With reference to the devices 20' and 20'', it should be noted that several external microphones may also be connected. It is also conceivable here to record structure-borne sound in addition to normal airborne sound (meaning that the respective device comprises a structure-borne sound receiver).
With reference to the embodiments of figs. 1c and 1d, it is noted that the two keys 24a and 24b may also be extended by further keys. To distinguish the keys, color coding may be provided: red for disturbing, yellow for neutral, green for very pleasant ambient noise (the latter being exemplarily applicable when birdsong is clearly audible and perceived as pleasant noise).
With reference to figs. 1c to 1e, it should be mentioned that the devices 20, 20' and 20'' may also be implemented as software applications integrated in devices like smartphones, tablets or smartwatches. These software applications may provide the following functions:
- extending the detection of the noise quality by questionnaire techniques or different subjective acquisition techniques;
- using the sensor systems present in such devices (microphone, GPS, tilt sensor, biofeedback functions);
- wireless or, if applicable, wired connection to the device developed herein for data communication;
- fully controlling the device developed herein using the software developed herein.
Fig. 1f shows the components of the device 20'''. The device 20''' comprises a microphone 26, optional calibration means 26k for calibrating the microphone, as well as a processing unit 42 and a memory 44.
The processing unit 42 comprises a pre-processing stage 46 for encoding an audio file or for deriving an audio fingerprint, and a unit 48 for determining psychoacoustic parameters. Both the metadata of the pre-processing stage 46 and the psychoacoustic parameters of the unit 48 are written to the memory 44. In addition, the audio signal itself may be stored in the memory 44 via the unit 49, controlled by a key, for example.
The calibration means 26k serve to provide defined sensitivity values for all sensors. For this purpose, characteristics such as the transfer behavior, frequency response or compression are measured or recorded in advance.
From the stored audio samples, the metadata (audio fingerprints or psychoacoustic parameters) and the markers obtained by means of one of the input means from figs. 1c to 1e, an actual data analysis by the data analyzer 50 and an association with the respective signal class can then be performed.
It is noted here that the device is typically a mobile device and may therefore be supplied with electrical power, typically using a battery or accumulator. Alternatively, a conventional power supply is also possible. For storing the recordings, the device may further comprise a storage medium, such as a portable storage medium (e.g. an SD card), or a connection to a server. The connection to the server may be established through a wired or optical-fiber interface or a radio interface. At the protocol level, there are different ways of doing this, which will not be discussed in detail here.
To improve evaluability, the device may further comprise means for accurate synchronization with other devices, such as a time code or world clock. In addition, it is also conceivable to couple the device to a location determination unit, such as a GPS receiver, or to integrate such a unit, in order to determine which interfering noise has been determined at which location or is perceived as interfering there.
It is noted here that, corresponding to other embodiments, the method 100 or 100' may also comprise a pre-calibration (cf. calibration means 26k). This means that, corresponding to an embodiment, the method 100 or 100' discussed above comprises a step of calibration.
With respect to aspect 1, it is noted that, corresponding to embodiments, it is also conceivable for all these devices to perform data-reducing recording of the measurement data in order to reduce the data volume. Data reduction may also be advantageous for long-term measurements. Depending on the degree of compression or loss, it can be ensured that privacy is preserved, since the monitored data can always be compressed so that essentially only parameters, such as psychoacoustic parameters (roughness, sharpness, pitch, etc.) or audio fingerprints, are recorded. It is again noted here that the exact decision whether to use a recording, an audio fingerprint, or merely psychoacoustic parameters is essentially influenced by legal framework conditions for data and consumer protection.
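The privacy-preserving data reduction described above can be illustrated with a minimal sketch: a raw microphone block is collapsed into a few scalar parameters and then discarded, so no speech content can be reconstructed. The function name and the two toy parameters (block energy, zero-crossing rate) are assumptions for the example, standing in for the psychoacoustic parameters named in the text.

```python
def reduce_to_parameters(samples):
    """Collapse a block of raw samples into coarse descriptors only."""
    n = len(samples)
    energy = sum(s * s for s in samples) / n
    # zero-crossing rate as a crude stand-in for a tonality/sharpness measure
    zcr = sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0) / (n - 1)
    return {"energy": energy, "zero_crossing_rate": zcr}

buffer = [0.0, 0.5, -0.5, 0.25, -0.25, 0.1]   # pretend microphone block
params = reduce_to_parameters(buffer)
del buffer  # the raw recording is not kept; only the parameters are stored
```

Only `params` would be written to long-term storage, which is what makes the scheme compatible with data-protection constraints.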
As already discussed above, so-called "audio fingerprints" are used, of which there are different variants that will be discussed in more detail below. Many methods are known with which features or fingerprints can be extracted from an audio signal. U.S. Pat. No. 5,918,223 discloses a method for content-based analysis, storage, retrieval and segmentation of audio information. Analysis of the audio data generates a set of values called a feature vector, which can be used to classify and rank the similarity between individual audio segments. The volume, pitch, brightness, bandwidth and so-called Mel-frequency cepstral coefficients (MFCCs) of an audio piece are used as features to characterize or classify the audio piece. The values per block or frame are stored, and their first derivative with respect to time is computed. To describe the variation over time, statistics, such as the mean or standard deviation, of each of these features, including its first derivative, are then calculated. The set of statistics forms the feature vector. The feature vector is thus a fingerprint of the audio piece and may be stored in a database.
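The feature-vector construction just described can be sketched as follows. This is a simplified illustration, not the patented method: a real system would compute volume, pitch, brightness, bandwidth and MFCCs per block, while here only block energy is used so the sketch stays self-contained. The statistics-over-features structure (mean and standard deviation of the feature and of its first derivative) follows the description above.

```python
import math

def block_features(signal, block_size):
    """Compute one feature (block energy) per non-overlapping block."""
    return [sum(s * s for s in signal[i:i + block_size]) / block_size
            for i in range(0, len(signal) - block_size + 1, block_size)]

def mean_std(values):
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / len(values)
    return m, math.sqrt(var)

def fingerprint(signal, block_size=4):
    feats = block_features(signal, block_size)
    deriv = [b - a for a, b in zip(feats, feats[1:])]  # first derivative over time
    # statistics of the feature and of its derivative form the feature vector
    return (*mean_std(feats), *mean_std(deriv))

fp = fingerprint([0.1, 0.2, -0.1, 0.3, 0.8, -0.7, 0.6, -0.5, 0.0, 0.1, -0.1, 0.2])
```

With several block features instead of one, the same statistics step yields the compact, database-ready fingerprint the text describes.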
Similar concepts for indexing and characterizing multimedia segments are disclosed by Yao Wang et al. in the expert publication "Multimedia Content Analysis" (IEEE Signal Processing Magazine, November 2000, pages 12 to 36). To ensure a reliable association of an audio signal with a particular class, a number of features and classifiers have been developed. Time-range features or frequency-range features are suggested for classifying the content of a multimedia segment. These include volume, pitch (the fundamental frequency of the audio signal), spectral characteristics (such as the energy content of a frequency band relative to the total energy content), cut-off frequencies in the spectral shape, etc. In addition to short-term features relating to individual blocks of audio signal samples, long-term quantities are suggested which relate to longer portions of the audio piece. Additional features may be formed by taking time differences of each feature. The features obtained per block are rarely passed on directly for classification, since they exhibit too high a data rate. One conventional form of further processing is to compute short-term statistics, such as the mean, variance and temporal correlation coefficients. This reduces the data rate and, on the other hand, leads to improved recognition of the audio signal.
WO 02/065782 describes a method of forming a fingerprint of a multimedia signal. The method involves extracting one or several features from an audio signal. The audio signal is divided into segments, and processing per block and frequency band is performed in each segment. By way of example, the band-wise computation of energy, tonality and standard deviation of the power density spectrum is mentioned.
From DE 10134471 and DE 10109648, an apparatus and a method for classifying an audio signal are known in which a fingerprint is obtained from a measure of the tonality of the audio signal. The fingerprint allows a robust, content-based classification of audio signals. These documents disclose several possibilities for generating a tonality measure of an audio signal. In this case, converting a segment of the audio signal into the spectral range forms the basis for calculating the tonality. The tonality may then be computed for one band or for all bands in parallel. However, a disadvantage of such a system is that, as the distortion of the audio signal increases, the fingerprint no longer has sufficient expressiveness and it is no longer possible to identify the audio signal with satisfactory reliability. Distortion, however, occurs in many cases, particularly when audio signals are transmitted using a system with low transmission quality. Today, this is particularly true of mobile systems or strong data compression. Such systems, like mobile phones, are primarily designed for the bidirectional transmission of speech signals and often transmit music signals only with very low quality. There are also additional factors that may have a negative impact on the quality of a transmitted signal, such as low-quality microphones, channel interference and transcoding effects. For an apparatus for recognizing and classifying signals, a deterioration of signal quality may cause a serious deterioration of recognition performance. Investigations have revealed that, particularly when using the apparatus or method according to DE 10134471 and DE 10109648, variations in the system while maintaining the recognition criterion of tonality (spectral flatness measure) do not result in a further significant improvement in recognition performance.
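The spectral flatness measure named above as the tonality criterion can be stated compactly: it is the ratio of the geometric to the arithmetic mean of the power spectrum, close to 1 for noise-like spectra and close to 0 for tonal ones. The sketch below is a generic illustration of that measure, not the specific procedure of the cited patents.

```python
import math

def spectral_flatness(power_spectrum):
    """Geometric mean over arithmetic mean of a power spectrum (all bins > 0)."""
    n = len(power_spectrum)
    geometric = math.exp(sum(math.log(p) for p in power_spectrum) / n)
    arithmetic = sum(power_spectrum) / n
    return geometric / arithmetic

flat = spectral_flatness([1.0, 1.0, 1.0, 1.0])        # white-noise-like spectrum
tonal = spectral_flatness([100.0, 0.01, 0.01, 0.01])  # one dominant tone
```

A flat spectrum yields exactly 1.0, while the spectrum with one dominant bin yields a value near 0, which is what makes the measure usable as a tonality-based fingerprint ingredient.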
When a sufficient database comprising noise, such as interference noise of different signal classes, is assumed to have been established, it becomes possible to search for a specific interference noise in any environment and to check whether such interference noise is identified. This method is illustrated in fig. 2a.
Fig. 2a shows a method 200 comprising a step 210 of matching ambient noise received via the microphone 11 (see receiving step 205) against records from the database 15. Once a match has been found, as shown at decision point 215, a signal is output, for example for logging or for triggering further actions. The method is repeated as long as no match has been found, as illustrated by the arrow to the starting point 201.
Corresponding to an embodiment, the audio fingerprint of the current ambient noise, rather than the recording itself, may be compared to the audio fingerprints previously stored in the database 15. The method then comprises determining an audio fingerprint of the current ambient noise and comparing it with the audio fingerprints stored in the database 15.
Even though the method 200 assumes that the ambient noise or audio fingerprint is identified/associated by matching it against the ambient noise/audio fingerprints stored in the database 15, in general the ambient noise may be monitored with respect to a rule. In the case of comparing ambient noise/audio fingerprints, the rule would mean a "partial match".
Another such rule may be, for example, simply a volume value to be exceeded, or a threshold related to a psychoacoustic parameter to be exceeded. According to an embodiment, psychoacoustic parameters of the current ambient noise are derived and compared with respective predefined thresholds according to predefined rules in order to identify the occurrence of such events.
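Such a threshold rule can be sketched in a few lines. The parameter names and threshold values below are illustrative assumptions, not values from the patent; the point is only the structure of the check performed in step 210.

```python
# Hypothetical per-parameter thresholds for the predefined rules:
RULES = {"volume": 0.8, "roughness": 0.6}

def rule_matches(parameters, rules=RULES):
    """Return the names of all parameters whose threshold is exceeded."""
    return [name for name, limit in rules.items()
            if parameters.get(name, 0.0) > limit]

current = {"volume": 0.85, "roughness": 0.3}
exceeded = rule_matches(current)   # only the volume threshold is exceeded
```

A non-empty result corresponds to the "match found" branch at decision point 215; a partial-fingerprint-match rule would slot into the same place.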
According to an extended embodiment, the method may not only purely identify such disturbing noise, but may also correlate (i.e. classify) the noise as e.g. speech, motor noise, music, church chimes or shots.
One potential application scenario for performing this method, illustratively on a smartphone or a device specifically designed for this purpose, is where the device is located in a hotel room and monitors the ambient noise. Here, data from the database 15 is used to evaluate the ambient noise and to log how many noise events that may be perceived as disturbing have occurred over time, and which of them. This may be, for example, logging occurrences of disturbing air-conditioning noise over the course of a day. As an alternative to logging, an audio recording of the noise or a pre-buffered storage of the ambient noise may be performed (see above). The basic idea is that the hotel operator can use this method to predict and evaluate noise perception.
Even though the above embodiments have assumed that noises of different signal classes are identified and associated separately from each other, it is mentioned here that, according to further embodiments, it is also possible to identify and associate several noises of different signal classes that may overlap.
Fig. 2b shows an expanded method 200' including additional steps between decision point 215 and the end 216.
In step 220, identified events are counted using a counter variable 221, thus obtaining the number of events 222. Optionally, an audio recording may be started once an event has been identified, as shown in step 230.
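A minimal sketch of this counting step follows; the function name `on_event_identified` and the log structure are assumptions for illustration, standing in for the counter variable 221 and the number of events 222.

```python
event_count = 0   # stands in for counter variable 221

def on_event_identified(signal_class, log):
    """Increment the event counter and log the identified class."""
    global event_count
    event_count += 1
    log.append(signal_class)
    return event_count          # the running number of events (222)

log = []
on_event_identified("air conditioning", log)
on_event_identified("air conditioning", log)
n = on_event_identified("church chimes", log)
```

Per-class tallies over a day, as in the hotel scenario above, would follow directly by counting entries of one class in the log.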
Fig. 2c shows another embodiment of the device 40. It comprises a processor 41 as a central unit which performs the actual steps of analysis/matching. The processor uses the internal microphone 26, wherein access to external microphones 26e1 and 26e2 is also conceivable. The data for matching is stored, for example, in the internal memory 44.
Optionally, the processor is configured to determine and match audio fingerprints and/or psychoacoustic parameters in order to obtain corresponding rule matches.
To enable this functionality, further peripheral units are optionally provided, such as an internal clock 55, a battery 56b or a power supply 56, the latter typically implemented using a cable 56k. Optionally, the processor also has access to a further sensor element 57, a control unit 58 (such as a record-activation button) or a timer 59. According to a further embodiment, the processor 41 may also be configured to perform an objective noise assessment in order to establish the association in connection with a subjective assessment (identifying subjective noise events).
According to an embodiment, the CPU may classify each identified noise of the signal classes into different evaluation matrices depending on the respective noise class, according to the previously obtained subjective evaluation of pleasantness.
According to further embodiments, an external data storage 60 (such as an external hard disk or server) may also be provided for storing or loading the database. The connection may be wired or wireless. Corresponding to further embodiments, a communication interface 62 is provided for this purpose, such as a wireless interface 62w or a wired interface 62k, enabling external access.
According to another aspect, a system is provided which essentially consists of two of the previously described devices 40 combined with each other such that one device activates the other upon receiving corresponding noise (i.e. of a signal class). The system serves to analyze or evaluate the noise of individual noise classes in more detail. It performs the method discussed below with reference to fig. 3.
Fig. 3a shows a method 300 comprising steps of noise analysis corresponding to the method 200 or 200', performed at a first location and at a second location. This means that step 210 exists twice (see 210a and 210b).
The determined recordings or parameters, such as audio fingerprints at the two locations (generated by steps 210a and 210b), are then compared in a further step 220.
According to an embodiment, the two steps 210 at two adjacent locations may be interdependent, as illustrated by the optional step "audio recording on adjacent devices 211". Alternatively, another action may be performed at the adjacent device. The reason is that, for example, when a first device performing method 210a identifies noise and activates a second device performing method 210b, the same noise may be identified at different locations. Finally, it is mentioned that, starting from decision point 215, there is another arrow pointing to the starting point 301, which essentially implies that the noise analysis 210a is performed until a corresponding match has been found.
Since the locations are typically spatially adjacent, the propagation of the noise, its velocity, or the extent of a larger noise source can be estimated in this manner.
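One common way to obtain such an estimate is to cross-correlate the two recordings and find the lag at which they best align; with a known device spacing, the lag yields the propagation direction or speed. This is a generic sketch of time-delay estimation under that assumption, not the patent's specific procedure.

```python
def best_lag(a, b, max_lag):
    """Return the lag (in samples) at which signal b best matches signal a."""
    def corr(lag):
        pairs = [(a[i], b[i + lag]) for i in range(len(a))
                 if 0 <= i + lag < len(b)]
        return sum(x * y for x, y in pairs)
    return max(range(-max_lag, max_lag + 1), key=corr)

signal = [0, 0, 1, 2, 1, 0, 0, 0]
delayed = [0, 0, 0, 0, 1, 2, 1, 0]   # same noise arriving two samples later
lag = best_lag(signal, delayed, max_lag=3)
```

Dividing the lag (converted to seconds via the sample rate) into the microphone spacing gives an apparent propagation speed, or, with more than two nodes, a source position.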
Illustratively, by comparing the device's own analysis with the simultaneous analyses on different devices, it can be determined, when the same event has been identified at several devices, whether this is a global event such as lightning (see reference numeral 323 after decision point 321) or a local event (see reference numeral 324 after decision point 321). For a global event 323, the difference in level between "near" and "far" devices is typically negligible (the level decays as 1/r, and the variation of r between the devices is small relative to r). For a local event 324, the levels differ greatly (the variation of r relative to r is large). A local event may be, for example, a call for help, an explosion, or an open-air concert. For local events, further analyses may then be performed, i.e. analyses with respect to other parameters 325. From the shift in time or the shift in frequency, the location of the local event, its propagation, or a timeline may be determined. Determining the global event 323 or the local event 324 (and, if applicable, its analysis 325) essentially constitutes the end 329 of the method.
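The global/local decision at decision point 321 reduces to comparing the levels measured at two neighboring devices: under free-field 1/r decay, a distant source produces nearly equal levels at both, a nearby source very different ones. The 6 dB threshold below is an assumption for illustration, not a value from the patent.

```python
def classify_event(level_a_db, level_b_db, threshold_db=6.0):
    """Classify an event as global or local from two measured sound levels."""
    return "local" if abs(level_a_db - level_b_db) > threshold_db else "global"

thunder = classify_event(62.0, 61.0)   # lightning far away: similar levels
shout = classify_event(78.0, 55.0)     # call for help next to device A
```

In practice the threshold would be chosen from the known device spacing and expected source distances, since the level difference for a source at distance r from one device and r + d from the other is 20·log10((r + d)/r) dB.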
For example, one possible application scenario is to distribute several devices in a city center. All devices are connected to each other by a data connection, such as a wired, wireless, Ethernet or LAN connection. A connection via a server is also possible. All devices analyze the noise situation (psychoacoustic parameters, audio fingerprints). One of the devices identifies a characteristic event, e.g. of a signal class previously classified in the database. An audio recording is triggered on the spot. At the same time, the device triggers a behavior, such as an action on one of the neighboring nodes. By comparing the two nodes, global and local events can be distinguished, as already discussed above.
For embodiments where two different classes of noise are present at the same time, it is possible, according to another embodiment, to monitor the noise even when the noises do not behave identically (e.g. move), or are associated with local events on the one hand and global events on the other. For example, a motor vehicle moving from location A via B to C may be monitored for noise, whereas a siren sounds at location B over time. Another example is birdsong together with traffic noise and a fire truck: the birdsong may be loud but pleasant, the traffic noise quiet enough not to cause disturbance, whereas the fire truck is monitored over several devices and the "attention" is increased depending on the signal class.
Method 300 is substantially performed by a system comprising two of the devices 40 (fig. 2c).
However, a slight variation is also possible, as shown in fig. 3b, in which an additional interface for connecting the two devices is provided.
Fig. 3b shows a device 70 comprising a microphone 26 and an optional calibration unit 26k on the input side. The audio stream received by the microphone is preprocessed by the pre-processing 46 in order to derive an audio fingerprint (reference numeral 46a) or psychoacoustic parameters (reference numeral 48). At the same time, an event or category may be identified (see reference numeral 50). Identifying an event/category may, on the one hand, trigger an automatic audio recording (reference numeral 50a1) or, on the other hand, issue a control instruction, such as for activating another node (reference numeral 50a2, or another device 70'). The means for outputting control instructions 50a2 may illustratively activate a memory that subsequently receives and records data from the means for generating the audio fingerprint 46a or the means for deriving the psychoacoustic parameters 48. The audio signal may also be stored in the memory 44, wherein the recording may be enabled or prevented here by means of the button 49a. In this embodiment, the CPU 41 may also be connected to a timer 59.
In addition to the device 70, a device 70', which performs substantially the same function, is disposed at another, adjacent location. The device 70' also comprises a memory 44 which stores the audio result for the time segment during which the device 70' has been activated by means of the activation means 50a2 or on the basis of identified noise belonging to a class. In a next step, the data analyzer 72 analyzes the recordings, audio fingerprints or psychoacoustic parameters from the memories 44 of the devices 70 and 70' (e.g. with respect to propagation). For this purpose, it is advantageous for the data analyzer 72 to be connected to the memories of both devices, wherein the data analyzer 72 may be arranged in one of the devices 70 and 70' or externally to both.
Corresponding to another embodiment, a button (e.g., button 24a) may be integrated into the device 70 such that the device 70 also performs the functions of the devices 20, 20' or 20''.
The optional element 50a1 allows automatic triggering of a recording after recognition of a class. Alternatively, it is also conceivable to start a recording automatically when the noise cannot be associated with any of the signal classes obtained so far.
In other words, method 300 essentially covers the functionality of method 200, i.e. noise such as speech, motor noise, music, church chimes, shots, etc. is recognized and classified, extended by an analysis using multiple microphones at different locations.
It is also noted here that automatic recording of certain categories, such as explosions and shots, may also suggest, for example, a terrorist attack. Here, it may be useful for all neighboring nodes 70/70' to switch directly to recording.
In addition, recording may also start automatically (e.g., limited in time) when a particular noise threshold is exceeded for a period of time. The recording can also be extended to neighboring nodes so that, by combining multiple nodes, these longer recordings allow an accurate localization of the signal source (leading to investigation of interference sources and separation of noise sources).
The potential application areas for the three cases are as follows:
- tourism, hotels, health sector, bike paths, hiking paths;
- work protection (office work, machine factories, cabin workplaces);
- city planning (soundscape, noise mapping);
- public safety (monitoring of production facilities).
Combinations of the methods 100/100', 200/200' and 300, or of the functions of the apparatuses 20/20'/20''/20''', 40 and 70/70', are also conceivable. An example is combining the devices and methods for subjective assessment with the recording and machine assessment of the device.
It is noted here that elements already discussed in connection with one aspect may of course also be applied to the other aspects. Illustratively, the teachings relating to audio fingerprints or psychoacoustic parameters apply to all three aspects, even though they have only been discussed in more detail in connection with the first aspect.
Although some aspects have been described in the context of a device, it is clear that these aspects also represent a description of the corresponding method, so that a block or element of a device also corresponds to a respective method step or feature of a method step. Similarly, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. For example, some or all of the method steps may be performed by (or using) a hardware device, such as a microprocessor, programmable computer, or electronic circuitry. In some embodiments, some or several of the most important method steps may be performed by such an apparatus.
The inventive encoded signals, such as audio signals or video signals or transport stream signals, may be stored on a digital storage medium or may be transmitted over a transmission medium, such as a wireless transmission medium or a wired transmission medium, such as the internet.
For example, the inventive encoded audio signal may be stored on a digital storage medium or may be transmitted over a transmission medium such as a wireless transmission medium or a wired transmission medium (e.g., the internet).
Embodiments of the invention may be implemented in hardware or software, depending on certain implementation requirements. The implementation may be performed using a digital storage medium, such as a floppy disk, a DVD, a Blu-ray disc, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, a hard disk drive or another magnetic or optical memory having electronically readable control signals stored thereon which cooperate, or are capable of cooperating, with a programmable computer system such that the respective method is performed. Accordingly, the digital storage medium may be computer-readable.
Some embodiments according to the invention comprise a data carrier comprising electronically readable control signals capable of cooperating with a programmable computer system such that one of the methods described herein is performed.
Generally, embodiments of the invention may be implemented as a computer program product having a program code operable to perform one of the methods when the computer program product runs on a computer.
The program code can be stored, for example, on a machine-readable carrier.
Other embodiments include a computer program for performing one of the methods described herein, wherein the computer program is stored on a machine-readable carrier.
In other words, an embodiment of the inventive methods is thus a computer program comprising program code for performing one of the methods described herein, when the computer program runs on a computer.
Thus, another embodiment of the inventive method is a data carrier (or a digital storage medium or a computer readable medium) comprising a computer program recorded thereon for performing one of the methods described herein.
Thus, another embodiment of the inventive method is a data stream or a signal sequence representing a computer program for performing one of the methods described herein. For example, a data stream or signal sequence may be configured to be transmitted over a data communication connection (e.g., over the internet).
Another embodiment includes a processing apparatus, such as a computer or programmable logic device, configured or adapted to perform one of the methods described herein.
Another embodiment comprises a computer having installed thereon a computer program for performing one of the methods described herein.
Another embodiment according to the present invention comprises a device or system configured to transmit a computer program for performing one of the methods described herein to a receiver. The transfer may be performed electronically or optically. For example, the receiver may be a computer, mobile device, storage device, or the like. For example, the apparatus or system may comprise a file server for transmitting the computer program to the receiver.
In some embodiments, a programmable logic device (e.g., a field programmable gate array, FPGA) may be used to perform some or all of the functions of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor to perform one of the methods described herein. Generally, in some embodiments, the method is performed by any hardware device. This may be universally applicable hardware, such as a Computer Processor (CPU), or hardware dedicated to the method, such as an ASIC.
The above-described embodiments are merely illustrative of the principles of the present invention. It is to be understood that modifications and variations of the configurations and details described herein will be apparent to others skilled in the art. Therefore, the invention is intended to be limited only by the scope of the appended patent claims, and not by the specific details given herein by way of illustration and description of the embodiments.

Claims (25)

1. A method (300) for analyzing noise of at least one signal class of a plurality of signal classes, comprising the steps of:
receiving (205) ambient noise at a first location;
establishing (210) whether the ambient noise of the first location or a set of parameters derived from the ambient noise of the first location satisfies a predefined rule describing a signal class of the plurality of signal classes;
receiving ambient noise at a second location;
establishing whether the ambient noise of the second location or a set of parameters derived from the ambient noise of the second location satisfies a predefined rule describing a signal class of the plurality of signal classes; and
determining (320) a relationship between the ambient noise at the first location and the ambient noise at the second location, or a relationship between a set of parameters derived from the ambient noise at the first location and a set of parameters derived from the ambient noise at the second location;
wherein determining (320) the relationship comprises: determining a difference in the level of the ambient noise at the first location and the second location, wherein the ambient noise at the first location and the second location is produced by the same interference noise; and/or
wherein determining (320) the relationship comprises: determining a time offset and/or a run-time offset of the ambient noise at the first location and the second location; and/or
wherein determining (320) the relationship comprises: determining a frequency difference and/or determining a reverberation effect in the ambient noise at the first and second locations; and/or
wherein the determining (320) comprises: performing an analysis (325) with respect to a distance between the first location and the second location, analyzing a movement of a noise source according to the relationship, and/or analyzing a size of the noise source.
2. The method of claim 1, wherein the establishing (210) is performed by comparing the ambient noise to a previously buffered recording, and the predefined rule defines that the ambient noise matches at least part of the previously buffered recording.
3. The method of claim 1, wherein the establishing (210) is performed by comparing a derived parameter set with a previously derived parameter set, and the predefined rule defines at least a partial match of the derived parameter set with the previously derived parameter set.
4. The method of claim 2, wherein the derived parameter set and the previously derived parameter set comprise audio fingerprints.
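As a toy illustration of claim 4, a derived parameter set can serve as an audio fingerprint that is compared for a partial match. The band-energy fingerprint below is a deliberately crude stand-in for a real fingerprinting scheme; the frame size, band count, and match ratio are assumptions:

```python
import numpy as np

def audio_fingerprint(signal, frame=1024, n_bands=8):
    """Crude fingerprint: index of the dominant coarse frequency band
    for each frame of the signal."""
    fp = []
    for i in range(len(signal) // frame):
        spec = np.abs(np.fft.rfft(signal[i * frame:(i + 1) * frame]))
        bands = np.array_split(spec, n_bands)
        fp.append(int(np.argmax([b.sum() for b in bands])))
    return fp

def partial_match(fp_a, fp_b, min_ratio=0.8):
    """Predefined rule: at least min_ratio of the frames must agree."""
    n = min(len(fp_a), len(fp_b))
    agree = sum(a == b for a, b in zip(fp_a, fp_b))
    return n > 0 and agree / n >= min_ratio

fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1800 * t)        # previously derived reference
noisy = tone + 0.05 * np.random.default_rng(1).standard_normal(fs)
match = partial_match(audio_fingerprint(tone), audio_fingerprint(noisy))
```

The rule fires because the noisy capture keeps the same dominant band per frame as the reference, even though the waveforms differ sample by sample.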
5. The method according to claim 2, wherein the previously buffered recordings and/or the previously derived parameter sets are received from an external database (15).
6. The method according to claim 1, wherein the derived set of parameters comprises psychoacoustic parameters, and wherein the establishing (210) is performed by evaluating the psychoacoustic parameters of the ambient noise, and the predefined rule comprises a threshold value of the psychoacoustic parameters.
7. The method of claim 6, wherein the psychoacoustic parameters comprise loudness, sharpness, tonality, roughness, impulsiveness, and/or fluctuation strength.
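A predefined rule over psychoacoustic parameters (claims 6 and 7) can be modelled as a set of thresholds. The parameter names, units, and values below are illustrative assumptions, not values from the patent:

```python
def rule_satisfied(params, rule):
    """A predefined rule as thresholds: every psychoacoustic parameter
    listed in the rule must reach its threshold for the rule to be met."""
    return all(params.get(name, 0.0) >= threshold
               for name, threshold in rule.items())

# Hypothetical rule: "loud and rough" ambient noise.
rule = {"loudness_sone": 8.0, "roughness_asper": 0.3}
measured = {"loudness_sone": 9.5, "roughness_asper": 0.4, "sharpness_acum": 1.1}
met = rule_satisfied(measured, rule)
```

Parameters present in the measurement but absent from the rule (sharpness here) are simply ignored, which matches the claim's "threshold value of the psychoacoustic parameters" reading.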
8. The method of claim 1, wherein information on how frequently noise of the signal class occurs is stored when logging (220); or
wherein the information comprises a time indication of when the noise of the signal class was identified and/or a location indication of where the noise of the signal class was identified.
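The logged information of claim 8 (frequency of occurrence, time indication, location indication) could be held in a structure like the following sketch; the class and field names are assumptions for illustration:

```python
from collections import Counter
from datetime import datetime, timezone

class NoiseLog:
    """Minimal log: per-class occurrence counts plus time/location entries."""
    def __init__(self):
        self.counts = Counter()   # how frequently each signal class occurred
        self.events = []          # (signal class, time indication, location)

    def record(self, signal_class, location, when=None):
        when = when or datetime.now(timezone.utc).isoformat()
        self.counts[signal_class] += 1
        self.events.append((signal_class, when, location))

log = NoiseLog()
log.record("siren", (50.68, 10.93))
log.record("siren", (50.69, 10.94))
```

Each `record` call corresponds to one instance of the predefined rule being met, so the counter directly yields the frequency information while the event list preserves the time and location indications.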
9. The method according to claim 1, wherein the previously obtained record or parameter is read from an external database (15).
10. The method according to claim 1, wherein the steps of receiving (205) and establishing (210) are repeated for adjacent locations.
11. The method of claim 10, further comprising: a relationship between ambient noise at a first location and ambient noise at a second location is determined (320), or a relationship between a set of parameters derived from ambient noise at the first location and a set of parameters derived from ambient noise at the second location is determined.
12. The method of claim 1, wherein the predefined rule describes the ambient noise or the set of parameters of a control signal, such that an activation signal is issued when the predefined rule has been met.
13. The method (300) according to any one of claims 1-12, wherein the received ambient noise of the first location and/or the second location is recorded for a sliding time window, or wherein the set of parameters is derived from the ambient noise of the first location and/or the second location for the sliding time window.
14. The method (300) according to any one of claims 1-12, wherein the establishing (210) is performed by comparing ambient noise to a previously buffered recording, and the predefined rule defines that ambient noise matches at least part of the previously buffered recording.
15. The method (300) according to any one of claims 1-12, wherein establishing (210) is performed by comparing a derived parameter set with a previously derived parameter set, and the predefined rule defines at least a partial match of the derived parameter set and the previously derived parameter set.
16. The method (300) of claim 15, wherein the derived set of parameters and the previously derived set of parameters comprise an audio fingerprint.
17. The method (300) according to any of claims 1-12, wherein the previously buffered records and/or the previously derived parameter sets are received from an external database (15).
18. A digital storage medium on which a computer program comprising program code is stored for performing the method according to any of claims 1-17 when the program is run on a computer.
19. An apparatus (40) for correlating noise of at least one signal class of a plurality of signal classes, comprising:
a microphone (11) for receiving ambient noise;
a processor (50) for establishing whether the ambient noise or a set of parameters derived from the ambient noise satisfies a predefined rule describing a signal class of the plurality of signal classes;
an interface for logging that the predefined rule has been met, or for recording the ambient noise received for a sliding time window, or for deriving a set of parameters from the ambient noise of the sliding time window and storing the set of parameters of the sliding time window, or for issuing an activation signal for another device to identify noise;
wherein the processor (50) is configured to determine (320) a relationship between the ambient noise at the first location and the ambient noise at the second location, or a relationship between a set of parameters derived from the ambient noise at the first location and a set of parameters derived from the ambient noise at the second location;
wherein the processor (50) is configured to determine (320) the relationship by determining a difference in the level of the ambient noise at the first location and at the second location, wherein the ambient noise at the first and second locations is produced by the same interference noise; and/or
wherein the processor (50) is configured to determine (320) the relationship by determining a time offset and/or a propagation-time offset of the ambient noise at the first location and at the second location; and/or
wherein the processor (50) is configured to determine (320) the relationship by determining a frequency difference and/or a reverberation effect in the ambient noise at the first and second locations; and/or
wherein the processor (50) is configured to analyze (325), with respect to the distance between the first location and the second location, a movement of a noise source and/or a size of the noise source according to the relationship.
20. The device (40) according to claim 19, wherein the device (40) comprises a communication interface (62) by means of which logs can be output, and/or by means of which the predefined rules can be read in, and/or by means of which communication with another device is possible.
21. The device (40) of claim 20, wherein the device (40) is configured to be networked with another device (40') at an adjacent location in order to determine a relationship between the ambient noise of a first location and the ambient noise of a second location, or a relationship between a set of parameters derived from the ambient noise of the first location and a set of parameters derived from the ambient noise of the second location.
22. A system (77) for analyzing noise of at least one signal class of a plurality of signal classes, comprising:
a first unit (70) comprising a first microphone (11) for receiving ambient noise at a first location;
a second unit (70') comprising a second microphone (11) for receiving ambient noise at a second location; and
a processor (72) for establishing whether the ambient noise of the first location or a set of parameters derived from the ambient noise of the first location fulfils a predefined rule describing a signal class of the plurality of signal classes, and for establishing whether the ambient noise of the second location or a set of parameters derived from the ambient noise of the second location fulfils a predefined rule describing a signal class of the plurality of signal classes;
wherein the processor (72) is configured to determine a relationship between the ambient noise at the first location and the ambient noise at the second location, or a relationship between a set of parameters derived from the ambient noise at the first location and a set of parameters derived from the ambient noise at the second location;
wherein determining (320) the relationship comprises: determining a difference in the level of the ambient noise at the first location and at the second location, wherein the ambient noise at the first and second locations is produced by the same interference noise; and/or
wherein determining (320) the relationship comprises: determining a time offset and/or a propagation-time offset of the ambient noise at the first location and at the second location; and/or
wherein determining (320) the relationship comprises: determining a frequency difference and/or determining a reverberation effect in the ambient noise at the first and second locations; and/or
wherein the determining (320) comprises: performing an analysis (325) with respect to the distance between the first location and the second location, analyzing a movement of a noise source according to the relationship, and/or analyzing a size of the noise source.
23. The system (77) according to claim 22, wherein the first and second units (70, 70') are connected via a communication interface and/or a radio interface.
24. A method (200, 200') for correlating noise of at least one signal class of a plurality of signal classes, comprising the steps of:
receiving (205) ambient noise;
establishing (210) whether the ambient noise or a set of parameters derived from the ambient noise satisfies a predefined rule describing a signal class of the plurality of signal classes;
logging (220) that the predefined rule has been met, or recording the received ambient noise for a sliding time window, or deriving the set of parameters from the ambient noise of the sliding time window and storing the set of parameters of the sliding time window, or issuing an activation signal for another device to identify noise;
wherein the steps of receiving (205) and establishing (210) are repeated for adjacent locations;
wherein the method further comprises: determining a relationship (320) between ambient noise at a first location and ambient noise at a second location, or a relationship between a set of parameters derived from ambient noise at the first location and a set of parameters derived from ambient noise at the second location;
wherein determining (320) the relationship comprises: determining a difference in the level of the ambient noise at the first location and at the second location, wherein the ambient noise at the first and second locations is produced by the same interference noise; and/or
wherein determining (320) the relationship comprises: determining a time offset and/or a propagation-time offset of the ambient noise at the first location and at the second location; and/or
wherein determining (320) the relationship comprises: determining a frequency difference and/or determining a reverberation effect in the ambient noise at the first and second locations; and/or
wherein the determining (320) comprises: performing an analysis (325) with respect to the distance between the first location and the second location, analyzing a movement of a noise source according to the relationship, and/or analyzing a size of the noise source.
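One way to read the movement analysis of claim 24 (and the corresponding clauses of claims 1, 19, and 22): the propagation-time offset between the two locations constrains the source position relative to the microphones, and an offset that drifts across successive measurement windows indicates a moving source. The speed-of-sound constant, the example offsets, and the drift threshold below are illustrative assumptions:

```python
SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 degrees C

def path_difference_m(time_offset_s):
    """Difference of source-to-microphone distances implied by the
    propagation-time offset between the two locations."""
    return SPEED_OF_SOUND * time_offset_s

def source_is_moving(offsets_s, tol_s=1e-4):
    """Movement indicator: the offset drifts across measurement windows."""
    return max(offsets_s) - min(offsets_s) > tol_s

# A source passing between the two locations: the offset sweeps through zero.
offsets = [0.004, 0.002, 0.0, -0.002, -0.004]
moving = source_is_moving(offsets)
```

A stationary source would produce a (nearly) constant offset, so the same drift test distinguishes the two cases; the rate of drift, together with the known distance between the locations, bounds the source's speed.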
25. A digital storage medium on which a computer program comprising program code is stored for performing the method (200, 200') according to claim 24 when the program is run on a computer.
CN201680048720.2A 2015-06-30 2016-06-30 Method and apparatus for correlating noise and for analysis Active CN108028048B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
EP15174634.4 2015-06-30
EP15174634 2015-06-30
PCT/EP2016/065397 WO2017001611A1 (en) 2015-06-30 2016-06-30 Method and device for the allocation of sounds and for analysis

Publications (2)

Publication Number Publication Date
CN108028048A CN108028048A (en) 2018-05-11
CN108028048B true CN108028048B (en) 2022-06-21

Family

ID=56368939

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201680048686.9A Active CN108028047B (en) 2015-06-30 2016-06-30 Method and apparatus for generating database
CN201680048720.2A Active CN108028048B (en) 2015-06-30 2016-06-30 Method and apparatus for correlating noise and for analysis

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201680048686.9A Active CN108028047B (en) 2015-06-30 2016-06-30 Method and apparatus for generating database

Country Status (7)

Country Link
US (2) US11003709B2 (en)
EP (2) EP3317879B1 (en)
JP (2) JP6602406B2 (en)
KR (2) KR102087832B1 (en)
CN (2) CN108028047B (en)
CA (2) CA2990891A1 (en)
WO (2) WO2017001611A1 (en)

Families Citing this family (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190000913A (en) 2012-07-20 2019-01-03 페르노-와싱턴, 인코포레이티드. Automated systems for powered cots
KR102087832B1 (en) 2015-06-30 2020-04-21 프라운호퍼 게젤샤프트 쭈르 푀르데룽 데어 안겐반텐 포르슝 에. 베. Method and device for generating a database
US10402696B2 (en) * 2016-01-04 2019-09-03 Texas Instruments Incorporated Scene obstruction detection using high pass filters
US10095470B2 (en) 2016-02-22 2018-10-09 Sonos, Inc. Audio response playback
US9820039B2 (en) 2016-02-22 2017-11-14 Sonos, Inc. Default playback devices
US10264030B2 (en) 2016-02-22 2019-04-16 Sonos, Inc. Networked microphone device control
US9947316B2 (en) 2016-02-22 2018-04-17 Sonos, Inc. Voice control of a media playback system
US9965247B2 (en) 2016-02-22 2018-05-08 Sonos, Inc. Voice controlled media playback system based on user profile
US10115400B2 (en) 2016-08-05 2018-10-30 Sonos, Inc. Multiple voice services
US9942678B1 (en) 2016-09-27 2018-04-10 Sonos, Inc. Audio playback settings for voice interaction
US10181323B2 (en) 2016-10-19 2019-01-15 Sonos, Inc. Arbitration-based voice recognition
US10475449B2 (en) 2017-08-07 2019-11-12 Sonos, Inc. Wake-word detection suppression
US10048930B1 (en) 2017-09-08 2018-08-14 Sonos, Inc. Dynamic computation of system response volume
US10446165B2 (en) 2017-09-27 2019-10-15 Sonos, Inc. Robust short-time fourier transform acoustic echo cancellation during audio playback
US10621981B2 (en) 2017-09-28 2020-04-14 Sonos, Inc. Tone interference cancellation
US10482868B2 (en) 2017-09-28 2019-11-19 Sonos, Inc. Multi-channel acoustic echo cancellation
US10466962B2 (en) 2017-09-29 2019-11-05 Sonos, Inc. Media playback system with voice assistance
CN107731220B (en) * 2017-10-18 2019-01-22 北京达佳互联信息技术有限公司 Audio identification methods, device and server
US11343614B2 (en) 2018-01-31 2022-05-24 Sonos, Inc. Device designation of playback and network microphone device arrangements
US11175880B2 (en) 2018-05-10 2021-11-16 Sonos, Inc. Systems and methods for voice-assisted media content selection
US10959029B2 (en) 2018-05-25 2021-03-23 Sonos, Inc. Determining and adapting to changes in microphone performance of playback devices
US10587430B1 (en) 2018-09-14 2020-03-10 Sonos, Inc. Networked devices, systems, and methods for associating playback devices based on sound codes
US11521598B2 (en) * 2018-09-18 2022-12-06 Apple Inc. Systems and methods for classifying sounds
US11024331B2 (en) 2018-09-21 2021-06-01 Sonos, Inc. Voice detection optimization using sound metadata
US11100923B2 (en) 2018-09-28 2021-08-24 Sonos, Inc. Systems and methods for selective wake word detection using neural network models
US11899519B2 (en) 2018-10-23 2024-02-13 Sonos, Inc. Multiple stage network microphone device with reduced power consumption and processing load
EP3654249A1 (en) 2018-11-15 2020-05-20 Snips Dilated convolutions and gating for efficient keyword spotting
US11183183B2 (en) 2018-12-07 2021-11-23 Sonos, Inc. Systems and methods of operating media playback systems having multiple voice assistant services
US11132989B2 (en) 2018-12-13 2021-09-28 Sonos, Inc. Networked microphone devices, systems, and methods of localized arbitration
US10602268B1 (en) * 2018-12-20 2020-03-24 Sonos, Inc. Optimization of network microphone devices using noise classification
US10867604B2 (en) 2019-02-08 2020-12-15 Sonos, Inc. Devices, systems, and methods for distributed voice processing
DE102019001966B4 (en) * 2019-03-21 2023-05-25 Dräger Safety AG & Co. KGaA Apparatus, system and method for audio signal processing
US11120794B2 (en) 2019-05-03 2021-09-14 Sonos, Inc. Voice assistant persistence across multiple network microphone devices
US11200894B2 (en) 2019-06-12 2021-12-14 Sonos, Inc. Network microphone device with command keyword eventing
TWI716029B (en) * 2019-07-09 2021-01-11 佑華微電子股份有限公司 Method for detecting random sound segment
US10871943B1 (en) 2019-07-31 2020-12-22 Sonos, Inc. Noise classification for event detection
US11189286B2 (en) 2019-10-22 2021-11-30 Sonos, Inc. VAS toggle based on device orientation
US11200900B2 (en) 2019-12-20 2021-12-14 Sonos, Inc. Offline voice control
US11562740B2 (en) 2020-01-07 2023-01-24 Sonos, Inc. Voice verification for media playback
US11308962B2 (en) 2020-05-20 2022-04-19 Sonos, Inc. Input detection windowing
US11482224B2 (en) 2020-05-20 2022-10-25 Sonos, Inc. Command keywords with input detection windowing
EP4309172A1 (en) * 2021-03-18 2024-01-24 Telefonaktiebolaget LM Ericsson (publ) Predict and trigger a future response to a predicted background noise based on a sequence of sounds
US11533577B2 (en) 2021-05-20 2022-12-20 Apple Inc. Method and system for detecting sound event liveness using a microphone array
CN113643716B (en) * 2021-07-07 2023-09-26 珠海格力电器股份有限公司 Motor noise control method and device, motor and electrical equipment
CN117436709B (en) * 2023-12-20 2024-03-19 四川宽窄智慧物流有限责任公司 Cross-region order data overall warning method

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003009273A1 (en) * 2001-07-16 2003-01-30 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e. V. Method and device for characterising a signal and for producing an indexed signal
CN101129086A (en) * 2005-02-23 2008-02-20 弗劳恩霍夫应用研究促进协会 Apparatus and method for controlling a wave field synthesis rendering device
CN101203917A (en) * 2005-06-22 2008-06-18 弗劳恩霍夫应用研究促进协会 Device and method for determining a point in a film comprising film data applied in chronological order
CN101236250A (en) * 2007-01-30 2008-08-06 富士通株式会社 Sound determination method and sound determination apparatus
KR20090122142A (en) * 2008-05-23 2009-11-26 엘지전자 주식회사 A method and apparatus for processing an audio signal
CN101978424A (en) * 2008-03-20 2011-02-16 弗劳恩霍夫应用研究促进协会 Device and method for acoustic indication
CN102918591A (en) * 2010-04-14 2013-02-06 谷歌公司 Geotagged environmental audio for enhanced speech recognition accuracy
CN103795850A (en) * 2012-10-30 2014-05-14 三星电子株式会社 Electronic device and method for recognizing voice
CN104067339A (en) * 2012-02-10 2014-09-24 三菱电机株式会社 Noise suppression device
US10373611B2 (en) * 2014-01-03 2019-08-06 Gracenote, Inc. Modification of electronic system operation based on acoustic ambience classification

Family Cites Families (56)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5367332A (en) * 1992-10-09 1994-11-22 Apple Computer, Inc. Digital camera with time bracketing feature
US5918223A (en) 1996-07-22 1999-06-29 Muscle Fish Method and article of manufacture for content-based analysis, storage, retrieval, and segmentation of audio information
JP3645053B2 (en) * 1996-12-13 2005-05-11 大昌エンジニアリング株式会社 Real sound monitoring system
JPH10282985A (en) * 1997-04-07 1998-10-23 Omron Corp Hearing support device
FR2762467B1 (en) 1997-04-16 1999-07-02 France Telecom MULTI-CHANNEL ACOUSTIC ECHO CANCELING METHOD AND MULTI-CHANNEL ACOUSTIC ECHO CANCELER
US6381569B1 (en) * 1998-02-04 2002-04-30 Qualcomm Incorporated Noise-compensated speech recognition templates
US6260010B1 (en) * 1998-08-24 2001-07-10 Conexant Systems, Inc. Speech encoder using gain normalization that combines open and closed loop gains
JP3283024B2 (en) * 1999-12-07 2002-05-20 京都府 Noise measurement management system and recording medium for the system
US6990453B2 (en) 2000-07-31 2006-01-24 Landmark Digital Services Llc System and methods for recognizing sound and music signals in high noise and distortion
US7055107B1 (en) 2000-09-22 2006-05-30 Wireless Valley Communications, Inc. Method and system for automated selection of optimal communication network equipment model, position, and configuration
ATE405101T1 (en) 2001-02-12 2008-08-15 Gracenote Inc METHOD FOR GENERATING AN IDENTIFICATION HASH FROM THE CONTENTS OF A MULTIMEDIA FILE
DE10109648C2 (en) 2001-02-28 2003-01-30 Fraunhofer Ges Forschung Method and device for characterizing a signal and method and device for generating an indexed signal
US6996531B2 (en) 2001-03-30 2006-02-07 Comverse Ltd. Automated database assistance using a telephone for a speech based or text based multimedia communication mode
US6683938B1 (en) * 2001-08-30 2004-01-27 At&T Corp. Method and system for transmitting background audio during a telephone call
US6959276B2 (en) * 2001-09-27 2005-10-25 Microsoft Corporation Including the category of environmental noise when processing speech signals
US7158931B2 (en) * 2002-01-28 2007-01-02 Phonak Ag Method for identifying a momentary acoustic scene, use of the method and hearing device
JP3995040B2 (en) * 2002-04-19 2007-10-24 リオン株式会社 Sound recording method and apparatus using the same in noise vibration measurement
FR2842014B1 (en) * 2002-07-08 2006-05-05 Lyon Ecole Centrale METHOD AND APPARATUS FOR AFFECTING A SOUND CLASS TO A SOUND SIGNAL
JP4352790B2 (en) * 2002-10-31 2009-10-28 セイコーエプソン株式会社 Acoustic model creation method, speech recognition device, and vehicle having speech recognition device
JP2005234074A (en) * 2004-02-18 2005-09-02 Sony Corp Apparatus and method for information processing, recording medium, and program
DE102004036154B3 (en) * 2004-07-26 2005-12-22 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for robust classification of audio signals and method for setting up and operating an audio signal database and computer program
CN1727911A (en) * 2004-07-26 2006-02-01 松下电器产业株式会社 Acoustic control positioning system and method thereof
JP2006189367A (en) * 2005-01-07 2006-07-20 Chugoku Electric Power Co Inc:The Noise measuring instrument
US7464029B2 (en) 2005-07-22 2008-12-09 Qualcomm Incorporated Robust separation of speech signals in a noisy environment
US20070183604A1 (en) * 2006-02-09 2007-08-09 St-Infonox Response to anomalous acoustic environments
US20100257974A1 (en) 2006-06-21 2010-10-14 Carrens Vernon M Mechanical machine designed to utilize unbalanced torque to enhance angular momentum
US20080024469A1 (en) * 2006-07-31 2008-01-31 Niranjan Damera-Venkata Generating sub-frames for projection based on map values generated from at least one training image
JP5105912B2 (en) * 2007-03-13 2012-12-26 アルパイン株式会社 Speech intelligibility improving apparatus and noise level estimation method thereof
US8762143B2 (en) * 2007-05-29 2014-06-24 At&T Intellectual Property Ii, L.P. Method and apparatus for identifying acoustic background environments based on time and speed to enhance automatic speech recognition
WO2009046359A2 (en) * 2007-10-03 2009-04-09 University Of Southern California Detection and classification of running vehicles based on acoustic signatures
RU2536679C2 (en) 2008-07-11 2014-12-27 Фраунхофер-Гезелльшафт цур Фёрдерунг дер ангевандтен Time-deformation activation signal transmitter, audio signal encoder, method of converting time-deformation activation signal, audio signal encoding method and computer programmes
FR2943875A1 (en) * 2009-03-31 2010-10-01 France Telecom METHOD AND DEVICE FOR CLASSIFYING BACKGROUND NOISE CONTAINED IN AN AUDIO SIGNAL.
EP2328363B1 (en) * 2009-09-11 2016-05-18 Starkey Laboratories, Inc. Sound classification system for hearing aids
GB0919672D0 (en) * 2009-11-10 2009-12-23 Skype Ltd Noise suppression
EP2362620A1 (en) * 2010-02-23 2011-08-31 Vodafone Holding GmbH Method of editing a noise-database and computer device
JP5496077B2 (en) * 2010-02-25 2014-05-21 三菱重工業株式会社 Sound ray analyzer
JP5467015B2 (en) * 2010-08-24 2014-04-09 公益財団法人鉄道総合技術研究所 Evaluation method of noise in railway vehicles
US8812014B2 (en) * 2010-08-30 2014-08-19 Qualcomm Incorporated Audio-based environment awareness
CN103354937B (en) * 2011-02-10 2015-07-29 杜比实验室特许公司 Comprise the aftertreatment of the medium filtering of noise suppression gain
US9142220B2 (en) * 2011-03-25 2015-09-22 The Intellisis Corporation Systems and methods for reconstructing an audio signal from transformed audio information
US8731474B2 (en) * 2011-05-25 2014-05-20 Shared Spectrum Company Method and system for man-made noise rejection detector
CN102354499B (en) * 2011-07-25 2017-12-19 中兴通讯股份有限公司 The method and apparatus for reducing noise
US20130040694A1 (en) 2011-08-10 2013-02-14 Babak Forutanpour Removal of user identified noise
US9528852B2 (en) * 2012-03-02 2016-12-27 Nokia Technologies Oy Method and apparatus for generating an audio summary of a location
FR2994495B1 (en) * 2012-08-10 2015-08-21 Thales Sa METHOD AND SYSTEM FOR DETECTING SOUND EVENTS IN A GIVEN ENVIRONMENT
US9275625B2 (en) * 2013-03-06 2016-03-01 Qualcomm Incorporated Content based noise suppression
US9384754B2 (en) * 2013-03-12 2016-07-05 Comcast Cable Communications, Llc Removal of audio noise
US9390170B2 (en) * 2013-03-15 2016-07-12 Shazam Investments Ltd. Methods and systems for arranging and searching a database of media content recordings
EP3078026B1 (en) * 2013-12-06 2022-11-16 Tata Consultancy Services Limited System and method to provide classification of noise data of human crowd
US20150179181A1 (en) * 2013-12-20 2015-06-25 Microsoft Corporation Adapting audio based upon detected environmental accoustics
CN103745728B (en) * 2014-01-08 2017-04-12 叶兰玉 Method and device for intelligent active noise reduction for house
CN106031138B (en) * 2014-02-20 2019-11-29 哈曼国际工业有限公司 Environment senses smart machine
US9837102B2 (en) * 2014-07-02 2017-12-05 Microsoft Technology Licensing, Llc User environment aware acoustic noise reduction
CN104517607A (en) * 2014-12-16 2015-04-15 佛山市顺德区美的电热电器制造有限公司 Speed-controlled appliance and method of filtering noise therein
US9728204B2 (en) * 2015-01-26 2017-08-08 Shenzhen Grandsun Electric Co., Ltd. Method and device for drawing a noise map
KR102087832B1 (en) 2015-06-30 2020-04-21 프라운호퍼 게젤샤프트 쭈르 푀르데룽 데어 안겐반텐 포르슝 에. 베. Method and device for generating a database


Also Published As

Publication number Publication date
JP6654209B2 (en) 2020-02-26
KR102087832B1 (en) 2020-04-21
JP6602406B2 (en) 2019-11-06
KR20180022967A (en) 2018-03-06
US11003709B2 (en) 2021-05-11
EP3317879A1 (en) 2018-05-09
CA2990888A1 (en) 2017-01-05
WO2017001611A1 (en) 2017-01-05
KR20180025921A (en) 2018-03-09
KR102137537B1 (en) 2020-07-27
US20180121540A1 (en) 2018-05-03
JP2018528453A (en) 2018-09-27
US11880407B2 (en) 2024-01-23
CA2990891A1 (en) 2017-01-05
CN108028047B (en) 2022-08-30
US20180122398A1 (en) 2018-05-03
JP2018525664A (en) 2018-09-06
EP3317878A1 (en) 2018-05-09
EP3317878B1 (en) 2020-03-25
CN108028048A (en) 2018-05-11
EP3317879B1 (en) 2020-02-19
CN108028047A (en) 2018-05-11
WO2017001607A1 (en) 2017-01-05

Similar Documents

Publication Publication Date Title
CN108028048B (en) Method and apparatus for correlating noise and for analysis
EP3591633B1 (en) Surveillance system and surveillance method using multi-dimensional sensor data
US20210149939A1 (en) Responding to remote media classification queries using classifier models and context parameters
Valero et al. Gammatone cepstral coefficients: Biologically inspired features for non-speech audio classification
Jia et al. SoundLoc: Accurate room-level indoor localization using acoustic signatures
Hollosi et al. Enhancing wireless sensor networks with acoustic sensing technology: use cases, applications & experiments
US20230060936A1 (en) Method for identifying an audio signal
CN109997186B (en) Apparatus and method for classifying acoustic environments
US20230074279A1 (en) Methods, non-transitory computer readable media, and systems of transcription using multiple recording devices
KR101660306B1 (en) Method and apparatus for generating life log in portable termianl
CN111936056B (en) Detecting a subject with a respiratory disorder
Sharma et al. Smartphone audio based distress detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant