EP4097695B1 - Method and device for identifying acoustic anomalies ("Procédé et dispositif d'identification d'anomalies acoustiques") - Google Patents

Method and device for identifying acoustic anomalies

Info

Publication number
EP4097695B1
Authority
EP
European Patent Office
Prior art keywords
abcd
audio segments
anomaly
audio
accordance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
EP21702020.5A
Other languages
German (de)
English (en)
Other versions
EP4097695A1 (fr)
Inventor
Jakob Abesser
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Original Assignee
Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Publication of EP4097695A1
Application granted
Publication of EP4097695B1
Legal status: Active (current)
Anticipated expiration


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, specially adapted for particular use
    • G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, specially adapted for particular use, for comparison or discrimination
    • G - PHYSICS
    • G08 - SIGNALLING
    • G08B - SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B 13/00 - Burglar, theft or intruder alarms
    • G08B 13/02 - Mechanical actuation
    • G08B 13/04 - Mechanical actuation by breaking of glass
    • G - PHYSICS
    • G08 - SIGNALLING
    • G08B - SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B 13/00 - Burglar, theft or intruder alarms
    • G08B 13/16 - Actuation by interference with mechanical vibrations in air or other fluid
    • G08B 13/1654 - Actuation by interference with mechanical vibrations in air or other fluid using passive vibration detection systems
    • G08B 13/1672 - Actuation by interference with mechanical vibrations in air or other fluid using passive vibration detection systems using sonic detecting means, e.g. a microphone operating in the audio frequency range
    • G - PHYSICS
    • G08 - SIGNALLING
    • G08B - SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B 21/00 - Alarms responsive to a single specified undesired or abnormal condition and not otherwise provided for
    • G08B 21/02 - Alarms for ensuring the safety of persons
    • G08B 21/04 - Alarms for ensuring the safety of persons responsive to non-activity, e.g. of elderly persons
    • G08B 21/0438 - Sensor means for detecting
    • G08B 21/0469 - Presence detectors to detect unsafe condition, e.g. infrared sensor, microphone

Definitions

  • Embodiments of the present invention relate to a method and a device for detecting acoustic anomalies. Further exemplary embodiments relate to a corresponding computer program. According to exemplary embodiments, the detection of a normal situation and the detection of anomalies in comparison to this normal situation take place.
  • In real acoustic scenes there is usually a complex superposition of several sound sources. These can be positioned in the foreground or background and at any spatial position.
  • A wide variety of possible sounds is also conceivable, ranging from very short transient signals (e.g. clapping, a gunshot) to longer, stationary sounds (a siren, a passing train).
  • A recording typically covers a specific period of time, which is subsequently divided into one or more time windows. Based on this subdivision, and depending on the length of the noise (cf. transient versus longer, stationary sound), a sound can extend over one or more audio segments/time windows.
  • An anomaly is a sound that deviates from the "acoustic normal state", i.e. from the set of noises considered "normal".
  • Examples of anomalies are breaking glass (burglary detection), a pistol shot (monitoring of public events) or a chainsaw (monitoring of nature reserves).
  • The first problem is that the sound of the anomaly (the outlier class) is often not known or cannot be precisely defined or described (e.g. what can a broken machine sound like?).
  • The second problem is that recent algorithms for sound classification using deep neural networks are very sensitive to changing (and often unknown) acoustic conditions in the operational scenario.
  • Classification models trained on audio data recorded, for example, with a high-quality microphone achieve only poor recognition rates when classifying audio data that was recorded with a poorer microphone.
  • Possible solutions lie in the area of “domain adaptation”, i.e. adapting the models or the audio data to be classified in order to achieve greater robustness in recognition. In practice, however, it is often logistically difficult and too expensive to record representative audio recordings at the later location of use of an audio analysis system and then annotate them with regard to the sound events they contain.
  • the third problem with audio analysis of environmental noise lies in data protection concerns, since classification methods can theoretically also be used to recognize and transcribe speech signals (e.g. when recording a conversation near the audio sensor).
  • Classification models of existing state-of-the-art solutions work as follows: if the sound anomaly to be detected can be precisely specified, a classification model based on machine-learning algorithms can be trained with supervised learning to recognize specific noise classes. Current studies show that neural networks in particular are very sensitive to changing acoustic conditions, so that an additional adaptation of classification models to the respective acoustic situation of the application must be carried out.
  • EP 2 988 105 A2 describes a device and a method for the automatic detection and classification of audible acoustic signals in a surveillance area.
  • the object of the present invention is to create a concept for detecting anomalies that is optimized with regard to the learning behavior and that enables reliable and accurate detection of anomalies.
  • Embodiments of the present invention provide a method for detecting acoustic anomalies.
  • The method includes the steps of obtaining a long-term recording, which has a duration of at least greater than 1 minute, or at least 10 minutes, or at least 1 hour, or at least 24 hours, with a plurality of first audio segments assigned to respective first time windows, and analyzing the plurality of first audio segments in order to obtain, for each of the plurality of first audio segments, a first feature vector describing the respective first audio segment, e.g. a spectrum for the audio segment (time-frequency spectrum) or an audio fingerprint with certain characteristics of the audio segment.
  • The result of the analysis of a long-term recording divided into a large number of time windows is a large number of first (single- or multi-dimensional) feature vectors for the large number of first audio segments (assigned to the corresponding times/windows of the long-term recording), which represent the "normal state".
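The patent leaves the concrete feature representation open (a time-frequency spectrum or an audio fingerprint are named as examples). As a minimal illustrative sketch, not the patented implementation, the following snippet splits a recording into equal time windows and derives one coarse magnitude-spectrum vector per segment (the function name and the band pooling are assumptions):

```python
import numpy as np

def segment_features(recording, sr, win_s=1.0, n_bands=16):
    """Split a recording into equal-length time windows and compute one
    feature vector (a coarse magnitude spectrum) per audio segment."""
    win = int(sr * win_s)
    n_seg = len(recording) // win
    feats = []
    for i in range(n_seg):
        segment = recording[i * win:(i + 1) * win]
        spectrum = np.abs(np.fft.rfft(segment))    # magnitude spectrum
        bands = np.array_split(spectrum, n_bands)  # pool into coarse bands
        feats.append(np.array([b.mean() for b in bands]))
    return np.stack(feats)                         # shape: (n_seg, n_bands)

# Toy "long-term recording": 5 seconds of a 440 Hz tone at 8 kHz.
sr = 8000
t = np.arange(5 * sr) / sr
recording = np.sin(2 * np.pi * 440 * t)
first_feature_vectors = segment_features(recording, sr)
print(first_feature_vectors.shape)                 # (5, 16)
```

Each row of the result plays the role of one first feature vector assigned to one first time window.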
  • The method includes further steps of obtaining another recording with one or more second audio segments associated with respective second time windows and analyzing the one or more second audio segments to obtain one or more feature vectors describing the one or more second audio segments.
  • the result of the second part of the method is, for example, a large number of second feature vectors (e.g. with corresponding times of further recording).
  • the one or more second feature vectors are compared with the plurality of first feature vectors (e.g. by comparing the identities or similarities or by recognizing a sequence) in order to detect at least one anomaly.
  • A sound anomaly: the first appearance of a previously unheard sound.
  • A temporal anomaly: e.g. a changed repetition pattern of a sound that has already been heard.
  • A spatial anomaly: the occurrence of a sound that has already been heard at a previously unknown spatial position.
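The matching step is not tied to a particular similarity measure in the text. A hypothetical sketch, assuming Euclidean nearest-neighbour distance to the phase-1 reference vectors as the measure of novelty:

```python
import numpy as np

def anomaly_score(first_feature_vectors, second_feature_vector):
    """Distance from a second feature vector to its nearest first
    (reference) feature vector; a large distance suggests an anomaly."""
    d = np.linalg.norm(first_feature_vectors - second_feature_vector, axis=1)
    return float(d.min())

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, (100, 16))   # phase-1 "normal state" vectors
normal_query = reference[3] + 0.01            # close to a known normal segment
novel_query = np.full(16, 10.0)               # far outside the normal cloud
print(anomaly_score(reference, normal_query) < anomaly_score(reference, novel_query))  # True
```

A threshold on this score (see the threshold discussion further below) would turn it into a sound-anomaly decision.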
  • Embodiments of the present invention are based on the finding that an "acoustic normal state" and "normal noises" can be learned independently, simply through a long-term sound analysis (phase 1 of the method, comprising the steps of obtaining a long-term recording and analyzing it). This long-term sound analysis thus results in an independent or autonomous adaptation of an analysis system to a specific acoustic scene. No annotated training data (recording + semantic class annotation) is required, which represents a great saving in time, effort and cost. Once this acoustic "normal state" or these "normal" noises have been captured, the current noise environment can be assessed against them in a subsequent analysis phase (phase 2, with the steps of obtaining a further recording and analyzing it).
  • phase 1 involves learning a model using the normal background noise based on a statistical procedure or machine learning, whereby this model then allows (in phase 2) to compare currently recorded background noise with regard to its degree of novelty (probability of an anomaly).
  • Another advantage of this approach is that the privacy of people who may be in the immediate vicinity of the acoustic sensors is protected. This is called privacy by design. Due to the system, speech recognition is not possible because the interface (audio in, anomaly probability function out) is clearly defined. This can dispel possible data protection concerns when using acoustic sensors.
  • the long-term recording represents the normal acoustic situation
  • The multitude of first audio segments describes this normal situation, on its own and/or in its order.
  • The large number of first audio segments represents a kind of reference, on its own and/or in combination.
  • the aim of the method is to detect anomalies in comparison to this normal situation.
  • the result of the clustering described above is a description of the reference based on first audio segments.
  • the second audio segments alone or in their combination are then compared with the reference to represent the anomaly.
  • the anomaly is a deviation of the current acoustic situation described by the second feature vectors from the reference described by the first feature vectors.
  • the first feature vectors alone or in their combination represent a reference image of the normal state
  • the second feature vectors alone or in their combination describe the current acoustic situation
  • The anomaly, in the form of a deviation of the description of the current acoustic situation (cf. second feature vectors) from the reference (cf. first feature vectors), can be recognized.
  • the anomaly is therefore defined by the fact that at least one of the second acoustic feature vectors deviates from the sequence of the first acoustic feature vectors. Possible deviations can be: sound anomalies, temporal anomalies and spatial anomalies.
  • Phase 1 captures a large number of first audio segments, which are also referred to below as "normal" noises/audio segments or as noises/audio segments considered "normal". According to exemplary embodiments, knowing these "normal" audio segments makes it possible to detect a so-called sound anomaly.
  • the sub-step of identifying a second feature vector that differs from the analyzed first feature vectors is then carried out.
  • the method when analyzing, includes the substep of identifying a repeat pattern in the plurality of first time windows. This involves identifying repeating audio segments and determining the resulting pattern. According to exemplary embodiments, identification is carried out using repeating, identical or similar first feature vectors belonging to different first audio segments. According to exemplary embodiments, identical and similar first feature vectors or first audio segments can also be grouped into one or more groups during identification.
  • the method includes recognizing a sequence of first feature vectors associated with the first audio segments or recognizing a sequence of groups of identical or similar first feature vectors or first audio segments.
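The grouping of identical or similar first feature vectors into groups could be realized, for example, by a greedy distance-threshold clustering. The tolerance value and the greedy scheme below are illustrative assumptions; the patent only requires that similar vectors end up in common groups:

```python
import numpy as np

def group_segments(feature_vectors, tol=1.0):
    """Greedily group identical or similar feature vectors: a segment joins
    the first group whose centroid lies within tol, else opens a new group."""
    centroids, labels = [], []
    for f in feature_vectors:
        dists = [np.linalg.norm(f - c) for c in centroids]
        if dists and min(dists) < tol:
            labels.append(int(np.argmin(dists)))
        else:
            centroids.append(f)
            labels.append(len(centroids) - 1)
    return labels

# Three distinct "sounds" A, B, C and slightly perturbed repeats of each.
A, B, C = np.zeros(4), np.full(4, 5.0), np.full(4, 10.0)
seq = group_segments([A, B, C, A + 0.1, B - 0.1, C])
print(seq)   # [0, 1, 2, 0, 1, 2] -> the sequence of groups repeats, "ABCABC"
```

The resulting label sequence is exactly the "sequence of groups of identical or similar first feature vectors" referred to above.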
  • the basic steps make it advantageously possible to recognize normal noises or to recognize normal audio objects.
  • the combination of these normal audio objects in a certain order or a certain repetition pattern in terms of time then represents a normal acoustic state.
  • this method then enables the sub-step of comparing the repetition pattern of the first audio segments and/or order in the first audio segments with the repetition pattern of the second audio segments and/or order in the second audio segments to be carried out during matching. This comparison enables the detection of a temporal anomaly.
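Comparing a learned repetition pattern with an observed sequence of segment labels might look as follows. The label sequence ABCAABC mirrors the temporal-anomaly example of Fig. 2a; the simple cycle-walking logic is an assumption, not the claimed algorithm:

```python
def temporal_anomalies(learned_cycle, observed):
    """Walk the observed label sequence along the learned repeating cycle
    and flag every segment that breaks the expected order."""
    anomalies, i = [], 0
    for pos, label in enumerate(observed):
        if label == learned_cycle[i % len(learned_cycle)]:
            i += 1                    # segment fits the learned pattern
        else:
            anomalies.append(pos)     # unexpected segment: temporal anomaly
    return anomalies

print(temporal_anomalies(list("ABC"), list("ABCABC")))   # [] -> no anomaly
print(temporal_anomalies(list("ABC"), list("ABCAABC")))  # [4] -> the duplicated A is flagged
```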
  • the method may include the step of determining a respective position for the respective first audio segments.
  • The respective position can also be determined for the respective second audio segments. According to one exemplary embodiment, this then enables a spatial anomaly to be detected by the substep of comparing the position assigned to the respective first audio segment with the position assigned to the corresponding respective second audio segment.
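A hypothetical position comparison for spatial anomalies, assuming each learned sound class carries a list of 2-D positions from phase 1 (the data layout and tolerance are illustrative assumptions):

```python
import numpy as np

def is_spatial_anomaly(learned_positions, label, observed_pos, tol=1.0):
    """A sound already heard in phase 1 that now appears far from every
    position learned for it is flagged as a spatial anomaly."""
    known = learned_positions.get(label, [])
    if not known:
        return False  # an unheard sound would be a sound anomaly, not a spatial one
    return all(np.linalg.norm(np.subtract(p, observed_pos)) > tol for p in known)

learned = {"A": [(0.0, 0.0)], "B": [(5.0, 0.0)]}
print(is_spatial_anomaly(learned, "B", (5.1, 0.0)))  # False: B at its usual position
print(is_spatial_anomaly(learned, "B", (0.0, 0.0)))  # True: B at A's position
```

The second call reproduces the spatial-anomaly example discussed later: segment B appearing at position 1 instead of position 2.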
  • Each feature vector can have one dimension or several dimensions for the different audio segments.
  • a possible realization of a feature vector would be, for example, a time-frequency spectrum.
  • the dimensional space can also be reduced.
  • the method includes the step of reducing the dimensions of the feature vector.
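The dimension-reduction step is not specified further in the text; one common realization would be PCA, sketched here via SVD as an assumed example:

```python
import numpy as np

def reduce_dims(feature_vectors, k=2):
    """Project feature vectors onto their first k principal components
    (PCA via SVD) to shrink the dimensional space before matching."""
    X = feature_vectors - feature_vectors.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:k].T

rng = np.random.default_rng(1)
feats = rng.normal(size=(50, 64))    # 50 segments, 64-dimensional vectors
reduced = reduce_dims(feats, k=2)
print(reduced.shape)                 # (50, 2)
```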
  • The method can have the step of determining a probability of occurrence of the respective first audio segment and of outputting the probability of occurrence together with the respective first feature vector.
  • the method can have the step of determining a probability of occurrence of the respective first audio segment and outputting the probability of occurrence with the respective first feature vector and an associated first time window.
  • The probability of occurrence for the respective audio segment or, more precisely, the probability of the occurrence of the audio segment at this point in time, is output. The output takes place together with the corresponding data record or feature vector.
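The occurrence probability per audio segment (group) can be estimated, for example, as a relative frequency over the long-term recording; this is a minimal sketch, not the claimed estimator:

```python
from collections import Counter

def occurrence_probabilities(segment_labels):
    """Relative frequency of each segment group over the long-term
    recording, to be output together with the group's feature vector."""
    counts = Counter(segment_labels)
    total = len(segment_labels)
    return {label: n / total for label, n in counts.items()}

probs = occurrence_probabilities(["A", "B", "C", "A", "B", "C", "A", "A"])
print(probs["A"])   # 0.5
```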
  • the method can also be implemented using a computer.
  • A further embodiment is a computer program with a program code for carrying out the method.
  • Further exemplary embodiments relate to a device with an interface and a processor.
  • the interface serves to obtain a long-term recording with a plurality of first audio segments assigned to respective first time windows and to obtain a further recording with one or more second audio segments assigned to respective second time windows.
  • The processor is designed to analyze the plurality of first audio segments in order to obtain, for each of the plurality of first audio segments, a first feature vector describing the respective first audio segment.
  • the processor is designed to analyze the one or more second audio segments in order to obtain one or more feature vectors describing the one or more second audio segments.
  • the processor is designed to match the one or more second feature vectors with the plurality of first feature vectors in order to detect at least one anomaly.
  • The device comprises a recording unit connected to the interface, e.g. a microphone or a microphone array.
  • the microphone array advantageously enables position determination, as already explained above.
  • the device comprises an output interface for outputting the probability of occurrence explained above.
  • Fig. 1 shows a method 100, which is divided into two phases 110 and 120.
  • Step 112 includes a long-term recording of the acoustic normal state in the application scenario.
  • The analysis device 10 is placed in the target environment so that a long-term recording 113 of the normal state is captured.
  • this long-term recording can last for at least 1 minute or at least 10 minutes or at least 1 hour or at least 24 hours.
  • This long-term recording 113 is then broken down, for example, into segments.
  • The breakdown can be into time periods of equal length, e.g. 1 second or 0.1 seconds, or into dynamic time ranges.
  • Each time range includes an audio segment.
  • In step 114, commonly referred to as analyzing, these audio segments are examined separately or in combination.
  • a so-called feature vector 115 (first feature vectors) is determined for each audio segment during analysis.
  • Feature vectors 115 can be determined, for example, by an energy spectrum for a specific frequency range or generally a time-frequency spectrum.
  • In step 114, typical or dominant noises can then optionally be identified using unsupervised learning methods (e.g. clustering).
  • Time periods or audio segments are grouped that have similar feature vectors 115 and that accordingly sound similar.
  • No semantic classification of a sound (e.g. as "car" or "plane") takes place.
  • Unsupervised learning takes place based on the frequencies of repeating or similar audio segments.
  • an unsupervised learning of the temporal order and/or typical repetition patterns of certain noises takes place.
  • the result of clustering is a compilation of audio segments or noises that are normal or typical for this area.
  • each audio segment can also be assigned a probability of occurrence.
  • Repeat patterns or a sequence i.e. a combination of several audio segments, are identified that are typical or normal for the current environment.
  • A probability can also be assigned to each grouping of different audio segments, to each repeat pattern or to each sequence.
  • Phase 120 has the three basic steps 122 and 124 and 126.
  • An audio recording 123 is again recorded. This is typically significantly shorter than the audio recording 113; however, it can also be a continuous audio recording.
  • This audio recording 123 is then analyzed in a subsequent step 124. This step is comparable in content to step 114. This in turn involves converting the digital audio recording 123 into feature vectors. If these second feature vectors 125 are now available, they can be compared with the feature vectors 115.
  • A probability for each of the three anomaly types can be output at time X. This is illustrated in Fig. 3 by the arrows 126z, 126k and 126r (one arrow for each type of anomaly).
  • Threshold values can be defined as to when feature vectors are similar or when groups of feature vectors are similar, so that the result then also represents a threshold decision for an anomaly.
  • This threshold application can also be linked to the output of the probability distribution or appear in combination with it, e.g. to enable a more accurate temporal detection of anomalies.
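A threshold applied to per-window anomaly probabilities might be sketched as follows; the threshold value and list representation are illustrative assumptions:

```python
def flag_anomalies(window_probs, threshold=0.8):
    """Return the time-window indices whose anomaly probability exceeds
    a fixed threshold."""
    return [t for t, p in enumerate(window_probs) if p > threshold]

window_probs = [0.1, 0.2, 0.95, 0.3, 0.85]
print(flag_anomalies(window_probs))   # [2, 4]
```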
  • step 114 in adjustment phase 110 can also include unsupervised learning of typical spatial positions and/or movements of certain noises.
  • Instead of the single microphone 18 shown in Fig. 3, a variant uses two microphones or a microphone array with at least two microphones.
  • spatial localization of the current dominant sound sources/audio segments is then possible in the second phase 120 through multi-channel recording.
  • the underlying technology here can be, for example, beamforming.
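The patent names beamforming as one possible localization technology. As a much-simplified stand-in, a two-channel cross-correlation estimate of the time difference of arrival can illustrate the idea (all signal parameters below are made up for illustration):

```python
import numpy as np

def tdoa_samples(ch_a, ch_b):
    """Estimate the time difference of arrival (in samples) between two
    microphone channels via cross-correlation; the sign indicates which
    channel hears the sound first."""
    corr = np.correlate(ch_a, ch_b, mode="full")
    return int(np.argmax(corr)) - (len(ch_b) - 1)

rng = np.random.default_rng(2)
source = rng.normal(size=256)
delay = 5
ch_a = np.concatenate([source, np.zeros(delay)])   # sound reaches mic A first
ch_b = np.concatenate([np.zeros(delay), source])   # ...and mic B 5 samples later
print(tdoa_samples(ch_a, ch_b))                    # -5: mic A leads by five samples
```

A real beamformer would combine many such pairwise delays across the array to steer toward the dominant source.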
  • Fig. 2a illustrates the temporal anomaly.
  • audio segments ABC for both phase 1 and phase 2 are plotted along the time axis t.
  • In phase 1 it was recognized that a normal situation or normal order exists, such that the audio segments A, B and C appear in the order ABC.
  • In addition, a repeat pattern was recognized, according to which the first group ABC can be followed by another group ABC.
  • If this pattern ABCABC is recognized in phase 2, it can be assumed that there is no anomaly, or at least no temporal anomaly. However, if the pattern ABCAABC shown here is recognized, there is a temporal anomaly, because an additional audio segment A is arranged between the two groups ABC.
  • This audio segment A or anomalous audio segment A is provided with a double frame.
  • a sonic anomaly is illustrated.
  • the audio segments ABCABC were again recorded along the time axis t (cf. Fig. 2a ).
  • the sound anomaly during recognition is shown by the fact that another audio segment, here audio segment D, appears in phase 2.
  • This audio segment D has an increased length, e.g. over two time ranges, and is therefore illustrated as DD.
  • The sound anomaly is double-framed in the sequence of audio segments. This sound anomaly could, for example, be a sound that was never heard during the learning phase, for example thunder, which differs from the previous elements A, B and C in terms of loudness/intensity and length.
  • a local anomaly is illustrated.
  • In phase 1, two audio segments A and B were detected at two different positions, position 1 and position 2.
  • In phase 2, both elements A and B were recognized, with localization determining that both audio segment A and audio segment B were at position 1.
  • The presence of audio segment B at position 1 represents a spatial anomaly.
  • The device 10 essentially includes the input interface 12, e.g. a microphone interface, and a processor 14.
  • The processor 14 receives the one or more (simultaneously present) audio signals from the microphone 18 or the microphone array 18' and analyzes them. To this end, it essentially carries out steps 114, 124 and 126 explained in connection with Fig. 1.
  • The result to be output (cf. output interface 16) is a set of feature vectors that represent the normal state or, in phase 2, an output of the detected anomalies, e.g. assigned to a specific type and/or to a specific time.
  • The interface 16 can output a probability of anomalies, or a probability of anomalies at certain times, or, in general, a probability of feature vectors at certain times.
  • aspects have been described in connection with a device, it is understood that these aspects also represent a description of the corresponding method, so that a block or a component of a device is also to be understood as a corresponding method step or as a feature of a method step. Similarly, aspects described in connection with or as a method step also represent a description of a corresponding block or detail or feature of a corresponding device.
  • Some or all of the method steps may be performed by (or using) a hardware apparatus, such as a microprocessor, a programmable computer or an electronic circuit. In some embodiments, some or several of the most important method steps may be performed by such an apparatus.
  • embodiments of the invention may be implemented in hardware or in software.
  • The implementation may be carried out using a digital storage medium, such as a floppy disk, a DVD, a Blu-ray Disc, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, a hard drive or another magnetic or optical memory, on which electronically readable control signals are stored, which can interact with a programmable computer system in such a way that the respective method is carried out. Therefore, the digital storage medium can be computer-readable.
  • Some embodiments according to the invention thus include a data carrier that has electronically readable control signals that are capable of interacting with a programmable computer system such that one of the methods described herein is carried out.
  • embodiments of the present invention may be implemented as a computer program product with a program code, the program code being effective to perform one of the methods when the computer program product runs on a computer.
  • the program code can, for example, also be stored on a machine-readable medium.
  • Embodiments of the invention include the computer program for performing one of the methods described herein, the computer program being stored on a machine-readable medium.
  • an exemplary embodiment of the method according to the invention is therefore a computer program that has a program code for carrying out one of the methods described herein when the computer program runs on a computer.
  • a further exemplary embodiment of the method according to the invention is therefore a data carrier (or a digital storage medium or a computer-readable medium) on which the computer program for carrying out one of the methods described herein is recorded.
  • the data carrier, digital storage medium or computer-readable medium is typically tangible and/or non-transitory.
  • a further exemplary embodiment of the method according to the invention is therefore a data stream or a sequence of signals which represents the computer program for carrying out one of the methods described herein.
  • the data stream or the sequence of signals can, for example, be configured to be transferred via a data communication connection, for example via the Internet.
  • Another embodiment includes a processing device, such as a computer or a programmable logic device, configured or adapted to perform one of the methods described herein.
  • a processing device such as a computer or a programmable logic device, configured or adapted to perform one of the methods described herein.
  • Another embodiment includes a computer on which the computer program for performing one of the methods described herein is installed.
  • a further embodiment according to the invention includes a device or system designed to transmit a computer program to a receiver for carrying out at least one of the methods described herein.
  • the transmission can take place electronically or optically, for example.
  • the recipient may be, for example, a computer, a mobile device, a storage device or a similar device.
  • the device or system can, for example, comprise a file server for transmitting the computer program to the recipient.
  • A programmable logic device (e.g. a field-programmable gate array, an FPGA) may cooperate with a microprocessor to perform one of the methods described herein.
  • In some embodiments, the methods are carried out by any hardware device. This can be universally applicable hardware such as a computer processor (CPU), or hardware specific to the method, such as an ASIC.
  • the devices described herein may be implemented, for example, using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.
  • the devices described herein, or any components of the devices described herein may be at least partially implemented in hardware and/or in software (computer program).
  • the methods described herein may be implemented, for example, using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Emergency Management (AREA)
  • Business, Economics & Management (AREA)
  • Gerontology & Geriatric Medicine (AREA)
  • General Health & Medical Sciences (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
  • Testing And Monitoring For Control Systems (AREA)
  • Emergency Alarm Devices (AREA)

Claims (15)

  1. Method (100) for identifying acoustic anomalies, comprising the following steps:
    obtaining (113) a long-term recording with a plurality of first audio segments (ABCD) associated with respective first time windows, wherein the long-term recording has a duration of at least more than 1 minute, or of at least 10 minutes, or of at least 1 hour, or of at least 24 hours;
    analyzing (114) the plurality of first audio segments (ABCD) in order to obtain, for each of the plurality of first audio segments (ABCD), a first feature vector describing the respective first audio segment (ABCD);
    obtaining (123) a further recording with one or more second audio segments (ABCD) associated with respective second time windows;
    analyzing (124) the one or more second audio segments (ABCD) in order to obtain one or more feature vectors describing the one or more second audio segments (ABCD);
    matching (126) the one or more second feature vectors with the plurality of first feature vectors in order to identify at least one anomaly in comparison with a normal acoustic situation for this environment.
  2. Method (100) according to claim 1, wherein the anomaly comprises a sound, temporal and/or spatial anomaly; and/or
    wherein the anomaly comprises a sound anomaly in combination with a temporal anomaly, or a sound anomaly in combination with a spatial anomaly, or a temporal anomaly in combination with a spatial anomaly.
  3. Method (100) according to claim 1 or 2, wherein the method (100) comprises, during the analysis, the substep of identifying a repetition pattern in the plurality of first time windows; or
    wherein the method (100) comprises, during the analysis, the substep of identifying a repetition pattern in the plurality of first time windows, wherein the identification takes place by means of repeating, identical or similar first feature vectors belonging to different first audio segments (ABCD).
  4. Method (100) according to claim 3, wherein, during the identification, identical or similar first feature vectors are grouped into one or more groups; and/or
    wherein the method (100) comprises identifying a sequence of first feature vectors belonging to different first audio segments (ABCD), or identifying a sequence of groups of identical or similar first feature vectors.
  5. Method (100) according to one of claims 3 to 4, wherein the method (100) comprises identifying a repetition pattern in the one or more second time windows; and/or
    wherein the method (100) comprises identifying a sequence of second feature vectors belonging to different second audio segments (ABCD), or identifying a sequence of groups of identical or similar second feature vectors.
  6. Procédé (100) selon la revendication 5, dans lequel le procédé (100) comporte la sous-étape consistant à aligner le modèle de répétition des premiers segments audio (ABCD) et/ou l'ordre pour les premiers segments audio (ABCD) sur le modèle de répétition des deuxièmes segments audio (ABCD) et/ou l'ordre pour les deuxièmes segments audio (ABCD), pour identifier une anomalie temporelle.
  7. Procédé (100) selon l'une des revendications précédentes, dans lequel l'alignement comporte la sous-étape consistant à identifier un deuxième vecteur de caractéristiques qui est différent des premiers vecteurs de caractéristiques analysés, pour identifier une anomalie phonétique.
  8. Procédé (100) selon l'une des revendications précédentes, dans lequel le vecteur de caractéristiques présente une dimension, plusieurs dimensions ou un espace dimensionnel réduit; et/ou dans lequel le procédé (100) comporte l'étape consistant à réduire les dimensions du vecteur de caractéristiques.
  9. Procédé (100) selon l'une des revendications précédentes, dans lequel le procédé (100) comporte l'étape consistant à déterminer une position respective pour les premiers segments audio (ABCD) respectifs; ou
    dans lequel le procédé (100) comporte l'étape consistant à déterminer une position respective pour les premiers segments audio (ABCD) respectifs; dans lequel le procédé (100) comporte l'étape consistant à déterminer une position respective pour les deuxièmes segments audio (ABCD) respectifs, et dans lequel le procédé (100) comporte la sous-étape consistant à aligner la position associée au premier segment audio (ABCD) respectif sur la position associée au deuxième segment audio (ABCD) respectif, pour identifier une anomalie spatiale.
  10. Procédé (100) selon l'une des revendications précédentes, dans lequel le procédé (100) présente l'étape consistant à déterminer une probabilité d'occurrence du premier segment audio (ABCD) respectif et à sortir la probabilité d'occurrence avec le premier vecteur de caractéristiques respectif ou dans lequel le procédé (100) comporte l'étape consistant à déterminer une probabilité d'occurrence du premier segment audio (ABCD) respectif et à sortir la probabilité d'occurrence avec le premier vecteur de caractéristiques respectif et une première fenêtre de temps.
  11. Procédé selon l'une des revendications précédentes, dans lequel la pluralité des premiers segments audio et/ou la pluralité des premiers segments audio dans leur ordre décrivent un état acoustique normal dans le scénario d'application et/ou représentent une référence; et/ou
    dans lequel l'une anomalie est identifiée lorsqu'un ou plusieurs deuxièmes vecteurs de caractéristiques diffèrent de la pluralité des premiers vecteurs de caractéristiques.
  12. Procédé selon l'une des revendications précédentes, dans lequel l'autre enregistrement comporte une fenêtre de temps ou en particulier une fenêtre de temps de moins de 5 minutes, de moins de 1 minute ou de moins de 10 secondes.
  13. Programme d'ordinateur avec un code de programme qui, lorsqu'il est exécuté sur un ordinateur, réalise une ou plusieurs étapes du procédé (100) selon les revendications précédentes.
  14. Dispositif (10) d'identification d'anomalies acoustiques, aux caractéristiques suivantes:
    une interface (12) destinée à obtenir un enregistrement de longue durée (113) avec une pluralité de premiers segments audio (ABCD) associés à des premières fenêtres de temps respectives ainsi que pour obtenir un autre enregistrement (123) avec un ou plusieurs deuxièmes segments audio (ABCD) associés à des deuxièmes fenêtres de temps respectives; où l'enregistrement de longue durée comporte au moins une durée de plus de 1 minute, ou d'au moins 10 minutes, ou d'au moins 1 heure, ou d'au moins 24 heures;
    un processeur (14) qui est conçu pour analyser la pluralité des premiers segments audio (ABCD) pour obtenir, pour chacun de la pluralité de premiers segments audio (ABCD), un premier vecteur de caractéristiques décrivant le premier segment audio (ABCD) respectif, et qui est conçu, pour analyser les un ou plusieurs deuxièmes segments audio (ABCD), pour obtenir un ou plusieurs vecteurs de caractéristiques décrivant les un ou plusieurs deuxièmes segments audio (ABCD), et qui est conçu, pour aligner les un ou plusieurs deuxièmes vecteurs de caractéristiques sur la pluralité des premiers vecteurs caractéristiques, pour identifier au moins une anomalie en comparaison avec une situation acoustique normale pour cet environnement.
  15. Dispositif (10) selon la revendication 14, dans lequel le dispositif (10) comporte un microphone (18) ou un réseau de microphones qui est connecté à l'interface (12); et/ou
    dans lequel le dispositif (10) comporte une interface de sortie destinée à sortir une probabilité d'occurrence du premier segment audio (ABCD) respectif avec le premier vecteur de caractéristiques respectif ou à sortir une probabilité d'occurrence du premier segment audio (ABCD) respectif avec le premier vecteur de caractéristiques respectif et une première fenêtre temporelle.
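The matching of second feature vectors against the first feature vectors of a long-term reference recording, as claimed above, can be illustrated with a minimal sketch. This is not the patent's implementation: the use of coarse spectral-band averages as feature vectors, the Euclidean distance, and the `threshold` parameter are all illustrative assumptions.

```python
import numpy as np

def feature_vector(segment, n_bands=16):
    """Describe an audio segment by coarse magnitude-spectrum band averages
    (an illustrative stand-in for the claimed feature vectors)."""
    spectrum = np.abs(np.fft.rfft(segment))
    # Average the magnitude spectrum into n_bands coarse frequency bands.
    bands = np.array_split(spectrum, n_bands)
    return np.array([band.mean() for band in bands])

def detect_anomalies(reference_segments, new_segments, threshold=1.0):
    """Match each second feature vector (from the further recording) against
    the first feature vectors (from the long-term reference recording).
    A new segment is flagged as anomalous if it is far from every
    reference vector, i.e. it differs from the normal acoustic situation."""
    reference = [feature_vector(seg) for seg in reference_segments]
    anomalies = []
    for i, seg in enumerate(new_segments):
        v = feature_vector(seg)
        # Distance to the closest reference vector decides normality.
        distances = [np.linalg.norm(v - r) for r in reference]
        if min(distances) > threshold:
            anomalies.append(i)
    return anomalies
```

In practice the threshold would not be a fixed constant; it could be calibrated on the reference recording itself, for example from the distribution of pairwise distances between the first feature vectors.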
EP21702020.5A 2020-01-27 2021-01-27 Method and device for identifying acoustic anomalies Active EP4097695B1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE102020200946.5A DE102020200946A1 (de) 2020-01-27 2020-01-27 Method and device for detecting acoustic anomalies
PCT/EP2021/051804 WO2021151915A1 (fr) 2020-01-27 2021-01-27 Method and device for identifying acoustic anomalies

Publications (2)

Publication Number Publication Date
EP4097695A1 EP4097695A1 (fr) 2022-12-07
EP4097695B1 true EP4097695B1 (fr) 2024-02-21

Family

ID=74285498

Family Applications (1)

Application Number Title Priority Date Filing Date
EP21702020.5A Active EP4097695B1 (fr) 2020-01-27 2021-01-27 Method and device for identifying acoustic anomalies

Country Status (4)

Country Link
US (1) US20220358952A1 (fr)
EP (1) EP4097695B1 (fr)
DE (1) DE102020200946A1 (fr)
WO (1) WO2021151915A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114220457 (zh) * 2021-10-29 2022-03-22 成都中科信息技术有限公司 Audio data processing method and apparatus for a dual-channel communication link, and storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2944903B1 (fr) * 2009-04-24 2016-08-26 Thales Sa System and method for detecting abnormal audio events
DE102012211154B4 (de) * 2012-06-28 2019-02-14 Robert Bosch Gmbh Monitoring system, open-area monitoring and method for monitoring a monitoring area
FR2994495B1 (fr) * 2012-08-10 2015-08-21 Thales Sa Method and system for detecting sound events in a given environment
DE102014012184B4 (de) * 2014-08-20 2018-03-08 HST High Soft Tech GmbH Device and method for automatic detection and classification of acoustic signals in a monitoring area
US10134422B2 (en) * 2015-12-01 2018-11-20 Qualcomm Incorporated Determining audio event based on location information
DE102017010402A1 (de) * 2017-11-09 2019-05-09 Guido Mennicken Automated method for monitoring forest areas for logging activities
DE102017012007B4 (de) 2017-12-22 2024-01-25 HST High Soft Tech GmbH Device and method for universal acoustic testing of objects
DE102018211758A1 (de) * 2018-05-07 2019-11-07 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Device, method and computer program for acoustic monitoring of a monitoring area

Also Published As

Publication number Publication date
DE102020200946A1 (de) 2021-07-29
WO2021151915A1 (fr) 2021-08-05
EP4097695A1 (fr) 2022-12-07
US20220358952A1 (en) 2022-11-10

Similar Documents

Publication Publication Date Title
WO2017001607A1 (fr) Method and device for creating a database
DE112020004052T5 (de) Sequence models for audio scene recognition
DE102014012184A1 (de) Device and method for automatic detection and classification of acoustic signals in a monitoring area
EP4097695B1 (fr) Method and device for identifying acoustic anomalies
WO1995025316A1 (fr) Identification of persons based on movement information
WO2020239540A1 (fr) Method and device for detecting smoke
DE102018205561A1 (de) Device for classifying signals
EP2483834B1 (fr) Method and apparatus for recognizing a false detection of an object in an image
WO2022013045A1 (fr) Method for automatic lip-reading by means of a functional element and for providing said functional element
DE102020207449A1 (de) Method, computer program and device for processing signals
EP3493171A1 (fr) Detection of aggressive behaviour in public means of transport
WO2022180218A1 (fr) Device for processing at least one input data set by means of a neural network, and method
BE1029610A1 (de) Systems and methods for improving performance of trainable optical character recognition (OCR)
WO2021148392A1 (fr) Method and device for object identification on the basis of sensor data
DE102020208828A1 (de) Method and device for creating a machine learning system
EP2359308A1 (fr) Device for generating and/or processing an object signature, control device, method and program product
DE102019209228A1 (de) Method and device for checking the robustness of an artificial neural network
DE102019213697A1 (de) Method for detecting an approach and/or departure of an emergency vehicle relative to a vehicle
DE102018201914A1 (de) Method for training a model for person re-identification using images from a camera, and method for recognizing persons from a trained model for person re-identification by a second camera of a camera network
WO2018019480A1 (fr) System for monitoring a parking lot for motor vehicles
EP3759644B1 (fr) Identification of unoccupied seats on the basis of detection of a repeated texture
DE112013004687T5 (de) System and method for processing events in an environment
DE102019209153A1 (de) Method and device for reliably classifying and/or segmenting images
DE112022001291T5 (de) Recording a sound separated from a mixture of sound streams on a personal device
DE102020211714A1 (de) Method and device for creating a machine learning system

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: UNKNOWN

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20220724

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: GRANT OF PATENT IS INTENDED

INTG Intention to grant announced

Effective date: 20230913

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

P01 Opt-out of the competence of the unified patent court (upc) registered

Effective date: 20231212

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE PATENT HAS BEEN GRANTED

REG Reference to a national code

Ref country code: DE

Ref legal event code: R081

Ref document number: 502021002760

Country of ref document: DE

Owner name: FRAUNHOFER-GESELLSCHAFT ZUR FOERDERUNG DER ANG, DE

Free format text: FORMER OWNER: ANMELDERANGABEN UNKLAR / UNVOLLSTAENDIG, 80297 MUENCHEN, DE

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

Free format text: NOT ENGLISH

REG Reference to a national code

Ref country code: CH

Ref legal event code: EP

REG Reference to a national code

Ref country code: DE

Ref legal event code: R096

Ref document number: 502021002760

Country of ref document: DE

REG Reference to a national code

Ref country code: IE

Ref legal event code: FG4D

Free format text: LANGUAGE OF EP DOCUMENT: GERMAN