EP4097695B1 - Procédé et dispositif d'identification d'anomalies acoustiques - Google Patents
Method and device for identifying acoustic anomalies
- Publication number
- EP4097695B1 (application EP21702020.5A / EP21702020A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- abcd
- audio segments
- anomaly
- audio
- accordance
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G08—SIGNALLING
- G08B—SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
- G08B13/00—Burglar, theft or intruder alarms
- G08B13/02—Mechanical actuation
- G08B13/04—Mechanical actuation by breaking of glass
-
- G—PHYSICS
- G08—SIGNALLING
- G08B—SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
- G08B13/00—Burglar, theft or intruder alarms
- G08B13/16—Actuation by interference with mechanical vibrations in air or other fluid
- G08B13/1654—Actuation by interference with mechanical vibrations in air or other fluid using passive vibration detection systems
- G08B13/1672—Actuation by interference with mechanical vibrations in air or other fluid using passive vibration detection systems using sonic detecting means, e.g. a microphone operating in the audio frequency range
-
- G—PHYSICS
- G08—SIGNALLING
- G08B—SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
- G08B21/00—Alarms responsive to a single specified undesired or abnormal condition and not otherwise provided for
- G08B21/02—Alarms for ensuring the safety of persons
- G08B21/04—Alarms for ensuring the safety of persons responsive to non-activity, e.g. of elderly persons
- G08B21/0438—Sensor means for detecting
- G08B21/0469—Presence detectors to detect unsafe condition, e.g. infrared sensor, microphone
Definitions
- Embodiments of the present invention relate to a method and a device for detecting acoustic anomalies. Further exemplary embodiments relate to a corresponding computer program. According to exemplary embodiments, the detection of a normal situation and the detection of anomalies in comparison to this normal situation take place.
- In real acoustic scenes there is usually a complex overlay of several sound sources. These can be positioned in the foreground or background and at any spatial position.
- A variety of possible sounds is also conceivable, ranging from very short transient signals (e.g. clapping, a gunshot) to longer, stationary sounds (a siren, a passing train).
- A recording typically covers a specific period of time, which is divided into one or more time windows for subsequent analysis. Based on this subdivision and depending on the length of the noise (cf. transient versus longer, stationary sound), a sound can extend over one or more audio segments/time windows.
- An anomaly is a sound that deviates from the "acoustic normal state", i.e. from the set of noises considered "normal".
- Examples of anomalies are breaking glass (burglary detection), a pistol shot (monitoring of public events) or a chainsaw (monitoring of nature reserves).
- The first problem is that the sound of the anomaly (the out-of-order class) is often not known or cannot be precisely defined or described (e.g. what can a broken machine sound like?).
- the second problem is that novel algorithms for sound classification using deep neural networks are very sensitive to changing (and often unknown) acoustic conditions in the operational scenario.
- Classification models that are trained with audio data recorded, for example, with a high-quality microphone achieve only poor recognition rates when classifying audio data that was recorded with a lower-quality microphone.
- Possible solutions lie in the area of “domain adaptation”, i.e. adapting the models or the audio data to be classified in order to achieve greater robustness in recognition. In practice, however, it is often logistically difficult and too expensive to record representative audio recordings at the later location of use of an audio analysis system and then annotate them with regard to the sound events they contain.
- the third problem with audio analysis of environmental noise lies in data protection concerns, since classification methods can theoretically also be used to recognize and transcribe speech signals (e.g. when recording a conversation near the audio sensor).
- Existing state-of-the-art solutions rely on classification models: if the sound anomaly to be detected can be precisely specified, a classification model based on machine-learning algorithms can be trained to recognize specific noise classes using supervised learning. Current studies show that neural networks in particular are very sensitive to changing acoustic conditions, so that an additional adaptation of the classification models to the respective acoustic situation of the application must be carried out.
- EP 2 988 105 A2 describes a device and a method for the automatic detection and classification of audible acoustic signals in a surveillance area.
- the object of the present invention is to create a concept for detecting anomalies that is optimized with regard to the learning behavior and that enables reliable and accurate detection of anomalies.
- Embodiments of the present invention provide a method for detecting acoustic anomalies.
- The method includes the steps of obtaining a long-term recording, which has a duration of at least more than 1 minute or at least 10 minutes or at least 1 hour or at least 24 hours, with a plurality of first audio segments assigned to respective first time windows, and analyzing the plurality of first audio segments in order to obtain, for each of the plurality of first audio segments, a first feature vector describing the respective first audio segment, e.g. a spectrum for the audio segment (time-frequency spectrum) or an audio fingerprint with certain characteristics of the audio segment.
- The result of the analysis of a long-term recording divided into a large number of time windows is a large number of first (single- or multi-dimensional) feature vectors for the first audio segments (assigned to the corresponding times/windows of the long-term recording), which represent the "normal state".
- The method includes the further steps of obtaining another recording with one or more second audio segments assigned to respective second time windows and analyzing the one or more second audio segments to obtain one or more second feature vectors describing the one or more second audio segments.
- The result of the second part of the method is, for example, a large number of second feature vectors (e.g. with the corresponding times of the further recording).
- the one or more second feature vectors are compared with the plurality of first feature vectors (e.g. by comparing the identities or similarities or by recognizing a sequence) in order to detect at least one anomaly.
- a sound anomaly, i.e. the first appearance of a previously unheard sound;
- a temporal anomaly, e.g. a changed repetition pattern of a sound that has already been heard;
- a spatial anomaly, i.e. the occurrence of a sound that has already been heard at a previously unknown spatial position.
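As a purely illustrative sketch (not the patent's prescribed algorithm), the matching of second feature vectors against the learned first feature vectors can be realized as a nearest-neighbor distance, where a large distance to every "normal" vector suggests a sound anomaly:

```python
import numpy as np

def novelty_scores(reference: np.ndarray, queries: np.ndarray) -> np.ndarray:
    """Distance of each query feature vector to its nearest reference vector.

    `reference` holds the first feature vectors (the learned normal state),
    `queries` the second feature vectors; a large score suggests a
    previously unheard sound (sound anomaly).
    """
    # Pairwise Euclidean distances, shape (n_queries, n_reference).
    d = np.linalg.norm(queries[:, None, :] - reference[None, :, :], axis=-1)
    return d.min(axis=1)

normal = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # learned "normal" vectors
observed = np.array([[0.1, 0.0], [5.0, 5.0]])            # second feature vectors
scores = novelty_scores(normal, observed)
```

Here the first observed vector lies close to a learned one (low score), while the second is far from all of them (high score) and would be flagged.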
- Embodiments of the present invention are based on the insight that an "acoustic normal state" and "normal noises" can be learned independently simply through a long-term sound analysis (phase 1 of the method, comprising the steps of obtaining a long-term recording and analyzing it). This long-term sound analysis thus results in an independent or autonomous adaptation of an analysis system to a specific acoustic scene. No annotated training data (recording + semantic class annotation) is required, which represents a great saving in time, effort and cost. Once this acoustic "normal state" or these "normal" noises have been recorded, the current noise environment can be compared against them in a subsequent analysis phase (phase 2, with the steps of obtaining another recording and analyzing it).
- Phase 1 involves learning a model of the normal background noise based on a statistical procedure or machine learning, whereby this model then allows (in phase 2) currently recorded background noise to be assessed with regard to its degree of novelty (probability of an anomaly).
- Another advantage of this approach is that the privacy of people who may be in the immediate vicinity of the acoustic sensors is protected. This is called privacy by design. Due to the system, speech recognition is not possible because the interface (audio in, anomaly probability function out) is clearly defined. This can dispel possible data protection concerns when using acoustic sensors.
- the long-term recording represents the normal acoustic situation
- the multitude of first audio segments describes this normal situation, in itself and/or in its order.
- the large number of first audio segments represent a kind of reference on their own and/or in their combination.
- the aim of the method is to detect anomalies in comparison to this normal situation.
- the result of the clustering described above is a description of the reference based on first audio segments.
- the second audio segments, alone or in their combination, are then compared with the reference in order to detect the anomaly.
- the anomaly is a deviation of the current acoustic situation described by the second feature vectors from the reference described by the first feature vectors.
- the first feature vectors alone or in their combination represent a reference image of the normal state
- the second feature vectors alone or in their combination describe the current acoustic situation
- the anomaly, in the form of a deviation of the description of the current acoustic situation (cf. second feature vectors) from the reference (cf. first feature vectors), can be recognized.
- the anomaly is therefore defined by the fact that at least one of the second acoustic feature vectors deviates from the sequence of the first acoustic feature vectors. Possible deviations can be: sound anomalies, temporal anomalies and spatial anomalies.
- Phase 1 captures a large number of first audio segments, which are also referred to below as "normal" noises/audio segments or noises considered "normal". According to exemplary embodiments, knowing these "normal" audio segments makes it possible to detect a so-called sound anomaly.
- the sub-step of identifying a second feature vector that differs from the analyzed first feature vectors is then carried out.
- When analyzing, the method includes the substep of identifying a repeat pattern in the plurality of first time windows. This involves identifying repeating audio segments and determining the resulting pattern. According to exemplary embodiments, identification is carried out using repeating, identical or similar first feature vectors belonging to different first audio segments. According to exemplary embodiments, identical and similar first feature vectors or first audio segments can also be grouped into one or more groups during identification.
- the method includes recognizing a sequence of first feature vectors associated with the first audio segments or recognizing a sequence of groups of identical or similar first feature vectors or first audio segments.
- the basic steps make it advantageously possible to recognize normal noises or to recognize normal audio objects.
- the combination of these normal audio objects in a certain order or a certain repetition pattern in terms of time then represents a normal acoustic state.
- The method then enables, during matching, the substep of comparing the repetition pattern of the first audio segments and/or the order of the first audio segments with the repetition pattern of the second audio segments and/or the order of the second audio segments. This comparison enables the detection of a temporal anomaly.
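An illustrative sketch of such a comparison, assuming (as a simplification not stated in the document) that the first and second audio segments have already been reduced to group labels and that the learned normal state is a cyclically repeating pattern:

```python
def temporal_anomalies(normal_pattern: str, observed: str) -> list[int]:
    """Flag positions where the observed label sequence departs from the
    cyclic repetition pattern learned in phase 1 (simplified illustration).

    The first flagged index marks where the repetition pattern breaks;
    later indices follow from the resulting misalignment.
    """
    period = len(normal_pattern)
    return [i for i, label in enumerate(observed)
            if label != normal_pattern[i % period]]

# Learned pattern "ABC" repeating; the observation "ABCAABC" contains an
# extra segment A between the two groups ABC (a temporal anomaly).
positions = temporal_anomalies("ABC", "ABCAABC")
```

A matching observation such as "ABCABC" yields an empty list, i.e. no temporal anomaly.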
- the method may include the step of determining a respective position for the respective first audio segments.
- The respective position can also be determined for the respective second audio segments. According to one exemplary embodiment, this then enables the detection of a spatial anomaly via the substep of comparing the position assigned to the respective first audio segments with the position assigned to the corresponding respective second audio segment.
- Each feature vector can have one dimension or several dimensions for the different audio segments.
- a possible realization of a feature vector would be, for example, a time-frequency spectrum.
- the dimensional space can also be reduced.
- the method includes the step of reducing the dimensions of the feature vector.
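One common way to reduce the dimensional space of the feature vectors is a principal component analysis; the following sketch is illustrative only, and the feature sizes are arbitrary assumptions:

```python
import numpy as np

def reduce_dimensions(features: np.ndarray, k: int) -> np.ndarray:
    """Project feature vectors onto their k leading principal components
    (PCA), one possible realization of the dimension-reduction step."""
    centered = features - features.mean(axis=0)
    # The right-singular vectors give the principal axes of the feature cloud.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

rng = np.random.default_rng(2)
features = rng.normal(size=(50, 16))   # 50 feature vectors, 16 dimensions each
reduced = reduce_dimensions(features, k=3)
```

The projected components are ordered by explained variance, so truncating to k columns keeps the most informative directions.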
- The method can have the step of determining a probability of occurrence of the respective first audio segment and of outputting the probability of occurrence together with the respective first feature vector.
- the method can have the step of determining a probability of occurrence of the respective first audio segment and outputting the probability of occurrence with the respective first feature vector and an associated first time window.
- The probability of occurrence for the respective audio segment, or more precisely the probability of the occurrence of the audio segment at this point in time, is output. The output takes place with the corresponding data record or feature vector.
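The probability of occurrence of an audio segment type can, for instance, be estimated from relative frequencies; this simple sketch assumes (hypothetically) that the segments have already been mapped to group labels:

```python
from collections import Counter

def occurrence_probabilities(labels: list[str]) -> dict[str, float]:
    """Relative frequency of each (clustered) audio segment type in the
    long-term recording, usable as its probability of occurrence."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# Segment labels from phase 1: segment type "A" dominates the soundscape.
probs = occurrence_probabilities(list("AABACABAAB"))
```

A rarely occurring segment type thus receives a low probability, which can be output alongside its feature vector and time window.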
- the method can also be implemented using a computer.
- A further exemplary embodiment is a computer program with a program code for carrying out the method.
- Further exemplary embodiments relate to a device with an interface and a processor (cf. Fig. 1).
- the interface serves to obtain a long-term recording with a plurality of first audio segments assigned to respective first time windows and to obtain a further recording with one or more second audio segments assigned to respective second time windows.
- The processor is designed to analyze the multitude of first audio segments in order to obtain, for each of the plurality of first audio segments, a first feature vector describing the respective first audio segment.
- the processor is designed to analyze the one or more second audio segments in order to obtain one or more feature vectors describing the one or more second audio segments.
- the processor is designed to match the one or more second feature vectors with the plurality of first feature vectors in order to detect at least one anomaly.
- The device comprises a recording unit connected to the interface, e.g. a microphone or a microphone array.
- the microphone array advantageously enables position determination, as already explained above.
- the device comprises an output interface for outputting the probability of occurrence explained above.
- Fig. 1 shows a method 100, which is divided into two phases 110 and 120.
- Step 112 includes a long-term recording of the acoustic normal state in the application scenario.
- The analysis device 10 is placed in the target environment so that a long-term recording 113 of the normal state is captured.
- this long-term recording can last for at least 1 minute or at least 10 minutes or at least 1 hour or at least 24 hours.
- This long-term recording 113 is then broken down, for example.
- The breakdown can be into time periods of equal length, e.g. 1 second or 0.1 seconds, or into dynamic time ranges.
- Each time range includes an audio segment.
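As an illustration, the subdivision of a recording into time windows of equal length could look as follows in Python; the 1-second window and the sample rate are arbitrary assumptions, not values prescribed by the patent:

```python
import numpy as np

def segment_recording(samples: np.ndarray, sample_rate: int,
                      window_s: float = 1.0) -> np.ndarray:
    """Split a mono recording into equal-length, non-overlapping segments.

    Each returned row is one audio segment / time window; a trailing
    remainder shorter than one window is discarded.
    """
    win = int(window_s * sample_rate)
    n_windows = len(samples) // win
    return samples[: n_windows * win].reshape(n_windows, win)

# Example: a 5.5-second recording at 1000 Hz split into 1-second windows.
recording = np.random.default_rng(0).normal(size=5500)
segments = segment_recording(recording, sample_rate=1000, window_s=1.0)
```

Dynamic (variable-length) time ranges, also mentioned above, would require an onset-based segmentation instead of this fixed reshape.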
- In step 114, commonly referred to as analyzing, these audio segments are examined separately or in combination.
- a so-called feature vector 115 (first feature vectors) is determined for each audio segment during analysis.
- Feature vectors 115 can be determined, for example, by an energy spectrum for a specific frequency range or generally a time-frequency spectrum.
- In step 114, typical or dominant noises can then optionally be identified using unsupervised learning methods (e.g. clustering).
- time periods or audio segments are grouped that have similar feature vectors 115 and that accordingly have a similar sound.
- No semantic classification of a sound takes place (e.g. as "car" or "plane").
- unsupervised learning takes place based on the frequencies of repeating or similar audio segments.
- an unsupervised learning of the temporal order and/or typical repetition patterns of certain noises takes place.
- the result of clustering is a compilation of audio segments or noises that are normal or typical for this area.
- each audio segment can also be assigned a probability of occurrence.
- Repeat patterns or a sequence, i.e. a combination of several audio segments, are identified that are typical or normal for the current environment.
- A probability can also be assigned to each grouping, each repeat pattern or each sequence of different audio segments.
- Phase 120 has the three basic steps 122, 124 and 126.
- An audio recording 123 is again captured. This is typically significantly shorter than the audio recording 113. However, it can also be a continuous audio recording.
- This audio recording 123 is then analyzed in a subsequent step 124. This step is comparable in content to step 114. This in turn involves converting the digital audio recording 123 into feature vectors. If these second feature vectors 125 are now available, they can be compared with the feature vectors 115.
- A probability for each of the three anomaly types can be output at time X. This is illustrated in Fig. 3 with the arrows 126z, 126k and 126r (one arrow for each type of anomaly).
- Threshold values can be defined for when feature vectors are similar or when groups of feature vectors are similar, so that the result then also represents a threshold-based decision on an anomaly.
- This threshold application can also be linked to the output of the probability distribution or appear in combination with it, e.g. to enable more accurate temporal detection of anomalies.
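One possible (purely illustrative) way to combine a similarity threshold with a probability output is a logistic mapping of the novelty score; the threshold and sharpness values below are assumptions, not values from the disclosure:

```python
import math

def anomaly_probability(score: float, threshold: float,
                        sharpness: float = 4.0) -> float:
    """Map a novelty score to an anomaly probability via a logistic curve
    centred on the similarity threshold: scores at the threshold map to
    0.5, scores far below/above map toward 0/1."""
    return 1.0 / (1.0 + math.exp(-sharpness * (score - threshold)))

p_low = anomaly_probability(0.2, threshold=1.0)   # well below the threshold
p_high = anomaly_probability(3.0, threshold=1.0)  # well above the threshold
```

A hard threshold decision is recovered by comparing the probability against 0.5, while the continuous value supports the probability-distribution output described above.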
- step 114 in adjustment phase 110 can also include unsupervised learning of typical spatial positions and/or movements of certain noises.
- Instead of the microphone 18 shown in Fig. 3, two microphones or a microphone array with at least two microphones are used.
- spatial localization of the current dominant sound sources/audio segments is then possible in the second phase 120 through multi-channel recording.
- the underlying technology here can be, for example, beamforming.
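As an illustrative sketch of the localization idea (a simple time-delay estimate rather than the beamforming named above), the inter-microphone delay of a dominant source can be estimated by cross-correlation:

```python
import numpy as np

def tdoa_samples(mic1: np.ndarray, mic2: np.ndarray) -> int:
    """Estimate the delay (in samples) of mic1 relative to mic2 via
    cross-correlation; the delay maps to a direction of arrival."""
    corr = np.correlate(mic1, mic2, mode="full")
    return int(np.argmax(corr) - (len(mic2) - 1))

rng = np.random.default_rng(3)
source = rng.normal(size=200)
mic1 = np.concatenate([np.zeros(5), source])  # arrives 5 samples later here
mic2 = np.concatenate([source, np.zeros(5)])
delay = tdoa_samples(mic1, mic2)
```

With a known microphone spacing and sample rate, the estimated delay converts to an angle of incidence, so a sound type heard from an unusual direction can be flagged as a spatial anomaly.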
- Fig. 2a illustrates the temporal anomaly.
- audio segments ABC for both phase 1 and phase 2 are plotted along the time axis t.
- In phase 1 it was recognized that a normal situation or normal order exists in which the audio segments A, B, C appear in the order ABC.
- A repeat pattern was recognized in which the first group ABC can be followed by another group ABC.
- If this pattern ABCABC is recognized in phase 2, it can be assumed that there is no anomaly, or at least no temporal anomaly. However, if the pattern ABCAABC shown here is recognized, there is a temporal anomaly because another audio segment A is arranged between the two groups ABC.
- This audio segment A or anomalous audio segment A is provided with a double frame.
- a sonic anomaly is illustrated.
- the audio segments ABCABC were again recorded along the time axis t (cf. Fig. 2a ).
- the sound anomaly during recognition is shown by the fact that another audio segment, here audio segment D, appears in phase 2.
- This audio segment D has an increased length, e.g. over two time ranges, and is therefore illustrated as DD.
- The sound anomaly is double-framed in the sequence of audio segment types. This sound anomaly could, for example, be a sound that was never heard during the learning phase, for example thunder, which differs from the previous elements ABC in terms of loudness/intensity and length.
- a local anomaly is illustrated.
- two audio segments A and B were detected at two different positions, position 1 and position 2.
- In phase 2, both elements A and B were recognized, with localization determining that both audio segment A and audio segment B are at position 1.
- the presence of audio segment B at position 1 represents a spatial anomaly.
- The device 10 essentially includes the input interface 12, e.g. a microphone interface, and a processor 14.
- The processor 14 receives the one or more (simultaneously present) audio signals from the microphone 18 or the microphone array 18' and analyzes them. To this end, it essentially carries out steps 114, 124 and 126 explained in connection with Fig. 1.
- The result to be output (cf. output interface 16) is a set of feature vectors that represent the normal state or, in phase 2, an output of the detected anomalies, e.g. assigned to a specific type and/or to a specific time.
- The interface 16 can output a probability of anomalies, or a probability of anomalies at certain times, or, in general, a probability of feature vectors at certain times.
- aspects have been described in connection with a device, it is understood that these aspects also represent a description of the corresponding method, so that a block or a component of a device is also to be understood as a corresponding method step or as a feature of a method step. Similarly, aspects described in connection with or as a method step also represent a description of a corresponding block or detail or feature of a corresponding device.
- Some or all of the method steps may be performed by a hardware apparatus (or using a hardware apparatus), such as a microprocessor, a programmable computer or an electronic circuit. In some embodiments, some or more of the key method steps may be performed by such an apparatus.
- embodiments of the invention may be implemented in hardware or in software.
- The implementation may be carried out using a digital storage medium, such as a floppy disk, a DVD, a Blu-ray Disc, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, a hard drive or another magnetic or optical memory, on which electronically readable control signals are stored that can interact with a programmable computer system in such a way that the respective method is carried out. Therefore, the digital storage medium can be computer-readable.
- Some embodiments according to the invention thus include a data carrier that has electronically readable control signals that are capable of interacting with a programmable computer system such that one of the methods described herein is carried out.
- embodiments of the present invention may be implemented as a computer program product with a program code, the program code being effective to perform one of the methods when the computer program product runs on a computer.
- the program code can, for example, also be stored on a machine-readable medium.
- Further embodiments include the computer program for performing one of the methods described herein, the computer program being stored on a machine-readable medium.
- an exemplary embodiment of the method according to the invention is therefore a computer program that has a program code for carrying out one of the methods described herein when the computer program runs on a computer.
- a further exemplary embodiment of the method according to the invention is therefore a data carrier (or a digital storage medium or a computer-readable medium) on which the computer program for carrying out one of the methods described herein is recorded.
- the data carrier, digital storage medium or computer-readable medium is typically tangible and/or non-transitory.
- a further exemplary embodiment of the method according to the invention is therefore a data stream or a sequence of signals which represents the computer program for carrying out one of the methods described herein.
- the data stream or the sequence of signals can, for example, be configured to be transferred via a data communication connection, for example via the Internet.
- Another embodiment includes a processing device, such as a computer or a programmable logic device, configured or adapted to perform one of the methods described herein.
- Another embodiment includes a computer on which the computer program for performing one of the methods described herein is installed.
- a further embodiment according to the invention includes a device or system designed to transmit a computer program to a receiver for carrying out at least one of the methods described herein.
- the transmission can take place electronically or optically, for example.
- the recipient may be, for example, a computer, a mobile device, a storage device or a similar device.
- the device or system can, for example, comprise a file server for transmitting the computer program to the recipient.
- A programmable logic device (e.g. a field programmable gate array, an FPGA) may cooperate with a microprocessor to perform one of the methods described herein.
- In some embodiments, the methods are carried out by any hardware device. This can be universally applicable hardware, such as a computer processor (CPU), or hardware specific to the method, such as an ASIC.
- the devices described herein may be implemented, for example, using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.
- the devices described herein, or any components of the devices described herein may be at least partially implemented in hardware and/or in software (computer program).
- the methods described herein may be implemented, for example, using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.
Claims (15)
- Procédé (100) d'identification d'anomalies acoustiques, aux étapes suivantes consistant à:obtenir (113) un enregistrement de longue durée avec une pluralité de premiers segments audio (ABCD) associés à des premières fenêtres de temps respectives; où l'enregistrement de longue durée comporte au moins une durée de plus de 1 minute, ou d'au moins 10 minutes, ou d'au moins 1 heure, ou d'au moins 24 heures;analyser (114) la pluralité des premiers segments audio (ABCD) pour obtenir, pour chacun de la pluralité des premiers segments audio (ABCD), un premier vecteur de caractéristiques décrivant le premier segment audio (ABCD) respectif;obtenir (123) un autre enregistrement avec un ou plusieurs deuxièmes segments audio (ABCD) associés à des deuxièmes fenêtres de temps respectives;analyser (124) les un ou plusieurs deuxièmes segments audio (ABCD) pour obtenir un ou plusieurs vecteurs de caractéristiques décrivant les un ou plusieurs deuxièmes segments audio (ABCD);aligner (126) les un ou plusieurs deuxièmes vecteurs de caractéristiques sur la pluralité des premiers vecteurs de caractéristiques pour identifier au moins une anomalie en comparaison avec une situation acoustique normale pour cet environnement.
- Method (100) in accordance with claim 1, wherein the anomaly comprises a phonetic, temporal and/or spatial anomaly; and/or wherein the anomaly comprises a phonetic anomaly in combination with a temporal anomaly, or a phonetic anomaly in combination with a spatial anomaly, or a temporal anomaly in combination with a spatial anomaly.
- Method (100) in accordance with claim 1 or 2, wherein the method (100) comprises, during the analysis, the substep of identifying a repetition pattern in the plurality of first time windows; or wherein the method (100) comprises, during the analysis, the substep of identifying a repetition pattern in the plurality of first time windows, wherein the identification takes place by means of recurring identical or similar first feature vectors belonging to different first audio segments (ABCD).
- Method (100) in accordance with claim 3, wherein, during the identification, identical or similar first feature vectors are grouped into one or more groups; and/or wherein the method (100) comprises identifying a sequence of first feature vectors belonging to different first audio segments (ABCD), or identifying a sequence of groups of identical or similar first feature vectors.
- Method (100) in accordance with one of claims 3 to 4, wherein the method (100) comprises identifying a repetition pattern in the one or more second time windows; and/or wherein the method (100) comprises identifying a sequence of second feature vectors belonging to different second audio segments (ABCD), or identifying a sequence of groups of identical or similar second feature vectors.
- Method (100) in accordance with claim 5, wherein the method (100) comprises the substep of aligning the repetition pattern of the first audio segments (ABCD) and/or the order of the first audio segments (ABCD) with the repetition pattern of the second audio segments (ABCD) and/or the order of the second audio segments (ABCD), in order to identify a temporal anomaly.
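The grouping and repetition-pattern claims can be sketched as follows. The greedy centroid grouping, the toy 1-D "feature vectors", and the radius parameter are hypothetical choices for the example; the claims leave the grouping method open.

```python
import numpy as np

def group_vectors(vectors, radius=0.5):
    """Greedy grouping of identical or similar feature vectors:
    each vector joins the first group whose centroid lies within
    `radius`, otherwise it opens a new group. Returns one label per vector."""
    centroids, labels = [], []
    for v in vectors:
        for i, c in enumerate(centroids):
            if np.linalg.norm(v - c) <= radius:
                labels.append(i)
                break
        else:
            centroids.append(v)
            labels.append(len(centroids) - 1)
    return labels

# Toy 1-D "feature vectors": a machine alternating between two sounds A and B.
first_vectors = np.array([[0.0], [5.0], [0.1], [5.1], [0.05], [4.9]])
first_labels = group_vectors(first_vectors)
print(first_labels)   # repetition pattern A B A B A B as group labels

# A second recording with order B, B instead of A, B: same sounds, wrong order.
second_vectors = np.array([[5.0], [5.05]])
all_labels = group_vectors(np.vstack([first_vectors, second_vectors]))
second_labels = all_labels[len(first_vectors):]
print(second_labels)  # deviates from the learned alternating order
```

Comparing the label sequence of the second recording with the repetition pattern learned from the first recording is one way to realize the temporal-anomaly alignment of claim 6.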
- Method (100) in accordance with one of the preceding claims, wherein the alignment comprises the substep of identifying a second feature vector that differs from the analyzed first feature vectors, in order to identify a phonetic anomaly.
- Method (100) in accordance with one of the preceding claims, wherein the feature vector has one dimension, several dimensions or a reduced dimensional space; and/or wherein the method (100) comprises the step of reducing the dimensions of the feature vector.
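The dimension-reduction step can be realized in many ways; a common one is projection onto principal components. The sketch below uses PCA via SVD as an assumed example, with invented data dimensions.

```python
import numpy as np

def reduce_dims(vectors, k=2):
    """One possible realization of the claimed dimension reduction:
    project the feature vectors onto their k strongest principal
    components, computed via SVD of the centered data matrix."""
    centered = vectors - vectors.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

rng = np.random.default_rng(1)
# 100 ten-dimensional feature vectors that effectively vary along 2 axes.
latent = rng.standard_normal((100, 2))
vectors = latent @ rng.standard_normal((2, 10)) + 0.01 * rng.standard_normal((100, 10))

reduced = reduce_dims(vectors, k=2)
print(reduced.shape)  # (100, 2)
```

Reduced vectors make the grouping and alignment steps cheaper and often more robust, since distances are computed in a lower-dimensional space.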
- Method (100) in accordance with one of the preceding claims, wherein the method (100) comprises the step of determining a respective position for the respective first audio segments (ABCD); or wherein the method (100) comprises the step of determining a respective position for the respective first audio segments (ABCD), wherein the method (100) comprises the step of determining a respective position for the respective second audio segments (ABCD), and wherein the method (100) comprises the substep of aligning the position associated with the respective first audio segment (ABCD) with the position associated with the respective second audio segment (ABCD), in order to identify a spatial anomaly.
- Method (100) in accordance with one of the preceding claims, wherein the method (100) comprises the step of determining a probability of occurrence of the respective first audio segment (ABCD) and outputting the probability of occurrence together with the respective first feature vector, or wherein the method (100) comprises the step of determining a probability of occurrence of the respective first audio segment (ABCD) and outputting the probability of occurrence together with the respective first feature vector and a first time window.
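The spatial-anomaly check can be sketched as a comparison between the position of a new segment and the positions where its sound class was heard during the long-term recording. The `known_positions` bookkeeping, the (x, y) coordinates, and the radius are assumptions of the example, not prescribed by the claim.

```python
import numpy as np

def spatially_anomalous(label, position, known_positions, radius=1.0):
    """Sketch of the spatial-anomaly alignment: a sound whose class is
    familiar but which occurs far from every position where that class
    was heard during the long-term recording is flagged."""
    past = known_positions.get(label)
    if past is None:
        return True  # unknown class: a phonetic rather than spatial anomaly
    dists = np.linalg.norm(np.asarray(past) - np.asarray(position), axis=1)
    return bool(dists.min() > radius)

# Long-term recording: "door" sounds near (0, 0), "fan" sounds near (5, 5).
known = {"door": [(0.0, 0.1), (0.2, -0.1)], "fan": [(5.0, 5.0)]}
print(spatially_anomalous("door", (0.1, 0.0), known))  # usual position
print(spatially_anomalous("door", (6.0, 6.0), known))  # far from all doors
```

In practice the per-segment positions would come from a microphone array (see the device of claims 14 to 15), e.g. via direction-of-arrival estimation.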
- Method in accordance with one of the preceding claims, wherein the plurality of first audio segments and/or the plurality of first audio segments in their order describe a normal acoustic state in the application scenario and/or represent a reference; and/or wherein an anomaly is identified when one or more second feature vectors differ from the plurality of first feature vectors.
- Method in accordance with one of the preceding claims, wherein the further recording comprises a time window, in particular a time window of less than 5 minutes, less than 1 minute or less than 10 seconds.
- Computer program with a program code which, when executed on a computer, performs one or more steps of the method (100) in accordance with one of the preceding claims.
- Device (10) for identifying acoustic anomalies, with the following features: an interface (12) for obtaining a long-term recording (113) with a plurality of first audio segments (ABCD) associated with respective first time windows, and for obtaining a further recording (123) with one or more second audio segments (ABCD) associated with respective second time windows, wherein the long-term recording has a duration of more than 1 minute, or of at least 10 minutes, or of at least 1 hour, or of at least 24 hours; a processor (14) configured to analyze the plurality of first audio segments (ABCD) in order to obtain, for each of the plurality of first audio segments (ABCD), a first feature vector describing the respective first audio segment (ABCD); configured to analyze the one or more second audio segments (ABCD) in order to obtain one or more second feature vectors describing the one or more second audio segments (ABCD); and configured to align the one or more second feature vectors with the plurality of first feature vectors in order to identify at least one anomaly compared with an acoustic situation that is normal for this environment.
- Device (10) in accordance with claim 14, wherein the device (10) comprises a microphone (18) or a microphone array connected to the interface (12); and/or wherein the device (10) comprises an output interface for outputting a probability of occurrence of the respective first audio segment (ABCD) together with the respective first feature vector, or for outputting a probability of occurrence of the respective first audio segment (ABCD) together with the respective first feature vector and a first time window.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE102020200946.5A DE102020200946A1 (de) | 2020-01-27 | 2020-01-27 | Method and device for detecting acoustic anomalies |
PCT/EP2021/051804 WO2021151915A1 (fr) | 2020-01-27 | 2021-01-27 | Method and device for identifying acoustic anomalies |
Publications (2)
Publication Number | Publication Date |
---|---|
EP4097695A1 (fr) | 2022-12-07 |
EP4097695B1 (fr) | 2024-02-21 |
Family
ID=74285498
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP21702020.5A Active EP4097695B1 (fr) | 2020-01-27 | 2021-01-27 | Method and device for identifying acoustic anomalies |
Country Status (4)
Country | Link |
---|---|
US (1) | US20220358952A1 (fr) |
EP (1) | EP4097695B1 (fr) |
DE (1) | DE102020200946A1 (fr) |
WO (1) | WO2021151915A1 (fr) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114220457A (zh) * | 2021-10-29 | 2022-03-22 | 成都中科信息技术有限公司 | Audio data processing method, device and storage medium for a dual-channel communication link |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FR2944903B1 (fr) * | 2009-04-24 | 2016-08-26 | Thales Sa | System and method for detecting abnormal audio events |
DE102012211154B4 (de) * | 2012-06-28 | 2019-02-14 | Robert Bosch Gmbh | Monitoring system, open-area monitoring, and method for monitoring a surveillance area |
FR2994495B1 (fr) * | 2012-08-10 | 2015-08-21 | Thales Sa | Method and system for detecting sound events in a given environment |
DE102014012184B4 (de) * | 2014-08-20 | 2018-03-08 | HST High Soft Tech GmbH | Device and method for automatic detection and classification of acoustic signals in a monitoring area |
US10134422B2 (en) * | 2015-12-01 | 2018-11-20 | Qualcomm Incorporated | Determining audio event based on location information |
DE102017010402A1 (de) * | 2017-11-09 | 2019-05-09 | Guido Mennicken | Automated method for monitoring forest areas for clearing activities |
DE102017012007B4 (de) | 2017-12-22 | 2024-01-25 | HST High Soft Tech GmbH | Device and method for universal acoustic testing of objects |
DE102018211758A1 (de) * | 2018-05-07 | 2019-11-07 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Device, method and computer program for acoustic monitoring of a surveillance area |
2020
- 2020-01-27 DE DE102020200946.5A patent/DE102020200946A1/de active Pending
2021
- 2021-01-27 WO PCT/EP2021/051804 patent/WO2021151915A1/fr active Search and Examination
- 2021-01-27 EP EP21702020.5A patent/EP4097695B1/fr active Active
2022
- 2022-07-26 US US17/874,072 patent/US20220358952A1/en active Pending
Also Published As
Publication number | Publication date |
---|---|
DE102020200946A1 (de) | 2021-07-29 |
WO2021151915A1 (fr) | 2021-08-05 |
EP4097695A1 (fr) | 2022-12-07 |
US20220358952A1 (en) | 2022-11-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: UNKNOWN |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
|
17P | Request for examination filed |
Effective date: 20220724 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
DAV | Request for validation of the european patent (deleted) | ||
DAX | Request for extension of the european patent (deleted) | ||
GRAP | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: GRANT OF PATENT IS INTENDED |
|
INTG | Intention to grant announced |
Effective date: 20230913 |
|
GRAS | Grant fee paid |
Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
|
P01 | Opt-out of the competence of the unified patent court (upc) registered |
Effective date: 20231212 |
|
GRAA | (expected) grant |
Free format text: ORIGINAL CODE: 0009210 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE PATENT HAS BEEN GRANTED |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R081 Ref document number: 502021002760 Country of ref document: DE Owner name: FRAUNHOFER-GESELLSCHAFT ZUR FOERDERUNG DER ANG, DE Free format text: FORMER OWNER: ANMELDERANGABEN UNKLAR / UNVOLLSTAENDIG, 80297 MUENCHEN, DE |
|
AK | Designated contracting states |
Kind code of ref document: B1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
REG | Reference to a national code |
Ref country code: GB Ref legal event code: FG4D Free format text: NOT ENGLISH |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: EP |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R096 Ref document number: 502021002760 Country of ref document: DE |
|
REG | Reference to a national code |
Ref country code: IE Ref legal event code: FG4D Free format text: LANGUAGE OF EP DOCUMENT: GERMAN |