WO2010122056A2 - Systeme et methode pour detecter des evenements audio anormaux - Google Patents
- Publication number
- WO2010122056A2 (PCT/EP2010/055266)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- segment
- audio
- segments
- classes
- vector
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/165—Management of the audio stream, e.g. setting of volume, audio stream path
Definitions
- the object of the present invention is a system and method for detecting sound events that are considered abnormal with respect to a typical sound environment.
- the invention applies, in particular, in monitoring applications of areas, places or spaces.
- the prior art distinguishes two processes. The first is a detection process. The second is a process of classifying detected events.
- the major drawbacks of the supervised approach stem from the need to specify the abnormal events beforehand and to collect a sufficient, statistically representative quantity of these events.
- the specification of events is not always possible, nor is the collection of a sufficient number of occurrences to enrich a database. It is also necessary, for each new configuration, to conduct a new supervised learning.
- the supervision task requires human intervention (manual or semi-automatic segmentation, labeling, etc.).
- the flexibility of these solutions is therefore limited in terms of use, and taking into account new environments is difficult to implement.
- learning event models does not always take into account the background noise and its variability; in some cases it can therefore lack robustness.
- the idea of the invention rests, in particular, on a new approach in which the method uses an automated learning step, that is to say one requiring no human intervention in normal operation, the different elements, sensors or other devices constituting a self-sufficient system able to model an environment.
- the invention relates to a method for detecting abnormal audio events in a given environment, characterized in that it comprises at least the following steps:
- a phase of use which comprises the analysis of an audio stream, with extraction of the acoustic parameters, an automatic segmentation step of said analyzed stream identical to that used during the learning phase, and a step during which the likelihood of each statistical model contained in the database is determined on each of the segments of the analyzed audio stream, • said likelihood determination step leading to a likelihood value λ corresponding to the most probable (maximum-likelihood) model, which is compared to a threshold value in order to trigger, or not, a signal indicating the presence or absence of audio anomalies in the analyzed audio stream.
- the modeling step is, for example, a statistical modeling of the segment classes which consists in modeling the probability density of the set of acoustic parameters of each segment class using a Gaussian mixture model (GMM).
- the modeling step is a statistical modeling of the segment classes which consists in modeling the probability density of the set of acoustic parameters of each segment class using a Markov or HMM type model.
- the learning step consists, for example, in using an algorithm that makes it possible to position centroids uniformly distributed in the parameter space by performing the following steps:
- at each iteration (class_nb → class_nb + 1), the segment which maximizes the cumulative distance to the centroids identified at the previous iteration is sought,
- Stopping criterion either when the predefined number of classes is reached, or when the distance between the segment found and the centroids identified at the previous iteration is less than a threshold.
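As an illustration only (the patent names no implementation), the greedy centroid-positioning loop described above can be sketched in Python; `select_centroids`, `max_classes` and `min_dist` are hypothetical names, and a Euclidean distance between segment parameter vectors is assumed:

```python
import numpy as np

def select_centroids(segments, max_classes, min_dist):
    """Greedy maximin placement: each new centroid is the segment vector that
    maximizes the cumulative distance to the centroids already chosen.
    segments: (N, D) array of segment parameter vectors."""
    centroids = [segments[0]]  # arbitrary initial centroid
    while len(centroids) < max_classes:
        # cumulative Euclidean distance of every segment to the current centroids
        dists = sum(np.linalg.norm(segments - c, axis=1) for c in centroids)
        best = int(np.argmax(dists))
        # stop early when the best candidate is already close to the centroids
        if dists[best] / len(centroids) < min_dist:
            break
        centroids.append(segments[best])
    return np.array(centroids)
```

The averaged-distance stopping test is one possible reading of the criterion; the patent only states that iteration stops when the distance to the previous centroids falls below a threshold.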
- the automatic segmentation step uses, for example, the principle of the dendrogram.
- the automatic segmentation step can select a segmentation level by using one of the following criteria:
- said method uses, for example, a K-means algorithm, an LBG-type algorithm, or a variant derived from the K-means algorithm for the step of grouping the segments into classes.
- the size of the model can be determined automatically by applying a threshold Smax on the distance between the last two grouped classes, that is to say, to minimize the number of possible segments while minimizing the grouping distance considered.
- the acoustic parameters used can be chosen from the following list: spectral, temporal or spectral-temporal parameters associated with the audio stream.
- the invention also relates to a system for detecting audio anomalies in a given environment, characterized in that it comprises at least the following elements:
- One or more audio sensors suitable for detecting audio streams,
- a preprocessing module adapted to execute an automatic segmentation of the acoustic parameters extracted from the audio stream to be analyzed,
- a likelihood calculation module taking as input the audio models of the database and producing the likelihood calculation result for an analyzed audio stream, • a module comparing the obtained likelihood value with a threshold value.
- the system comprises, for example, a first preprocessing module and a first database development module stored in a first processor for the learning phase of the system, and a second preprocessing module and a second likelihood calculation module, the likelihood calculation module receiving as input information on the models from the database.
- the system comprises, for example, a central station or surveillance room equipped with several control screens Ei, a video stream redirection module, a main screen Ep, and several sets Zi each consisting of one or more audio sensors Ci associated with video cameras Vi, said modules Ci, Vi being connected to a module adapted to determine a likelihood value λi, said likelihood values λi being transmitted to the central station.
- the likelihood values thus determined can be used to order the associated video streams to provide assistance to an operator (automatic selection of the priority stream to the main screen, or manual selection by the operator from the ordered display of the screens).
- Priority flows are, for example, those with the lowest likelihood values.
- FIG. 1 an example of a detection system according to the invention
- FIG. 2 the steps implemented during the learning phase and the recognition phase of the system according to the invention
- FIG. 4 an algorithm that can be implemented to build a dictionary
- FIG. 6 a detailed example of the steps in the use phase of FIG. 2, and
- the audio detection system can also be used to prioritize video streams from multiple cameras. This mode of use can be particularly adapted to a monitoring application by providing assistance to the security operator in charge of viewing live video streams.
- the detection system according to the invention will use two separate processors having different calculation capacities. On the upper part of the figure is represented the system used during the learning period of the system, while on the lower part, an example system for capturing anomalies and recognizing these anomalies is shown.
- the system may include a single processor having sufficient computing and processing capabilities to perform the learning step and the recognition step.
- FIG. 1 schematizes an exemplary architecture of the system implemented by the invention, part of which concerns the learning of a database that will be used for the recognition of noises and abnormal sound events on a subway platform.
- the system comprises an audio sensor 1 for the sounds and noises present in an area to be monitored, or whose sound events it is desired to analyze.
- the data received on this audio sensor 1 is transmitted, firstly, to a device 3 containing a filter and an analog-to-digital converter known to those skilled in the art, then via an input 4 to a processor 5 comprising a preprocessing module 6, 7 detailed in Figure 2.
- the models thus generated are transmitted via an output 8 of the processor 5 to a database 9.
- This database 9 will contain models corresponding to classes of acoustic parameters representative of an audio environment considered normal.
- This database will be initialized during a learning phase and may be updated during the operation of the detection system according to the invention.
- Database 9 is also used when recognizing noise or detecting abnormal audio events.
- the system also includes, for the recognition of abnormal audio events, one or more audio sensors 10, each sensor 10 being connected to a device 11 comprising a filter and an analog-to-digital converter (ADC).
- the data detected by the audio sensor and formatted by the filter and the ADC are transmitted to a processor 13 via an input 12.
- the processor comprises a preprocessing module 14 detailed in FIG. 2, then a module 15 for recognizing processed data, said module receiving information from the database 9 by a link 16 which can be wired or not.
- the result "abnormal audio event" or "abnormal audio events" is transmitted via the output 17 of the processor either to a PC-type device 18 allowing the display of the result, to a device 19 triggering an alarm signal, or to a system 19' for redirecting the video stream and the alarm according to, for example, the diagram in FIG. 5.
- the preprocessing modules 6 and 14 must be identical in order to ensure the compatibility of the models of the database.
- the audio sensors 2 and 10 may be sensors having similar or identical characteristics (type, characteristics and positioning in the environment) in order to overcome differences in the shaping of the signals between the learning and test phases.
- the transmission of data between the different devices can be performed via wired links, or wireless systems such as Bluetooth, wireless local area networks (WLAN), etc.
- An example of another system architecture is given, by way of illustration and not limitation, in Figure 5. This architecture makes it possible in particular to prioritize different video streams coming from different cameras or video devices associated with surveillance sensors.
- the system may also include a buffer memory whose function, among other things, is to store the latest abnormal audio data or events.
- This buffer can thus allow a monitoring operator to access the streams recorded during the generation of an alarm.
- This memory is similar to storing video streams in CCTV.
- FIG. 2 represents an example for the sequence of the steps implemented during the method according to the invention, the left part of the figure corresponding to the learning phase while the right part to the use phase.
- a first step is the automated learning of the system.
- the system will record, by means of the sensor, for an initially set duration TA, the noise and/or sound representative of the subway platform background.
- This learning phase is automated and unsupervised.
- the acoustic parameters that will be used are generally spectral, temporal or spectro-temporal parameters. It is thus possible to use a modeling of the spectral envelope of the noise picked up by the microphone, such as cepstral parameters or cepstral vectors.
- the audio stream in this case will be modeled by a sequence of cepstral vectors.
- an audio sequence representative of a sound environment in the initially targeted surveillance area is captured.
- the acoustic parameters are extracted during an extraction step 2.1, from the audio signal, from the audio sequence, using a short-term sliding analysis window.
- This analysis technique being known to those skilled in the art, it will not be explained.
- One way of proceeding is to consider analysis frames whose duration is, for example, of the order of 20 to 60 ms, with a typical overlap of 50%.
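A minimal sketch of this sliding-window framing, assuming a mono signal held in a NumPy array and the 20-60 ms / 50% overlap figures quoted above (`frame_signal` is a hypothetical helper name, not taken from the patent):

```python
import numpy as np

def frame_signal(x, sr, frame_ms=30, overlap=0.5):
    """Slice signal x (sampled at sr Hz) into fixed-length analysis frames.
    frame_ms in the 20-60 ms range, 50% overlap, as suggested in the text."""
    frame_len = int(sr * frame_ms / 1000)
    hop = int(frame_len * (1 - overlap))
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop: i * hop + frame_len] for i in range(n_frames)])

# one second of signal at 8 kHz, 30 ms frames, 50% overlap
frames = frame_signal(np.zeros(8000), sr=8000)
```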
- the acoustic parameters considered by the method are chosen according to the properties of the signals to be modeled.
- the duration of an analysis frame generally takes into account stationarity assumptions on the analyzed signal over the horizon of the frame.
- cepstral parameters that model the spectral envelope are often used in combination with other, more specific parameters that can be used to model temporal or spectral properties.
- the ZCR (Zero Crossing Rate) in the time domain, or, in the spectral domain, the measurement known by the abbreviation "SFM" (Spectral Flatness Measure), can be cited. These two measurements are among the parameters used to distinguish voiced speech signals from noise signals.
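The two measures can be sketched as follows; this is a textbook formulation rather than the patent's own computation, and the small `eps` guard is an added assumption to keep the logarithm finite:

```python
import numpy as np

def zcr(frame):
    """Zero-crossing rate: fraction of successive sample pairs whose sign differs."""
    return float(np.mean(np.signbit(frame[:-1]) != np.signbit(frame[1:])))

def sfm(frame, eps=1e-12):
    """Spectral flatness measure: ratio of the geometric to the arithmetic mean
    of the power spectrum (close to 1 for noise, close to 0 for tonal/voiced frames)."""
    p = np.abs(np.fft.rfft(frame)) ** 2 + eps
    return float(np.exp(np.mean(np.log(p))) / np.mean(p))
```

On a pure tone the SFM is near zero, while on white noise it approaches one, which is exactly the voiced-versus-noise discrimination mentioned above.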
- SFM Spectral Flatness Measure
- the next step 2.2 is an automatic segmentation step from the parameter vectors extracted in step 2.1.
- the purpose of this segmentation step is to group the vectors that are close, for example, using a predefined distance criterion.
- the criterion will be chosen according to the type of acoustic parameters that have been used to characterize the background sound or audio.
- This segmentation can be performed in several ways, for example by using one of the following techniques: detection of breaks in trajectories or models, temporal decomposition, or a dendrogram, which corresponds to the graphical representation of a hierarchical classification tree highlighting the gradual inclusion of classes.
- the segmentation principle will consist of grouping frames in a so-called bottom-up approach using an appropriate distance (adapted to the parameters).
- the dendrogram provides a set of possible segmentations (segmentation by level of the dendrogram).
- the method then uses a buffer implemented in the system to include at least one segment or group of vectors. Such a buffer memory is conventionally used, it is not shown for reasons of simplification.
- the set of segments thus calculated will be used to construct a dictionary whose number of classes Nc is predefined, or else determined automatically on a criterion of interclass distances for example. This corresponds to steps 2.3 and 2.4.
- the segments are grouped into classes by implementing a K-means algorithm, an "LBG" algorithm (Linde-Buzo-Gray), or any other algorithm having the same or similar functionality used by a person skilled in the art.
- K-Means K-means algorithm
- LBG Linde-Buzo-Gray
- step 2.4 is to model the probability density of the set of acoustic parameters of each segment class, using, for example, a Gaussian mixture model, better known by the abbreviation GMM (Gaussian Mixture Model).
- GMM Gaussian Mixture Model
- the algorithm generally used to find the maximum-likelihood parameters of a probabilistic model depending on unobservable latent variables is better known by the abbreviation "EM", for Expectation-Maximization; it will be used for the learning phase of the system.
- the number of Gaussians used may be predefined or automatically determined from a criterion derived from information theory, of the "MDL" ("Minimum Description Length") type, in which the best hypothesis for a dataset is the one that leads to the greatest compression of the data.
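A hedged sketch of steps 2.3-2.4 using scikit-learn's EM-based `GaussianMixture`; since sklearn offers no literal MDL criterion, the closely related BIC is used here as a stand-in for the model-order selection, and `fit_class_gmm` is a hypothetical name:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_class_gmm(X, max_components=8):
    """Fit a GMM to one segment class by EM, selecting the number of Gaussians
    with BIC (an MDL-style criterion; BIC stands in for MDL here).
    X: (n_frames, n_params) acoustic parameter vectors of the class."""
    best, best_bic = None, np.inf
    for k in range(1, max_components + 1):
        gmm = GaussianMixture(n_components=k, covariance_type='diag',
                              random_state=0).fit(X)
        bic = gmm.bic(X)          # lower BIC = better size/fit trade-off
        if bic < best_bic:
            best, best_bic = gmm, bic
    return best
```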
- the system therefore has a database 9 corresponding to the learning of the system, that is to say comprising a sound model of the environment to be monitored.
- the learning corpus (set of signals representative of the sound environment to be modeled) is analyzed.
- the analysis consists in extracting parameters from each frame, with overlap. Typically, the duration of the analysis frame is a few tens of ms, and the overlap is generally 50%. Depending on the type of environment, it may be preferable to use a longer or shorter frame to better take into account the degree of stationarity of the signals.
- Several types of analysis are possible (spectral, cepstral, temporal, etc.).
- the analysis of a frame results in a parameter vector, which is stored in a "first in first out" memory, better known as "FIFO" (First In First Out), not represented for the sake of clarity.
- the size of this memory is equal to the number of elements (vectors in this case of application) used by the dendrogram.
- the corresponding duration (proportional to the size of the memory) may be of the order of a few hundred ms, or even a few seconds for highly stationary background noise. This duration must generally be chosen so as to incorporate at least one audio event considered a priori as elementary. However, a compromise can be sought to reduce the delay introduced by the processing during the use phase of the system. Minimizing the number of vectors makes the detection process more reactive.
- the dendrogram is here used to automatically obtain a segmentation of the audio signal.
- the principle consists in grouping in a "bottom-up" approach the input elements of the dendrogram. This method makes it possible to obtain a segmentation for all the different possible levels, in other words for a number of segments ranging from the initial number of elements to a single segment.
- each element is the representative of its class. If N is the number of elements (vectors) at the input of the dendrogram, then there are N classes at the lowest level.
- the number of segments is decremented to go to the higher level after grouping the two closest classes according to a distance criterion (defined according to the parameters used).
- several groupings are possible according to the distance that one seeks to minimize for the selection of the classes to be grouped.
- the 4 main methods of grouping are: > minimal distance between class vectors (single linkage),
- the stop criterion used is, for example, based on the minimum distance between the two last grouped classes.
- FIG. 3 shows an example of groupings according to the N classes for a bottom-up approach, the vertical axis corresponding to the vectors, the horizontal axis schematizing the buffer memory of the dendrogram. At the end of this grouping, the method makes it possible to obtain 3, then 2, then a single segment, represented by the letter R for the final grouping.
- the automatic segmentation method must finally automatically select a level of segmentation that will be considered optimal according to a criterion to be defined.
- a first criterion is to apply an Smax threshold on the distance between the last two grouped classes (the higher the level of the dendrogram, the greater the distance between the classes to be grouped). It is therefore a question of minimizing the number of possible segments while minimizing the grouping distance considered.
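Using SciPy's hierarchical-clustering routines as a stand-in, the Smax level-selection criterion can be sketched as below. Note that a generic `linkage` ignores the temporal-adjacency constraint a true audio dendrogram would impose, so this only approximates the described step; `segment_frames` and `s_max` are hypothetical names:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def segment_frames(vectors, s_max):
    """Bottom-up grouping of parameter vectors; the dendrogram is cut at the
    level where the next merge distance would exceed s_max, i.e. the fewest
    segments whose grouping distance stays below the threshold."""
    Z = linkage(vectors, method='average')   # 'average' is one of the 4 linkages
    return fcluster(Z, t=s_max, criterion='distance')

# six 1-D vectors forming two well-separated groups
v = np.array([[0.], [0.1], [0.05], [5.], [5.1], [5.05]])
labels = segment_frames(v, s_max=1.0)
```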
- In cases 1) and 2), the distance criterion must be less than a threshold while minimizing the number of segments. In cases 3) and 4), the correlation criterion must be greater than a threshold while minimizing the number of segments.
- the segmentation described above is applied to the entire learning base.
- the segments thus obtained are grouped by class using, for example, a learning algorithm of the LBG (Linde-Buzo-Gray) type or a K-means type algorithm.
- LBG Linde-Buzo-Gray
- K-Means K-means type learning algorithm
- Bk centroid of index k
- the number of classes can either be fixed a priori, or determined automatically using a stopping criterion based on the minimum distance between centroids (it is not necessary to increase the number of centroids if they are sufficiently close according to a given criterion).
- the determination of the threshold used for this stopping criterion can be based on a spectral distance (possibly calculated on a non-linear scale of the MEL or Bark type frequencies to introduce a constraint related to the perception of sounds). This spectral distance can generally be calculated from the parameters used in computing the associated spectral envelopes.
- An alternative is to determine the threshold from the correlation between the distances used with the parameters and the spectral distances.
- Stopping criterion either when the predefined number of classes is reached, or when the distance between the segment found and the centroids identified at the previous iteration is less than a threshold.
- the threshold may be related to a spectrally weighted perceptual distance.
- the EM (Expectation-Maximization) algorithm is used to build a GMM model by segment class.
- a minimum description length ("MDL") type criterion
- MDL Minimum Description Length
- Figure 5 shows an example of a threshold set on the graph of the score distribution profiles of normal audio events and abnormal audio events.
- the threshold makes it possible to ensure a compromise between the number of false alarms and the number of false rejections. If the shaded areas (see figure), annotated Aa and An, are equal, the probability of false alarms is equal to the probability of false rejections.
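Assuming the two score distributions are available as sample arrays, the equal-error operating point sketched in the figure can be located by a simple threshold sweep; `eer_threshold` is a hypothetical name, and scores are taken as likelihoods (higher = more normal):

```python
import numpy as np

def eer_threshold(normal_scores, abnormal_scores):
    """Return the threshold where the false-alarm probability (normal scores
    below the threshold) is closest to the false-rejection probability
    (abnormal scores at or above it)."""
    candidates = np.sort(np.concatenate([normal_scores, abnormal_scores]))
    fa = np.array([np.mean(normal_scores < t) for t in candidates])
    fr = np.array([np.mean(abnormal_scores >= t) for t in candidates])
    return float(candidates[np.argmin(np.abs(fa - fr))])
```

Shifting the threshold away from this point trades one error type against the other, as the text describes for choosing the operating point.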
- Principle of the detection system ( Figure 6)
- the segmentation module is preferably identical to that implemented for the learning phase.
- the log-likelihood of each GMM model 4.3 is calculated.
- a threshold 4.4 is then applied to the maximum log-likelihood obtained (most likely GMM model) to decide whether or not an abnormal event is present.
- the detection threshold of the system can be determined automatically from a predefined base 4.5 of abnormal events, which makes it possible to estimate the distribution of abnormal event scores and to compare it with the distribution of the scores obtained on the learning data.
- the threshold can then be chosen to have a point of operation of the system favoring either the false alarm rate or the false rejection rate.
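The detection rule of steps 4.3-4.4 (maximum log-likelihood over the class models compared with a threshold) can be sketched as follows, with scikit-learn GMMs standing in for the learned models; `detect` is a hypothetical name:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def detect(segment, class_gmms, threshold):
    """Flag a segment as abnormal when even the best-matching model of the
    normal environment scores below the decision threshold.
    segment: (n_frames, n_params) parameter vectors of the analyzed segment."""
    best = max(g.score(segment) for g in class_gmms)  # score = mean log-likelihood
    return best < threshold
```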
- the distributions of normal events and abnormal events are obtained from the learning sequences, and simulated sequences, respectively.
- the simulated sequences are obtained by superimposing the abnormal events on the learning sequences at different levels of signal-to-noise ratio (SNR, or RSB in French).
- RSB Signal-to-Noise Ratio (SNR)
- the noise is the sound environment represented by the learning sequences
- the signal is the abnormal event.
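Mixing an abnormal event into a learning sequence at a prescribed SNR can be sketched as below; the event (the "signal") is scaled so that its power relative to the background (the "noise") matches the target ratio. `mix_at_snr` is a hypothetical name:

```python
import numpy as np

def mix_at_snr(background, event, snr_db):
    """Superimpose an abnormal event on a learning sequence at a target SNR.
    background: the sound environment; event: the abnormal event."""
    p_bg = np.mean(background ** 2)
    p_ev = np.mean(event ** 2)
    gain = np.sqrt(p_bg / p_ev * 10 ** (snr_db / 10))  # scale event to target SNR
    return background + gain * event
```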
- the optimal threshold can then be determined, according to the desired compromise, from the distributions thus obtained. It is possible to use models other than GMMs: HMMs (hidden Markov models).
- Markov or "HMM” type models make it possible to take into account the temporal evolution of the sound event on the horizon of a segment.
- the learning algorithms are known in the field of voice recognition in particular.
- Automatic segmentation can be used to initialize the learning of HMM models, which are then used for online segmentation using a Viterbi algorithm.
- a standard HMM topology can be used: Bakis model (left-right model). However, it is possible to maintain automatic segmentation and constrain upstream segmentation.
- the system can be in continuous operation, which means that it continuously picks up the sounds or audio streams present in the area to be monitored; alternatively, an operator can restrict the operation of the system to periods of time fixed beforehand.
- the right part of Figure 2 shows the steps for the use phase.
- the first step 3.1 will be to extract the acoustic parameters of the analyzed audio stream.
- step 2.1 of the learning phase remains valid.
- the same parameter extraction 3.1 is applied to the analyzed audio stream, and the segmentation step 3.2 is executed in the same way as step 2.2 of the learning phase.
- the system then has the segments (characteristic of the audio stream being analyzed). It then applies a likelihood step, that is to say that the likelihood of each GMM statistical model obtained during the learning is calculated on each of the segments obtained in step 3.2. It is possible to apply likelihood normalization procedures before making the decision to detect or not detect an audio anomaly.
- the likelihood calculation is applied for each class k, and a score or likelihood λk is assigned to a segment. This value is compared with a previously set threshold value. An abnormal event hypothesis is generated if none of the GMM models produces a score above the threshold. This non-detection threshold (normal event) can be determined automatically from the training data. The decision on a normal event can be taken on the horizon of the segment in question or on the horizon of several consecutive segments.
- the acoustic parameters used for segmentation may be different from those used for modeling. It is indeed quite relevant to perform the segmentation according to a spectral criterion (cepstral parameters) and to add additional specific parameters for modeling that allow a finer modeling.
- the decision thresholds can be predefined from knowledge, a priori, on the signals, or learned by simulating abnormal conditions.
- Different types of classification modules can be used in parallel, to improve performance through a merge stage.
- Different types of parameters can be used for detection and classification to maximize the discrimination power of the system between normal and abnormal events.
- Unsupervised and supervised approaches can complement each other.
- the system and method described above can be combined with a conventional supervised classification solution, while limiting false alarms.
- the classification is activated only when an abnormal event is detected. The detection takes the sound environment into account and is therefore more robust.
- FIG. 7 schematizes an exemplary architecture comprising several devices, such as audio sensors Ci, making it possible to record sounds, in particular abnormal audio events.
- the audio sensors are associated with a video camera Vi.
- At the level of the video camera and audio sensor assembly, it is possible to integrate a preprocessing module.
- the assembly thus formed is connected, for example, to a calculator Pi comprising an abnormal event recognition module, and a database 9 containing the models used to recognize the abnormal events.
- Each calculator Pi is connected to a central station or monitoring room comprising, for example, several surveillance screens Ei.
- the central station receives the audio and video streams. It includes a module Fr to prioritize the video streams from the cameras according to their importance.
- the links for transferring data from one device to another are, for example, wired links, or wireless links of the Bluetooth type, or the system is part of a wireless local area network (WLAN, Wireless Local Area Network).
- the likelihood calculation can be used to order the associated video streams to provide operator assistance (automatic selection of the priority stream to the main screen, or facilitation of manual selection by the operator from the ordered display of the control screens).
- Priority streams are those with the lowest likelihoods (highest probability of having an abnormal audio event).
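Ordering the streams then reduces to sorting the zones by their likelihood values λi, lowest first; the zone names and numbers below are purely illustrative:

```python
# likelihood value per zone Zi (illustrative numbers only)
lambdas = {'Z1': -2.1, 'Z2': -15.7, 'Z3': -4.3}

# lowest likelihood first: highest probability of an abnormal audio event
priority = sorted(lambdas, key=lambdas.get)
```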
- the models obtained during the learning phase can be supplemented by other models obtained during a subsequent learning phase.
- the system can then simply use both sets of models as a reference for the normal sound environment, or use a set of models resulting from a more elaborate grouping process. It is possible to synthesize new models using a Gaussian distance criterion (such as the Bhattacharyya distance, or the Kullback-Leibler divergence measure).
- Another approach is to apply the initial classification system to the new learning data, to retain among the new data those that score below a predefined threshold to learn new models. These new models are then added to the previous ones.
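This incremental enrichment can be sketched as follows, again with scikit-learn GMMs as stand-ins: new data that no existing model scores above the threshold is used to learn an additional model, which is then added to the previous ones. `update_models` is a hypothetical name:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def update_models(models, new_data, threshold):
    """Apply the existing models to new learning data; samples that none of the
    current models explains (best per-sample log-likelihood below threshold)
    are used to learn an additional model appended to the previous ones."""
    scores = np.max([m.score_samples(new_data) for m in models], axis=0)
    unexplained = new_data[scores < threshold]
    if len(unexplained) > 1:
        models = models + [GaussianMixture(n_components=1,
                                           random_state=0).fit(unexplained)]
    return models
```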
- since the solution of the invention is unsupervised, the system and the method have the advantage of being usable in different environments and without a priori knowledge of the abnormal events to detect.
- the learning phase of the system is automated, from the automatic segmentation of the captured speech or audio to the learning of the models used by the system. This automation also makes it possible to envisage a mode of operation with regular or continuous updating.
- Another advantage resulting from the automation of the processing chain is the possible reinitialization of the system to a new scenario or a new environment, as well as its possibility of evolution and adaptation over time.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
BRPI1014280A BRPI1014280A2 (pt) | 2009-04-24 | 2010-04-21 | sistema e ,método para detectar eventos de áudio anormais |
MX2011011214A MX2011011214A (es) | 2009-04-24 | 2010-04-21 | Sistema y metodo para detectar eventos de audio anormales. |
US13/266,101 US8938404B2 (en) | 2009-04-24 | 2010-04-21 | System and method for detecting abnormal audio events |
EP10718923A EP2422301A2 (fr) | 2009-04-24 | 2010-04-21 | Systeme et methode pour detecter des evenements audio anormaux |
SG2011078235A SG175350A1 (en) | 2009-04-24 | 2010-04-21 | System and method for detecting abnormal audio events |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR0902007A FR2944903B1 (fr) | 2009-04-24 | 2009-04-24 | Systeme et methode pour detecter des evenements audio anormaux |
FR0902007 | 2009-04-24 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2010122056A2 true WO2010122056A2 (fr) | 2010-10-28 |
WO2010122056A3 WO2010122056A3 (fr) | 2010-12-16 |
Family
ID=41402413
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2010/055266 WO2010122056A2 (fr) | 2009-04-24 | 2010-04-21 | Systeme et methode pour detecter des evenements audio anormaux |
Country Status (8)
Country | Link |
---|---|
US (1) | US8938404B2 (fr) |
EP (1) | EP2422301A2 (fr) |
BR (1) | BRPI1014280A2 (fr) |
FR (1) | FR2944903B1 (fr) |
MX (1) | MX2011011214A (fr) |
MY (1) | MY157136A (fr) |
SG (1) | SG175350A1 (fr) |
WO (1) | WO2010122056A2 (fr) |
Families Citing this family (43)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10809966B2 (en) * | 2013-03-14 | 2020-10-20 | Honeywell International Inc. | System and method of audio information display on video playback timeline |
US10007716B2 (en) * | 2014-04-28 | 2018-06-26 | Moogsoft, Inc. | System for decomposing clustering events from managed infrastructures coupled to a data extraction device |
US11010220B2 (en) | 2013-04-29 | 2021-05-18 | Moogsoft, Inc. | System and methods for decomposing events from managed infrastructures that includes a feedback signalizer functor |
US10803133B2 (en) | 2013-04-29 | 2020-10-13 | Moogsoft Inc. | System for decomposing events from managed infrastructures that includes a reference tool signalizer |
US10700920B2 (en) | 2013-04-29 | 2020-06-30 | Moogsoft, Inc. | System and methods for decomposing events from managed infrastructures that includes a floating point unit |
US10013476B2 (en) * | 2014-04-28 | 2018-07-03 | Moogsoft, Inc. | System for decomposing clustering events from managed infrastructures |
US9396256B2 (en) | 2013-12-13 | 2016-07-19 | International Business Machines Corporation | Pattern based audio searching method and system |
US10873508B2 (en) | 2015-01-27 | 2020-12-22 | Moogsoft Inc. | Modularity and similarity graphics system with monitoring policy |
US10425291B2 (en) | 2015-01-27 | 2019-09-24 | Moogsoft Inc. | System for decomposing events from managed infrastructures with prediction of a networks topology |
US11817993B2 (en) | 2015-01-27 | 2023-11-14 | Dell Products L.P. | System for decomposing events and unstructured data |
US11303502B2 (en) | 2015-01-27 | 2022-04-12 | Moogsoft Inc. | System with a plurality of lower tiers of information coupled to a top tier of information |
US11924018B2 (en) | 2015-01-27 | 2024-03-05 | Dell Products L.P. | System for decomposing events and unstructured data |
US10979304B2 (en) | 2015-01-27 | 2021-04-13 | Moogsoft Inc. | Agent technology system with monitoring policy |
US10686648B2 (en) * | 2015-01-27 | 2020-06-16 | Moogsoft Inc. | System for decomposing clustering events from managed infrastructures |
CN106323452B (zh) * | 2015-07-06 | 2019-03-29 | 中达电子零组件(吴江)有限公司 | Method and device for detecting abnormal sound of equipment |
US10142483B2 (en) * | 2015-12-22 | 2018-11-27 | Intel Corporation | Technologies for dynamic audio communication adjustment |
US10141009B2 (en) * | 2016-06-28 | 2018-11-27 | Pindrop Security, Inc. | System and method for cluster-based audio event detection |
WO2018005316A1 (fr) | 2016-07-01 | 2018-01-04 | Bostel Technologies, Llc | Phonodermoscopy, a medical device system and method for skin diagnosis |
US11298072B2 (en) * | 2016-07-01 | 2022-04-12 | Bostel Technologies, Llc | Dermoscopy diagnosis of cancerous lesions utilizing dual deep learning algorithms via visual and audio (sonification) outputs |
WO2018053537A1 (fr) | 2016-09-19 | 2018-03-22 | Pindrop Security, Inc. | Improvements of speaker recognition in a call center |
WO2018053518A1 (fr) | 2016-09-19 | 2018-03-22 | Pindrop Security, Inc. | Channel-compensated low-level features for speaker recognition |
US20180150697A1 (en) * | 2017-01-09 | 2018-05-31 | Seematics Systems Ltd | System and method for using subsequent behavior to facilitate learning of visual event detectors |
JP6485567B1 (ja) * | 2018-02-27 | 2019-03-20 | オムロン株式会社 | Conformity determination device, conformity determination method, and program |
JP6810097B2 (ja) * | 2018-05-21 | 2021-01-06 | ファナック株式会社 | Anomaly detector |
US10475468B1 (en) | 2018-07-12 | 2019-11-12 | Honeywell International Inc. | Monitoring industrial equipment using audio |
JP6614623B1 (ja) * | 2018-11-02 | 2019-12-04 | 国立研究開発法人産業技術総合研究所 | Unknown-water detection device, unknown-water detection method, program and unknown-water detection system |
CN109599120B (zh) * | 2018-12-25 | 2021-12-07 | 哈尔滨工程大学 | Method for monitoring abnormal sounds of mammals on large-scale farms |
WO2020159917A1 (fr) | 2019-01-28 | 2020-08-06 | Pindrop Security, Inc. | Unsupervised keyword spotting and word discovery for fraud analytics |
US11019201B2 (en) | 2019-02-06 | 2021-05-25 | Pindrop Security, Inc. | Systems and methods of gateway detection in a telephone network |
US10665251B1 (en) | 2019-02-27 | 2020-05-26 | International Business Machines Corporation | Multi-modal anomaly detection |
WO2020198354A1 (fr) | 2019-03-25 | 2020-10-01 | Pindrop Security, Inc. | Detection of calls from voice assistants |
US12015637B2 (en) | 2019-04-08 | 2024-06-18 | Pindrop Security, Inc. | Systems and methods for end-to-end architectures for voice spoofing detection |
US11488622B2 (en) * | 2019-12-16 | 2022-11-01 | Cellular South, Inc. | Embedded audio sensor system and methods |
US11784888B2 (en) | 2019-12-25 | 2023-10-10 | Moogsoft Inc. | Frequency-based sorting algorithm for feature sparse NLP datasets |
DE102020200946A1 (de) * | 2020-01-27 | 2021-07-29 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung eingetragener Verein | Method and device for detecting acoustic anomalies |
JP7445503B2 (ja) | 2020-04-09 | 2024-03-07 | 日本放送協会 | Abnormal sound detection device and program therefor |
US11450340B2 (en) | 2020-12-07 | 2022-09-20 | Honeywell International Inc. | Methods and systems for human activity tracking |
US11443758B2 (en) * | 2021-02-09 | 2022-09-13 | International Business Machines Corporation | Anomalous sound detection with timbre separation |
US11765501B2 (en) | 2021-03-10 | 2023-09-19 | Honeywell International Inc. | Video surveillance system with audio analytics adapted to a particular environment to aid in identifying abnormal events in the particular environment |
US11620827B2 (en) | 2021-03-22 | 2023-04-04 | Honeywell International Inc. | System and method for identifying activity in an area using a video camera and an audio sensor |
CN114121050A (zh) * | 2021-11-30 | 2022-03-01 | 云知声智能科技股份有限公司 | Audio playback method and apparatus, electronic device and storage medium |
US11836982B2 (en) | 2021-12-15 | 2023-12-05 | Honeywell International Inc. | Security camera with video analytics and direct network communication with neighboring cameras |
CN114781467B (zh) * | 2022-06-22 | 2022-09-06 | 济南嘉宏科技有限责任公司 | Fault detection method based on vibration similarity |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN100429716C (zh) * | 2002-08-19 | 2008-10-29 | 皇家飞利浦电子股份有限公司 | Scanning device and method for detecting anomalies on a record carrier |
2009
- 2009-04-24 FR FR0902007A patent/FR2944903B1/fr active Active
2010
- 2010-04-21 WO PCT/EP2010/055266 patent/WO2010122056A2/fr active Application Filing
- 2010-04-21 US US13/266,101 patent/US8938404B2/en not_active Expired - Fee Related
- 2010-04-21 MY MYPI2011005126A patent/MY157136A/en unknown
- 2010-04-21 EP EP10718923A patent/EP2422301A2/fr not_active Ceased
- 2010-04-21 BR BRPI1014280A patent/BRPI1014280A2/pt active Search and Examination
- 2010-04-21 MX MX2011011214A patent/MX2011011214A/es active IP Right Grant
- 2010-04-21 SG SG2011078235A patent/SG175350A1/en unknown
Non-Patent Citations (1)
Title |
---|
None |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102201230A (zh) * | 2011-06-15 | 2011-09-28 | 天津大学 | Emergency event speech detection method |
FR2981189A1 (fr) * | 2011-10-10 | 2013-04-12 | Thales Sa | Unsupervised system and method for multiresolution thematic analysis and structuring of audio streams |
WO2013053705A1 (fr) * | 2011-10-10 | 2013-04-18 | Thales | Unsupervised system and method for multiresolution thematic analysis and structuring of audio streams |
CN103366738A (zh) * | 2012-04-01 | 2013-10-23 | 佳能株式会社 | Method and device for generating a sound classifier and detecting abnormal sounds, and monitoring system |
EP2696344A1 (fr) * | 2012-08-10 | 2014-02-12 | Thales | Method and system for detecting sound events in a given environment |
FR2994495A1 (fr) * | 2012-08-10 | 2014-02-14 | Thales Sa | Method and system for detecting sound events in a given environment |
CN109844739A (zh) * | 2016-09-09 | 2019-06-04 | 国家科学研究中心 | Method for pattern recognition in a plurality of signals |
CN109844739B (zh) * | 2016-09-09 | 2023-07-18 | 国家科学研究中心 | Method for pattern recognition in a plurality of signals |
CN112349296A (zh) * | 2020-11-10 | 2021-02-09 | 胡添杰 | Subway platform safety monitoring method based on sound recognition |
CN116631443A (zh) * | 2021-02-26 | 2023-08-22 | 武汉星巡智能科技有限公司 | Infant cry category detection method, apparatus and device based on vibration spectrum comparison |
CN116631443B (zh) * | 2021-02-26 | 2024-05-07 | 武汉星巡智能科技有限公司 | Infant cry category detection method, apparatus and device based on vibration spectrum comparison |
Also Published As
Publication number | Publication date |
---|---|
MX2011011214A (es) | 2011-11-18 |
US20120185418A1 (en) | 2012-07-19 |
US8938404B2 (en) | 2015-01-20 |
MY157136A (en) | 2016-05-13 |
FR2944903A1 (fr) | 2010-10-29 |
FR2944903B1 (fr) | 2016-08-26 |
WO2010122056A3 (fr) | 2010-12-16 |
EP2422301A2 (fr) | 2012-02-29 |
SG175350A1 (en) | 2011-11-28 |
BRPI1014280A2 (pt) | 2019-04-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2010122056A2 (fr) | System and method for detecting abnormal audio events | |
EP2696344B1 (fr) | Method and system for detecting sound events in a given environment | |
EP0594480B1 (fr) | Speech detection method | |
EP3767558B1 (fr) | Method and device for determining an estimated duration before a technical incident in a computing infrastructure from performance indicator values | |
EP4000234A1 (fr) | Method and device for detecting anomalies, and corresponding computer program product and non-transitory computer-readable carrier medium | |
EP3155608A1 (fr) | Method for tracking a musical score and associated modelling method | |
EP1877826B1 (fr) | Markovian sequential detector | |
CN110852215A (zh) | Multi-modal emotion recognition method, system and storage medium | |
WO2003048711A2 (fr) | System for detecting speech in an audio signal in a noisy environment | |
FR3098940A1 (fr) | Method and device for determining a technical incident risk value in a computing infrastructure from performance indicator values | |
FR2979447A1 (fr) | Method for configuring a sensor-based detection device, and corresponding computer program and adaptive device | |
EP4027269A1 (fr) | Method for constructing and training a detector of the presence of anomalies in a time signal, and associated devices and method | |
EP3252563B1 (fr) | Determining the mobility context of a user carrying equipment fitted with inertial sensors | |
CN113345466A (zh) | Primary speaker voice detection method, apparatus and device for multi-microphone scenes | |
EP2766825B1 (fr) | Unsupervised system and method for multiresolution thematic analysis and structuring of audio streams | |
JP2018109739A (ja) | Device and method for speech frame processing | |
WO2007051940A1 (fr) | Method and device for computing a similarity measure between a representation of a reference audio segment and a representation of an audio segment under test, and method and device for tracking a reference speaker | |
EP3543904A1 (fr) | Method for controlling scene detection and corresponding apparatus | |
US20230317102A1 (en) | Sound Event Detection | |
Martín-Gutiérrez et al. | An End-to-End Speaker Diarization Service for improving Multimedia Content Access | |
WO2023237498A1 (fr) | Device for processing data by learning, and corresponding method, program and system | |
EP4099044A1 (fr) | Method and device for classifying radar signal pulses | |
WO2024061989A1 (fr) | Method for processing a one-dimensional signal, and corresponding device and program | |
WO2007003505A1 (fr) | Method and device for segmenting and labelling the content of an input signal in the form of a continuous stream of undifferentiated input data | |
Segerholm | Unsupervised Online Anomaly Detection in Multivariate Time-Series |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 10718923; Country of ref document: EP; Kind code of ref document: A2 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| WWE | Wipo information: entry into national phase | Ref document number: MX/A/2011/011214; Country of ref document: MX; Ref document number: 2010718923; Country of ref document: EP |
| WWE | Wipo information: entry into national phase | Ref document number: 13266101; Country of ref document: US |
| REG | Reference to national code | Ref country code: BR; Ref legal event code: B01A; Ref document number: PI1014280; Country of ref document: BR |
| ENP | Entry into the national phase | Ref document number: PI1014280; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20111024 |