WO2008148289A1 - Intelligent audio identification system and method - Google Patents

Intelligent audio identification system and method

Info

Publication number
WO2008148289A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio data
feature vector
data
audio
feature
Prior art date
Application number
PCT/CN2008/000765
Other languages
English (en)
Chinese (zh)
Inventor
Yangsheng Xu
Jianzhao Qin
Jun Cheng
Xinyu Wu
Chong Guo Li
Original Assignee
Shenzhen Institute Of Advanced Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute Of Advanced Technology
Publication of WO2008148289A1

Links

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use

Definitions

  • The present invention relates to a system and method for automatically recognizing audio data. Background Art
  • Hearing is one of the most important sources through which humans acquire information about the outside world, and an important channel for distinguishing external events. For example, on hearing barking one can judge that a dog may be nearby, and on hearing screams one can infer that someone nearby may be hurt. Much important information can therefore be obtained by analysing audio. At present, most audio-analysis systems only pre-process the collected raw audio, for example by denoising or by extracting or enhancing audio with specified features, and the final recognition of the audio still requires human participation. In many applications, however, the different sounds occurring in nature need to be identified automatically.
  • The technical problem to be solved by the present invention is to provide an intelligent audio recognition system and method capable of automatically identifying audio data.
  • An intelligent audio recognition method includes the following steps:
  • A. Collect various types of sample audio data and label the collected sample audio data.
  • The feature vector of the audio data to be identified is input to the classifier, and the classifier performs discrimination according to the feature vector to obtain the identification result of the audio data to be identified.
  • Step B includes the following steps:
  • Step D includes the following steps:
  • The characteristic components described in step B2 or D2 comprise: the center frequency of the audio, energy features of the audio in certain specific frequency segments, or energy distribution features of the audio over a plurality of time periods.
  • The feature vector described in step B3 or D3 is a vector composed of the center frequency of the audio and the sum of the audio energy spectra in certain specific frequency segments.
  • The category regions described in step C are divided according to the values of the feature vectors and are delimited by curves or curved surfaces.
  • Step E includes the following processing:
  • The classifier judges the credibility of the classification result according to the rejection index: when the rejection index is higher than the preset threshold, the classification result is judged credible and the classifier gives the category of the audio data to be identified; when the rejection index is lower than the preset threshold, the classifier gives the category of the audio data to be identified while indicating that the classification result is not credible.
  • Step A comprises labelling the collected sample audio data, that is, determining and indicating what sound each sample is.
  • An intelligent audio recognition system includes an audio data set for collecting and storing various types of sample audio data, a training unit, and an identification unit. The training unit is configured to extract the feature vectors of the sample audio data and to find and establish a mapping relationship between the sample audio data feature vectors and the categories to which they belong. The identification unit is configured to store the data of the established mapping relationship between audio data feature vectors and categories, to extract the feature vector of the audio data to be identified, and to give the identification result according to that feature vector.
  • The training unit includes a first pre-processing module, a first feature extraction module, and a training module.
  • The first pre-processing module is configured to perform denoising processing on the sample audio data to obtain training data.
  • The first feature extraction module is configured to extract the feature vectors of the sample audio data from the training data, and the training module is configured to find and establish the mapping relationship from the sample audio data feature vectors to the categories to which they belong.
  • The identification unit comprises a second pre-processing module, a second feature extraction module, and a classifier. The second pre-processing module is configured to perform denoising processing on the audio data to be recognized to obtain identification data.
  • The second feature extraction module is configured to extract the feature vector of the audio data to be recognized from the identification data. The classifier is configured to store the data of the mapping relationship between audio data feature vectors and categories output by the training module, and to output the identification result according to the input feature vector of the audio data to be identified.
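  • As an illustration only (none of the class or function names below appear in the patent; this is a minimal sketch of the module layout just described), the training unit and identification unit could be organized along these lines:

```python
# Hypothetical sketch of the unit/module layout described above; all names are illustrative.

class TrainingUnit:
    """Audio data set -> first pre-processing (21) -> first feature extraction (22) -> training (23)."""

    def __init__(self, preprocess, extract_features, train):
        self.preprocess = preprocess               # denoising function
        self.extract_features = extract_features   # feature-vector extraction function
        self.train = train                         # learns the feature-vector -> category mapping

    def build_classifier(self, labelled_samples):
        # labelled_samples: iterable of (raw_audio, category) pairs from the audio data set (1)
        vectors = [self.extract_features(self.preprocess(audio)) for audio, _ in labelled_samples]
        labels = [category for _, category in labelled_samples]
        return self.train(vectors, labels)         # the trained mapping is stored in the classifier (33)


class IdentificationUnit:
    """Second pre-processing (31) -> second feature extraction (32) -> classifier (33)."""

    def __init__(self, preprocess, extract_features, classifier):
        self.preprocess = preprocess
        self.extract_features = extract_features
        self.classifier = classifier               # holds the stored mapping from the training module

    def identify(self, raw_audio):
        vector = self.extract_features(self.preprocess(raw_audio))
        return self.classifier(vector)             # identification result for the audio to be recognized
```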
  • The beneficial effects of the invention are that the intelligent audio recognition system and method can automatically identify audio data, and that the system has good real-time performance and expansion capability.
  • FIG. 1 is a block diagram of the system of the present invention;
  • FIG. 2 is a block diagram of the training unit of the present invention;
  • FIG. 3 is a block diagram of the identification unit of the present invention;
  • FIG. 4 is a schematic diagram of establishing the mapping relationship between feature vectors and categories when there are four types of sample audio data;
  • FIG. 5 is a schematic diagram of establishing the mapping relationship between feature vectors and categories when there are two types of sample audio data.
  • As shown in FIG. 1, an intelligent audio recognition system includes at least one audio data set 1 for collecting and storing various types of sample audio data, a training unit 2, and an identification unit 3.
  • The training unit 2 is configured to extract the feature vectors of the sample audio data and to find and establish the mapping relationship from the sample audio data feature vectors to the categories to which they belong; the identification unit 3 is configured to store the data of the established mapping relationship between audio data feature vectors and categories, to extract the feature vector of the audio data to be identified, and to give the identification result according to that feature vector.
  • The training unit includes a first pre-processing module 21, a first feature extraction module 22, and a training module 23.
  • The identification unit includes a second pre-processing module 31, a second feature extraction module 32, and a classifier 33.
  • The audio data set 1 is established to provide the necessary training samples for the subsequent training unit 2.
  • The user collects audio data according to the categories of audio that need to be recognized. The data set can be created by making one's own recordings, collecting audio material from the Internet, purchasing audio material CDs, and other methods of gathering learning samples.
  • Multiple samples need to be collected for each type of audio, and during collection the samples must be labelled manually, that is, a person listens to each collected sample and determines what sound it is. To ensure the recognition performance of the system, as many samples as possible should be collected.
  • The collected sample audio data first needs to be pre-processed: the first pre-processing module 21 removes noise and the like from the sample audio data taken from the audio data set 1 and separates the sample audio to be recognized from its complex audio background, thereby obtaining the training data.
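  • The patent does not fix a particular denoising technique; purely as a stand-in, the sketch below uses a SciPy Butterworth band-pass filter to suppress out-of-band background noise (the cut-off frequencies and filter order are assumptions):

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess(audio: np.ndarray, sample_rate: int,
               low_hz: float = 100.0, high_hz: float = 4000.0) -> np.ndarray:
    """Crude denoising: remove the DC offset and keep only the band where the target sounds are expected."""
    audio = audio - np.mean(audio)                                   # remove DC component
    nyquist = sample_rate / 2.0
    b, a = butter(4, [low_hz / nyquist, high_hz / nyquist], btype="band")
    return filtfilt(b, a, audio)                                     # zero-phase band-pass filtering
```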
  • The first feature extraction module 22 extracts, from the training data, components that reflect the essential characteristics of the sample audio data, such as the center frequency of the audio and the energy characteristics of the audio in certain frequency segments. The energy distribution characteristics of the audio over a plurality of time periods are obtained by Fourier transforming the audio signal, and these features are combined to obtain the corresponding feature vector. For example, if the center frequency of the sample audio is 33 and the sum of the energy spectrum in a given frequency segment is 1000, the resulting feature vector, formed from the center frequency and the energy-spectrum sum, is (33, 1000).
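  • As a concrete illustration of this step (a minimal sketch assuming NumPy; the frequency band and the use of the spectral centroid as the "center frequency" are assumptions, since the patent does not define them precisely):

```python
import numpy as np

def extract_feature_vector(audio: np.ndarray, sample_rate: int,
                           band=(500.0, 2000.0)) -> np.ndarray:
    """Combine a center frequency and one band-energy sum into a feature vector such as (33, 1000)."""
    spectrum = np.abs(np.fft.rfft(audio))                             # magnitude spectrum via the Fourier transform
    freqs = np.fft.rfftfreq(len(audio), d=1.0 / sample_rate)
    center_frequency = np.sum(freqs * spectrum) / np.sum(spectrum)    # spectral centroid
    in_band = (freqs >= band[0]) & (freqs < band[1])
    band_energy = np.sum(spectrum[in_band] ** 2)                      # energy of the audio in this frequency segment
    return np.array([center_frequency, band_energy])
```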
  • The training module 23 uses the extracted feature vectors to train the classifier 33 used for recognizing the audio. That is, the training module 23 finds a number of classification curves or surfaces according to the feature vectors of the N types of sample audio data; these classification curves or surfaces separate N classification regions so that the feature vectors of each type of sample audio data fall in a different classification region. The classification regions are divided according to the values of the feature vectors; in other words, a mapping from the feature-vector space to the categories is established.
  • Taking the four-class case of FIG. 4 as an example, the training module 23 is equivalent to finding two straight lines so that the feature vectors of the four types of samples are distributed in the four regions divided by the two lines. In FIG. 4, the triangles are the feature vectors of the first class obtained during training, the circles are those of the second class, the five-pointed stars are those of the third class, and the pentagons are those of the fourth class; straight line 1 and straight line 2 are the classification lines obtained from the four types of feature vectors, i.e., the two classification lines divide the feature space into four subspaces. The training module stores the trained data, that is, the data of the established mapping relationship between audio data feature vectors and categories, in the classifier 33.
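  • A minimal training-module sketch, assuming scikit-learn as the learning library (the patent itself does not name one); it fits a linear support vector machine on labelled feature vectors, which amounts to finding classification lines like those of FIG. 4 and storing the feature-vector-to-category mapping:

```python
import numpy as np
from sklearn.svm import SVC

def train_classifier(feature_vectors: np.ndarray, labels: np.ndarray) -> SVC:
    """Learn classification surfaces so that each class's feature vectors fall in their own region."""
    model = SVC(kernel="linear", decision_function_shape="ovo")   # one-against-one for N > 2 classes
    model.fit(feature_vectors, labels)
    return model                                                  # the fitted model holds the stored mapping

# Toy usage with four classes (cf. FIG. 4); the numbers are made up:
# X = np.array([[33, 1000], [35, 900], [120, 50], [118, 60], [60, 5000], [62, 4800], [10, 10], [12, 8]])
# y = np.array([1, 1, 2, 2, 3, 3, 4, 4])
# classifier = train_classifier(X, y)
```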
  • The principle for dividing the category regions in the method of the invention is: by dividing the feature-vector space, each of the resulting category regions should contain only, or as many as possible of, the feature vectors of one type of sample, and as few as possible of the feature vectors of other types of samples.
  • The function of the identification unit 3 is to obtain the identification result for the audio data to be recognized by using the classifier 33 trained by the training module 23.
  • The second pre-processing module 31 and the second feature extraction module 32 in the identification unit function in the same way as the first pre-processing module 21 and the first feature extraction module 22 in the training unit, respectively.
  • After the audio sample to be identified is obtained, it is first pre-processed by the second pre-processing module 31 to obtain the processed identification data. Then, using the same feature extraction method as the first feature extraction module 22, features are extracted from the identification data to obtain the feature vector of the audio data to be recognized. The extracted feature vector is then input to the classifier 33 (obtained from the training module 23), and the classifier outputs the recognition result according to the input feature vector. For example, when the feature vector to be classified falls in the space bounded by the upper half of straight line 1 and the lower half of straight line 2 (such as the hexagon in FIG. 4), the present invention discriminates the feature vector to be classified as category 1.
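  • Putting the hypothetical helpers from the earlier sketches together (they are assumptions, not the patent's own code), the identification-unit flow described here could look like this:

```python
import numpy as np

def identify_audio(raw_audio: np.ndarray, sample_rate: int, classifier) -> int:
    """Pre-process, extract the feature vector, then let the classifier discriminate the category."""
    cleaned = preprocess(raw_audio, sample_rate)                  # second pre-processing module (31)
    vector = extract_feature_vector(cleaned, sample_rate)         # second feature extraction module (32)
    return int(classifier.predict(vector.reshape(1, -1))[0])      # classifier (33) outputs the identification result
```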
  • Similarly, a feature vector to be classified that falls in the region of the circular feature vectors is discriminated as the second category, the octagon in the figure is classified into the third category, and the hexagonal star into the fourth category.
  • The classifier gives the identification result for the category of the audio to be identified according to the input feature vector of the audio data to be identified. The more sample audios are collected in the audio data set, the more classification regions are divided, the finer the classification of the audio data to be identified, and the closer the classification result is to the real sound category.
  • Commonly used classifiers include neural networks, support vector machines, AdaBoost, and so on.
  • The process by which a classifier based on a linear support vector machine obtains a linear classification surface is introduced below.
  • The classification surface (w, b) of the linear support vector machine can be obtained by solving the following optimization problem: minimize (1/2)·||w||^2 over w and b, subject to y_i·(w·x_i − b) ≥ 1 for every training sample (x_i, y_i), where y_i ∈ {+1, −1} is the class label.
  • The quantity |w·x − b| can be used as a rejection index; the classification is considered reliable when |w·x − b| is greater than a certain threshold.
  • For N classes, the one-against-one method constructs N(N−1)/2 classification surfaces: each pair of classes is taken from the N classes and the above classification-surface construction method for the two-class problem is applied to it.
  • A voting method is used to determine the category to which the feature vector x belongs. Let the classification surface between the i-th and j-th classes be (w_ij, b_ij). If w_ij·x − b_ij > 0, a vote is cast for the i-th class; if w_ij·x − b_ij < 0, a vote is cast for the j-th class; the class receiving the most votes is taken as the category of x.
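  • The voting step can be written out as a short NumPy sketch (assuming the pairwise surfaces (w_ij, b_ij) have already been obtained; the data layout is an assumption):

```python
import numpy as np
from collections import Counter

def vote_one_against_one(x: np.ndarray, surfaces: dict) -> int:
    """surfaces maps each class pair (i, j) to its trained classification surface (w_ij, b_ij)."""
    votes = Counter()
    for (i, j), (w, b) in surfaces.items():
        if np.dot(w, x) - b > 0:
            votes[i] += 1                      # vote for the i-th class
        else:
            votes[j] += 1                      # vote for the j-th class
    return votes.most_common(1)[0][0]          # the class receiving the most votes wins
```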
  • The identification result output by the classifier includes the classification result, i.e. the category to which the audio to be recognized belongs, and a rejection index. Taking the case of FIG. 4 in which the feature vector of the audio data to be recognized is the hexagon as an example: since the hexagon falls in the first-class space bounded by the upper half of straight line 1 and the lower half of straight line 2, its category is the first category.
  • The rejection index is a parameter used to measure the credibility of the classification result.
  • For a classifier whose output is the probability of belonging to each class, this probability can be used as the rejection index: if the probabilities of belonging to all classes are smaller than a certain value, discrimination of the sample category is rejected. For a classifier based on classification surfaces, the distance between the feature vector of the sample and the nearest classification surface can be used as the rejection index: if this distance is smaller than a certain value, discrimination of the sample category is rejected.
  • The rejection index is used to determine the credibility of the classification result.
  • The threshold can be set experimentally: for example, a small-scale test set can be established, and a value that rejects the untrustworthy samples in the test set is chosen as the threshold.
  • A threshold value is preset. When the rejection index is greater than the preset threshold, the classification result given by the classifier is credible; when the rejection index is less than the preset threshold, the classification result is less reliable, and the classifier indicates that the classification result is not credible while still giving the category of the audio data to be identified.
  • In the two-class case of FIG. 5, for example, the present invention discriminates the feature vector to be classified as belonging to the class of the circular feature vectors.
  • The sign (positive or negative) of the dot product between the linear classification surface parameter (which can be obtained as the normal vector of the linear classification surface) and the feature vector to be classified is used to discriminate the feature vector to be classified, and the absolute value of the dot product is the rejection index used to measure the credibility of the classification: the larger the rejection index (the absolute value of the dot product), the higher the reliability of the classification. When the absolute value of the dot product is greater than the preset threshold, the classification is considered reliable.
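  • For this two-class linear case, the sign of the dot product picks the class and its absolute value serves as the rejection index; a minimal sketch (the threshold value is an assumption to be tuned on a small test set, as suggested above):

```python
import numpy as np

def classify_with_rejection(x: np.ndarray, w: np.ndarray, b: float,
                            threshold: float, classes=(1, 2)):
    """Return (category, rejection_index, credible) for a single linear classification surface."""
    score = float(np.dot(w, x) - b)
    category = classes[0] if score > 0 else classes[1]   # sign of the dot product discriminates the class
    rejection_index = abs(score)                          # credibility measure
    credible = rejection_index > threshold                # above the preset threshold -> result is credible
    return category, rejection_index, credible
```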
  • The method and system can be used to identify the various different sounds occurring in nature. The system can first be used to identify specific audio and, based on the identification result, to implement subsequent functions; training a fast, scalable classifier ensures that the system has good real-time performance and scalability.
  • The intelligent audio recognition system of the invention can be used for intelligent monitoring in various situations. For example, it can be installed in an elevator, where it automatically recognizes abnormal sounds such as screams and percussive sounds and sends an alarm signal to the monitoring personnel, thereby shortening the reaction time for handling abnormal situations in the elevator and at the same time reducing the workload of the elevator monitoring personnel.
  • The system can also be used for home monitoring. After the system is installed indoors, it can identify abnormal sounds that may occur in the room, such as the sound of breaking glass, door-banging sounds, explosions, and gunshots, and immediately send out an alarm signal after identifying them, thereby effectively preventing criminal acts such as breaking in through doors and windows.
  • The system can also be installed outdoors to automatically identify weather-related sounds such as thunder, wind, and rain, and to monitor weather conditions in real time.
  • The system can help wildlife researchers working in the field to conduct their research. Wildlife researchers often need to spend weeks or even months tracking rare wild animals. By means of wireless sensors installed in a designated area, the present invention can identify the sound of a particular wild animal and signal its presence, helping the researcher track it.
  • The system can also be used for the diagnosis of mechanical faults. When a machine malfunctions, it emits sounds different from those of normal operation, and different faults produce different sounds. The system can be trained on several different fault audios and then installed near the machine to monitor the sound of the machine at work in real time.
  • The system can also be applied to Internet-based audio retrieval and audio-based scene analysis.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to an intelligent audio identification system and method. The system comprises an audio data set (1) for collecting and storing various kinds of sample audio data, a training unit (2), and an identification unit (3). The training unit (2) serves to extract feature vectors from the sample audio data and to find and establish the mapping relationship between the sample audio data feature vectors and the corresponding categories. The identification unit (3) serves to store the established data describing the mapping relationship between the sample audio data feature vectors and the corresponding categories, to extract a feature vector from the audio data to be identified, and finally to obtain the identification result according to the feature vector of the audio data to be identified.
PCT/CN2008/000765 2007-06-07 2008-04-15 Système et procédé d'identification audio intelligents WO2008148289A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN200710075008.4 2007-06-07
CN 200710075008 CN101067930B (zh) 2007-06-07 2007-06-07 一种智能音频辨识系统及辨识方法

Publications (1)

Publication Number Publication Date
WO2008148289A1 (fr) 2008-12-11

Family

ID=38880462

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2008/000765 WO2008148289A1 (fr) 2007-06-07 2008-04-15 Système et procédé d'identification audio intelligents

Country Status (2)

Country Link
CN (1) CN101067930B (fr)
WO (1) WO2008148289A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102184732A (zh) * 2011-04-28 2011-09-14 重庆邮电大学 基于分形特征的智能轮椅语音识别控制方法及系统
CN111370025A (zh) * 2020-02-25 2020-07-03 广州酷狗计算机科技有限公司 音频识别方法、装置及计算机存储介质

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101067930B (zh) * 2007-06-07 2011-06-29 深圳先进技术研究院 一种智能音频辨识系统及辨识方法
CN101587710B (zh) * 2009-07-02 2011-12-14 北京理工大学 一种基于音频突发事件分类的多码本编码参数量化方法
CN102623007B (zh) * 2011-01-30 2014-01-01 清华大学 基于可变时长的音频特征分类方法
CN102664004B (zh) * 2012-03-22 2013-10-23 重庆英卡电子有限公司 森林盗窃行为识别方法
CN103198838A (zh) * 2013-03-29 2013-07-10 苏州皓泰视频技术有限公司 一种用于嵌入式系统的异常声音监控方法和监控装置
CN103743477B (zh) * 2013-12-27 2016-01-13 柳州职业技术学院 一种机械故障检测诊断方法及其设备
CN104464733B (zh) * 2014-10-28 2019-09-20 百度在线网络技术(北京)有限公司 一种语音对话的多场景管理方法及装置
CN104700833A (zh) * 2014-12-29 2015-06-10 芜湖乐锐思信息咨询有限公司 一种大数据语音分类方法
CN106531191A (zh) * 2015-09-10 2017-03-22 百度在线网络技术(北京)有限公司 用于提供危险报告信息的方法和装置
CN105138696B (zh) * 2015-09-24 2019-11-19 深圳市冠旭电子股份有限公司 一种音乐推送方法及装置
CN105679313A (zh) * 2016-04-15 2016-06-15 福建新恒通智能科技有限公司 一种音频识别报警系统及方法
CN107801090A (zh) * 2017-11-03 2018-03-13 北京奇虎科技有限公司 利用音频信息检测异常视频文件的方法、装置及计算设备
CN108764304B (zh) * 2018-05-11 2020-03-06 Oppo广东移动通信有限公司 场景识别方法、装置、存储介质及电子设备
CN108764114B (zh) * 2018-05-23 2022-09-13 腾讯音乐娱乐科技(深圳)有限公司 一种信号识别方法及其设备、存储介质、终端
CN108764341B (zh) * 2018-05-29 2019-07-19 中国矿业大学 一种变工况条件下的滚动轴承故障诊断方法
CN110658006B (zh) * 2018-06-29 2021-03-23 杭州萤石软件有限公司 一种扫地机器人故障诊断方法和扫地机器人

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1316726A (zh) * 2000-02-02 2001-10-10 摩托罗拉公司 语音识别的方法和装置
JP2001242880A (ja) * 2000-03-01 2001-09-07 Nippon Telegr & Teleph Corp <Ntt> 信号検出方法、信号の検索方法及び認識方法並びに記録媒体
CN1614685A (zh) * 2004-09-29 2005-05-11 上海交通大学 嵌入式语音命令识别系统中非命令词快速拒识方法
CN101067930A (zh) * 2007-06-07 2007-11-07 深圳先进技术研究院 一种智能音频辨识系统及辨识方法

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HUANG H. ET AL.: "A NEW EFFECTIVE METHOD ON AUDIO INFORMATION RETRIEVAL", COMPUTER APPLICATION INVESTIGATION, no. 3, 2004, pages 85 - 87 *
YANG X. ET AL.: "Multi-class signal classification algorithm based on wavelet subspace, SVM and fuzzy integral", INFORMATION AND CONTROL, vol. 36, no. 2, April 2007 (2007-04-01), pages 211 - 217 *

Also Published As

Publication number Publication date
CN101067930B (zh) 2011-06-29
CN101067930A (zh) 2007-11-07

Similar Documents

Publication Publication Date Title
WO2008148289A1 (fr) Système et procédé d'identification audio intelligents
CN109300471B (zh) 融合声音采集识别的场区智能视频监控方法、装置及系统
CN102163427B (zh) 一种基于环境模型的音频异常事件检测方法
CN101494049B (zh) 一种用于音频监控系统中的音频特征参数的提取方法
Carletti et al. Audio surveillance using a bag of aural words classifier
Conte et al. An ensemble of rejecting classifiers for anomaly detection of audio events
CN105424395A (zh) 设备故障的确定方法和装置
Ntalampiras et al. Acoustic detection of human activities in natural environments
CN111436944A (zh) 一种基于智能移动终端的跌倒检测方法
CN103336832A (zh) 基于质量元数据的视频分类器构造方法
Sharma et al. Two-stage supervised learning-based method to detect screams and cries in urban environments
CN114371353A (zh) 一种基于声纹识别的电力设备异常监测方法及系统
Wan et al. Recognition of potential danger to buried pipelines based on sounds
Dong et al. At the speed of sound: Efficient audio scene classification
CN113707175B (zh) 基于特征分解分类器与自适应后处理的声学事件检测系统
Zhao et al. Event classification for living environment surveillance using audio sensor networks
Czúni et al. Time domain audio features for chainsaw noise detection using WSNs
CN113450827A (zh) 基于压缩神经网络的设备异常工况声纹分析算法
CN110065867B (zh) 基于音视频的电梯舒适度评价的方法和系统
CN103310088A (zh) 照明电耗异常的自动检测方法
CN116416665A (zh) 基于安防系统的人脸识别方法、装置及存储介质
CN115062725A (zh) 酒店收益异常分析方法及系统
Amayri et al. Estimating occupancy in an office setting
CN106530199A (zh) 基于窗口式假设检验的多媒体综合隐写分析方法
CN110874584A (zh) 一种基于改进原型聚类的叶片故障诊断方法

Legal Events

Date Code Title Description
DPE2 Request for preliminary examination filed before expiration of 19th month from priority date (pct application filed from 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08733963

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08733963

Country of ref document: EP

Kind code of ref document: A1