CN112735448A - Sound detection method and system based on target detection - Google Patents
- Publication number: CN112735448A (application CN202011480987.3A)
- Authority
- CN
- China
- Prior art keywords
- sound
- target
- detection
- model
- picture
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
- Emergency Alarm Devices (AREA)
Abstract
According to the sound detection method and system based on target detection, a target detection algorithm is applied to the spectrogram of the sound signal to identify the specific form of the target sound on the spectrogram. The sound requires no noise reduction, the method is robust against various environmental noises without producing misjudgments, and the generalization of the model is improved. No retraining is needed for target sounds that appear in different frequency bands or with different forms: a trained model generalizes to target sounds of the same type that occur in different frequency bands, at different sound pressure levels, and with slightly different spectral forms, and can be applied to any target sound that conforms to a definite spectral form characteristic.
Description
Technical Field
The present application relates to a sound detection method and system based on target detection, and belongs to the technical field of sound signal detection.
Background
The blades of a wind turbine are among the key components by which a wind turbine generator set converts wind energy into mechanical energy; they are the basis for achieving a high wind energy utilization coefficient and economic benefit, and their condition directly affects the performance and generating efficiency of the whole machine. Frequent blade maintenance and blade accidents also seriously affect the overall profitability of a wind farm. While the blades sweep through the wind, the recognition of certain target sounds can assist in determining the fault type and fault location. In existing applications, lightning identification methods based on sound signals mostly rely on detecting a sudden high-energy signal, or combine other signal types such as images and currents to detect and monitor lightning strikes. Methods based on a sudden high-energy threshold cannot reliably distinguish other environmental sounds with similar characteristics, such as impact or blasting sounds. Multi-signal methods recognize the lightning phenomenon well, but because multiple monitoring devices must be installed, monitoring is costly and maintenance complicated, and such a method is only applicable to the single scenario of lightning identification.
For detecting faults such as a blocked drain hole or leading-edge erosion, which produce a whistling sound, the existing approach is to extract the whistle form or its features and identify them by clustering or polynomial-fitting correlation methods. Such methods can identify whistles of the same shape, but when a fault evolves, or a whistle of a new, not-identical shape appears in a different frequency range, they cannot maintain good recognition accuracy. The frequency band of the whistle is related to the fault location and its form to the fault type, and both change over time. In addition, these methods place high demands on data quality and feature extraction, and generalize poorly to unknown noise in unknown environments.
Chinese patent application 201710419138.9 sets an energy threshold for a specific frequency range of the collected sound samples and declares thunder when the threshold is exceeded; this simple frequency-domain energy rule is too coarse and cannot avoid misjudging all other short, high-energy noises in the environment. Chinese patent application 201910331781.5 collects lightning information comprehensively with image, temperature, humidity, and electromagnetic-field measuring devices; data acquisition is first triggered by an optical detector, after which images, electric-field and magnetic-field data are collected. Since it does not describe how lightning is judged after data acquisition, the system can be understood as judging lightning from light intensity alone; this criterion is likewise too simple to distinguish other short bright signals in the environment, and collecting multiple signal types makes monitoring expensive. Chinese patent application 201510115347.5 extracts the spectral curve of a blade whistle, fits a polynomial to reconstruct the whistle form, and identifies the whistle by the correlation between the signal and the reconstructed model; this requires an effective extraction method for the target sound and is highly sensitive to noise, so it often generalizes poorly because real-world sound signals contain various environmental noises. Moreover, its polynomial fit of the features generalizes insufficiently to the whistle and other fault sounds as they change over the turbine's life cycle, so its stability is low.
Chinese patent application 201910603546.9 frames the wind turbine sound signal, extracts features from the framed signal for two-stage clustering, and judges faults from the periodicity of the category labels. This requires choosing effective features for the fault characteristics, and it cannot identify target sounds such as whistles and thunder whose frequency range and intensity may vary. In addition, it uses binary classification, which distinguishes unseen fault characteristics poorly and cannot identify the fault type of an abnormal state.
Each prior-art scheme can detect only a single target sound, the same method cannot be reused for other target sounds, and the sound-based schemes are all easily affected by environmental noise during collection.
Disclosure of Invention
The invention aims to provide a sound detection method and system based on target detection, in which only the sound signal is used and the target sound is accurately identified by performing target detection on the spectrogram of the signal. The method monitors a single signal and generalizes well: it accurately identifies the form of the target sound in the spectrogram, and its accuracy is little affected by shifts in the target sound's frequency band or by various noises appearing in the environment.
The application relates to a sound detection method based on target detection, which comprises a training process and a prediction process, wherein the training process comprises the following steps:
(1.1) collecting a plurality of groups of historical data with target sounds and carrying out quality screening on the data;
(1.2) carrying out spectrum conversion on the data to obtain a spectrogram;
(1.3) converting the spectrogram into a picture and storing the picture;
(1.4) marking the position of the target sound on the spectrogram by using a target detection marking tool;
(1.5) training the labeled picture data by using a target detection model;
the prediction process comprises the following steps:
(2.1) carrying out data quality screening and spectrum conversion on the collected audio signal to be detected;
(2.2) converting the frequency spectrum of the data to be detected into a picture;
(2.3) predicting the generated picture by using the trained target detection model;
and (2.4) when the model identifies that the picture contains the target sound, calculating the frequency band and the time period of the target sound and outputting the frequency band and the time period as a result.
In step (2.3), after prediction, the model outputs the number, probability, and bounding-box position of the detected targets. After step (2.3), the method may further comprise generating a detection factor from the model's detection result: whether the model identifies the target sound, together with the probability or frequency of its occurrence, is mapped into the detection factor.
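The detection-factor mapping can be illustrated with a small sketch. When a long audio clip is windowed into several pictures, the per-picture detections are reduced to one factor, e.g. the maximum detection probability or the fraction of windows containing the target. The helper below is a hypothetical illustration; the application leaves the exact mapping to the purpose of the detection.

```python
def detection_factor(window_results, mode="max_prob"):
    """Reduce per-picture detection results to one detection factor.

    window_results: list of lists of detection probabilities, one inner
    list per sliding-window picture (empty if nothing was detected).
    mode: 'max_prob'  -> highest probability over all windows;
          'frequency' -> fraction of windows with at least one detection.
    Both mappings are illustrative; the application only requires some
    mapping from the identification results to a factor.
    """
    if mode == "max_prob":
        probs = [p for window in window_results for p in window]
        return max(probs) if probs else 0.0
    if mode == "frequency":
        if not window_results:
            return 0.0
        hits = sum(1 for window in window_results if window)
        return hits / len(window_results)
    raise ValueError(f"unknown mode: {mode}")

# Detections on three sliding-window pictures of one audio clip.
results = [[0.91, 0.40], [], [0.75]]
```

For the sample above, `max_prob` yields the strongest single detection, while `frequency` reflects how persistently the target sound appears across the clip.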
The application also relates to a sound detection system based on target detection, which comprises a sound sensor, a machine end hardware device and an operation module, wherein the operation module runs the sound detection method.
The sound detection system further comprises a station-side server, and the operation module can be arranged on the station-side server; the machine-end hardware device may include an edge hardware acquisition system.
According to the sound detection method and system based on target detection, a target detection algorithm is applied to the spectrogram of the sound signal to identify the specific form of the target sound on the spectrogram; the sound requires no noise reduction, the method is robust against various environmental noises without producing misjudgments, and the generalization of the model is improved. No retraining is needed for target sounds that appear in different frequency bands or with different forms: the trained model generalizes to target sounds of the same type that occur in different frequency bands, at different sound pressure levels, and with slightly different spectral forms, and can be applied to any target sound that conforms to a definite spectral form characteristic.
Drawings
Fig. 1 is a schematic flow chart of a sound detection method based on object detection according to the present application.
Fig. 2 is a schematic diagram of different forms of thunder samples identified in an embodiment of the present application.
FIG. 3 is a schematic diagram of whistle samples of different frequency bands identified in an embodiment of the present application.
FIG. 4 is a schematic illustration of different forms of whistle samples identified in an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more apparent, embodiments of the present application will be described in detail below with reference to the accompanying drawings. It should be noted that the embodiments and features of the embodiments in the present application may be arbitrarily combined with each other without conflict.
In the present application, sound signals, with or without the aerodynamic wind-sweeping noise of the turbine, are collected over a period by audio equipment, and the target sounds within them are identified through further analysis of the signal. The sound detection system based on target detection comprises a sound sensor for collecting sounds of the wind turbine and its environment, a machine-end hardware device, and application software running on a station-side server. The machine-end hardware device may further include an edge hardware acquisition system for acquiring operational data and/or environmental data of the system components.
The method applies to any target sound that has a definite shape characteristic in the time-frequency domain of the audio spectrogram. For example, a whistle takes an S-like shape: its width, height, and exact shape differ between turbines, but the S-shaped characteristic is retained overall, so the method applies to it. Likewise, thunder is the sound generated by a random lightning discharge; its spectrum takes several different forms depending on the distance of the strike, but its overall form is consistent — after a strong onset, the frequency range of the sound narrows from wide to narrow over a certain time — so it too is a valid target sound. A tilted blade leading-edge protective film produces a whistle-like sound, and identified thunder can indicate lightning-strike damage before the blade fails seriously. Each target sound has its own specific form on the spectrogram — the whistle is roughly S-shaped, thunder roughly triangular — so by converting the spectrum into an image, the target sound can be identified and located with an image target-detection method.
The sound detection method based on target detection comprises a training process and a prediction process, and is shown in fig. 1. Wherein, the training process comprises the following steps:
(1.1) collecting a plurality of groups of historical data with target sounds;
(1.2) performing quality screening on the data; the screening method and criteria may differ between target sounds and may be carried out manually or with a machine learning method, mainly ensuring that the spectral form of the target sound is not completely masked by noise;
(1.3) performing spectrum conversion on the data to obtain a spectrogram, using, but not limited to, the short-time Fourier transform (STFT), the mel spectrogram, etc.;
(1.4) converting the spectrogram into pictures and storing them; if the sound signal is long, it can be stored as several small pictures using a sliding window;
(1.5) marking the position of the target sound on the spectrogram with a target detection labeling tool;
(1.6) training a target detection model on the labeled picture data; models include, but are not limited to, YOLO, SSD, R-CNN, AttentionNet, etc. Depending on the size of the historical data set, training may start from weights pre-trained on a public data set.
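The spectrum-conversion and picture-generation steps above can be sketched as follows. The FFT size, hop length, and dB normalization range are illustrative assumptions, not values fixed by the application, which only requires some spectrum conversion such as an STFT or mel spectrogram.

```python
import numpy as np
from scipy.signal import stft

def audio_to_spectrogram_image(signal, sample_rate, n_fft=1024, hop=256):
    """Convert a 1-D audio signal to an 8-bit grayscale spectrogram picture.

    The window length, hop, and -80 dB floor are illustrative choices.
    Returns the image plus the frequency and time axes, which are later
    needed to map detected boxes back to a frequency band and time period.
    """
    freqs, times, spec = stft(signal, fs=sample_rate,
                              nperseg=n_fft, noverlap=n_fft - hop)
    # Magnitude in dB, clipped to a fixed dynamic range.
    mag_db = 20.0 * np.log10(np.abs(spec) + 1e-10)
    mag_db = np.clip(mag_db, -80.0, 0.0)
    # Normalize to 0..255 and flip so low frequencies sit at the bottom,
    # matching the usual spectrogram orientation of the labeled pictures.
    img = ((mag_db + 80.0) / 80.0 * 255.0).astype(np.uint8)
    return np.flipud(img), freqs, times

# Example: 2 s of a 1 kHz tone sampled at 16 kHz.
sr = 16000
t = np.arange(2 * sr) / sr
image, freqs, times = audio_to_spectrogram_image(np.sin(2 * np.pi * 1000 * t), sr)
```

A long recording would then be cut into fixed-width slices of `image` along the time axis (the sliding window of step (1.4)) before labeling and training.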
The prediction process comprises the following steps:
(2.1) performing data quality screening and spectrum conversion on the collected audio signal to be detected, with the same specific method as in training; if the sampling rate of the prediction data differs from that of the training data, it can be unified by resampling;
(2.2) converting the spectrum of the data to be detected into a picture, keeping the format and size of the generated picture consistent with the training pictures; during prediction the picture may be stored, or held directly in memory;
(2.3) predicting on the generated picture with the trained target detection model, which outputs the number, probability, and bounding-box position of the detected targets;
(2.4) generating a related detection factor from the model's result, which may map whether the model identified the target sound and the probability / maximum probability / frequency of its occurrence into a factor; the specific mapping is designed according to the purpose of the detection, and when the audio signal is windowed into several pictures, the recognition results of all pictures are considered jointly in the mapping;
(2.5) when the model identifies that a picture contains the form of the target sound, calculating the frequency band and time period of the target sound from the position of the output bounding box in the spectrogram and outputting them as the result.
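The mapping in step (2.5) from a detected box back to a frequency band and time period is simple bookkeeping over the picture's axes. The sketch below assumes pixel coordinates with row 0 at the top of the image (the highest frequency); this layout is an assumption about how the picture was rendered, not something fixed by the application.

```python
def box_to_band_and_period(box, img_height, img_width, max_freq_hz, duration_s):
    """Map a detector bounding box (in pixels) on a spectrogram picture
    back to a frequency band (Hz) and time period (s).

    box: (x_min, y_min, x_max, y_max), with row 0 at the TOP of the
    image, i.e. the highest frequency -- an assumed, typical layout.
    """
    x_min, y_min, x_max, y_max = box
    # The horizontal axis is time.
    t_start = x_min / img_width * duration_s
    t_end = x_max / img_width * duration_s
    # The vertical axis is frequency, inverted because row 0 is the top.
    f_high = (1.0 - y_min / img_height) * max_freq_hz
    f_low = (1.0 - y_max / img_height) * max_freq_hz
    return (f_low, f_high), (t_start, t_end)

# A box on a 512x400-pixel picture of a 10 s clip spanning 0-8000 Hz.
band, period = box_to_band_and_period((100, 128, 300, 384), 512, 400, 8000.0, 10.0)
```

Here the example box maps to the 2000-6000 Hz band between 2.5 s and 7.5 s, which is exactly the "frequency band and time period" the method outputs as its result.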
Examples
The labeled thunder and whistle data were used to train a thunder recognition model and a whistle recognition model, respectively, and a brand-new wind farm was monitored with these models. Thunder samples of different forms were recognized, as shown in fig. 2, and whistle samples of different frequency bands and different forms were recognized, as shown in figs. 3 and 4. In this embodiment, training was completed with only 20 labeled pictures, using weights pre-trained on a public data set as the model's initial weights; the model used is the YOLO v3 target detector. Each of the two models was verified on at least 5 wind farms for several months: without retraining, recognition accuracy above 95% and a false alarm rate below 3% were achieved.
Although the embodiments disclosed in the present application are described above, the descriptions are only for the convenience of understanding the present application, and are not intended to limit the present application. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the disclosure as defined by the appended claims.
Claims (7)
1. A sound detection method based on target detection comprises a training process and a prediction process, and is characterized in that: the training process comprises the following steps:
(1.1) collecting a plurality of groups of historical data with target sounds and carrying out quality screening on the data;
(1.2) carrying out spectrum conversion on the data to obtain a spectrogram;
(1.3) converting the spectrogram into a picture and storing the picture;
(1.4) marking the position of the target sound on the spectrogram by using a target detection marking tool;
(1.5) training the labeled picture data by using a target detection model;
the prediction process comprises the following steps:
(2.1) carrying out data quality screening and spectrum conversion on the collected audio signal to be detected;
(2.2) converting the frequency spectrum of the data to be detected into a picture;
(2.3) predicting the generated picture by using the trained target detection model;
and (2.4) when the model identifies that the picture contains the target sound, calculating the frequency band and the time period of the target sound and outputting the frequency band and the time period as a result.
2. The sound detection method according to claim 1, characterized in that: in the step (2.3), after prediction, the model outputs the number, probability and frame position of the detection target.
3. The sound detection method according to claim 1 or 2, characterized in that: after the step (2.3), the method can further comprise the following steps: and generating a detection factor according to the detection result of the model.
4. The sound detection method according to claim 2 or 3, characterized in that: whether the model identifies the target sound, and the probability or frequency of its occurrence, are mapped into the detection factor.
5. A sound detection system based on target detection, comprising a sound sensor, a machine-end hardware device and an operation module, characterized in that: the operation module runs the sound detection method according to any one of claims 1 to 4.
6. The sound detection system of claim 5, wherein: the sound detection system further comprises a station-side server, and the operation module is arranged on the station-side server.
7. The sound detection system according to claim 5 or 6, characterized in that: the machine-end hardware device comprises an edge hardware acquisition system.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011480987.3A (published as CN112735448A) | 2020-12-15 | 2020-12-15 | Sound detection method and system based on target detection |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN112735448A (en) | 2021-04-30 |
Family
ID=75602388
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202011480987.3A (pending) | Sound detection method and system based on target detection | 2020-12-15 | 2020-12-15 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN112735448A (en) |
Citations (6)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102494894A * | 2011-11-17 | 2012-06-13 | 高丙团 | Audio monitoring and fault diagnosis system for wind generating set and audio monitoring and fault diagnosis method for same |
| CN104677623A * | 2015-03-16 | 2015-06-03 | 西安交通大学 | On-site acoustic diagnosis method and monitoring system for wind turbine blade failure |
| US10504504B1 * | 2018-12-07 | 2019-12-10 | Vocalid, Inc. | Image-based approaches to classifying audio data |
| US20200184991A1 * | 2018-12-05 | 2020-06-11 | Pascal Cleve | Sound class identification using a neural network |
| CN111306010A * | 2020-04-17 | 2020-06-19 | 北京天泽智云科技有限公司 | Method and system for detecting lightning damage of fan blade |
| CN111306008A * | 2019-12-31 | 2020-06-19 | 远景智能国际私人投资有限公司 | Fan blade detection method, device, equipment and storage medium |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20210430 |