CN112735448A - Sound detection method and system based on target detection - Google Patents
- Publication number: CN112735448A (application CN202011480987.3A)
- Authority
- CN
- China
- Prior art keywords
- sound
- target
- detection
- model
- picture
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
- Emergency Alarm Devices (AREA)
Abstract
According to the sound detection method and system based on target detection, a target detection algorithm is applied to the spectrogram of the sound signal to identify the specific form of the target sound on the spectrogram. The sound requires no noise reduction, the method is robust against various environmental noises without producing misjudgments, and the generalization of the model is improved. No retraining is needed for target sounds that appear in different frequency bands or with different forms: a trained model generalizes to target sounds of the same type that occur in different frequency bands, at different sound pressure levels, and with slightly different spectral forms, and can be applied to any target sound that conforms to a definite spectral form characteristic.
Description
Technical Field
The present application relates to a sound detection method and system based on target detection, and belongs to the technical field of sound signal detection.
Background
The blades of a wind turbine are among the key components by which a wind turbine generator set converts wind energy into mechanical energy; they are the basis for achieving a high wind energy utilization coefficient and economic benefit, and their condition directly affects the performance and generating efficiency of the whole machine. Frequent blade maintenance and blade accidents also seriously affect the overall profitability of a wind farm. While the blades sweep through the wind, the recognition of certain target sounds can assist in determining the fault type and fault location. In existing applications, lightning identification methods based on sound signals mostly rely on detecting a sudden high-energy signal, or combine other signal types such as images and currents to detect and monitor lightning strikes. Methods based on a sudden high-energy threshold cannot reliably distinguish other environmental sounds with similar characteristics, such as impact or blasting sounds. Multi-signal methods recognize the lightning phenomenon well, but because multiple monitoring devices must be installed, monitoring is costly and maintenance complicated, and such a method is only applicable to the single scenario of lightning identification.
For detecting faults such as a blocked drain hole or leading-edge erosion, which produce a whistling sound, the existing approach is to extract the whistle form or its features and identify them by clustering or polynomial-fitting correlation methods. Such methods can identify whistles of the same shape, but when a fault evolves, or a whistle of a new, not-identical shape appears in a different frequency range, they cannot maintain good recognition accuracy. The frequency band of the whistle is related to the fault location and its form to the fault type, and both change over time. In addition, these methods place high demands on data quality and feature extraction, and generalize poorly to unknown noise in unknown environments.
Chinese patent application 201710419138.9 sets an energy threshold for a specific frequency range of the collected sound samples and declares thunder when the threshold is exceeded; this simple frequency-domain energy rule is too coarse and cannot avoid misjudging all other short, high-energy noises in the environment. Chinese patent application 201910331781.5 collects lightning information comprehensively with image, temperature, humidity, and electromagnetic-field measuring devices; data acquisition is first triggered by an optical detector, after which images, electric-field and magnetic-field data are collected. Since it does not describe how lightning is judged after data acquisition, the system can be understood as judging lightning from light intensity alone; this criterion is likewise too simple to distinguish other short bright signals in the environment, and collecting multiple signal types makes monitoring expensive. Chinese patent application 201510115347.5 extracts the spectral curve of a blade whistle, fits a polynomial to reconstruct the whistle form, and identifies the whistle by the correlation between the signal and the reconstructed model; this requires an effective extraction method for the target sound and is highly sensitive to noise, so it often generalizes poorly because real-world sound signals contain various environmental noises. Moreover, its polynomial fit of the features generalizes insufficiently to the whistle and other fault sounds as they change over the turbine's life cycle, so its stability is low.
Chinese patent application 201910603546.9 frames the wind turbine sound signal, extracts features from the framed signal for two-stage clustering, and judges faults from the periodicity of the category labels. This requires choosing effective features for the fault characteristics, and it cannot identify target sounds such as whistles and thunder whose frequency range and intensity may vary. In addition, it uses binary classification, which distinguishes unseen fault characteristics poorly and cannot identify the fault type of an abnormal state.
Each prior-art scheme can detect only a single target sound, the same method cannot be reused for other target sounds, and the sound-based schemes are all easily affected by environmental noise during collection.
Disclosure of Invention
The invention aims to provide a sound detection method and system based on target detection, in which only the sound signal is used and the target sound is accurately identified by performing target detection on the spectrogram of the signal. The method monitors a single signal and generalizes well: it accurately identifies the form of the target sound in the spectrogram, and its accuracy is little affected by shifts in the target sound's frequency band or by various noises appearing in the environment.
The application relates to a sound detection method based on target detection, which comprises a training process and a prediction process, wherein the training process comprises the following steps:
(1.1) collecting a plurality of groups of historical data with target sounds and carrying out quality screening on the data;
(1.2) carrying out spectrum conversion on the data to obtain a spectrogram;
(1.3) converting the spectrogram into a picture and storing the picture;
(1.4) marking the position of the target sound on the spectrogram by using a target detection marking tool;
(1.5) training the labeled picture data by using a target detection model;
the prediction process comprises the following steps:
(2.1) carrying out data quality screening and spectrum conversion on the collected audio signal to be detected;
(2.2) converting the frequency spectrum of the data to be detected into a picture;
(2.3) predicting the generated picture by using the trained target detection model;
and (2.4) when the model identifies that the picture contains the target sound, calculating the frequency band and the time period of the target sound and outputting the frequency band and the time period as a result.
In step (2.3), after prediction, the model outputs the number, probability, and bounding-box position of the detected targets. After step (2.3), the method may further comprise generating a detection factor from the model's detection result: whether the model identifies the target sound, together with the probability or frequency of its occurrence, is mapped into the detection factor.
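The detection-factor mapping can be illustrated with a small sketch. When a long audio clip is windowed into several pictures, the per-picture detections are reduced to one factor, e.g. the maximum detection probability or the fraction of windows containing the target. The helper below is a hypothetical illustration; the application leaves the exact mapping to the purpose of the detection.

```python
def detection_factor(window_results, mode="max_prob"):
    """Reduce per-picture detection results to one detection factor.

    window_results: list of lists of detection probabilities, one inner
    list per sliding-window picture (empty if nothing was detected).
    mode: 'max_prob'  -> highest probability over all windows;
          'frequency' -> fraction of windows with at least one detection.
    Both mappings are illustrative; the application only requires some
    mapping from the identification results to a factor.
    """
    if mode == "max_prob":
        probs = [p for window in window_results for p in window]
        return max(probs) if probs else 0.0
    if mode == "frequency":
        if not window_results:
            return 0.0
        hits = sum(1 for window in window_results if window)
        return hits / len(window_results)
    raise ValueError(f"unknown mode: {mode}")

# Detections on three sliding-window pictures of one audio clip.
results = [[0.91, 0.40], [], [0.75]]
```

For the sample above, `max_prob` yields the strongest single detection, while `frequency` reflects how persistently the target sound appears across the clip.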
The application also relates to a sound detection system based on target detection, which comprises a sound sensor, a machine end hardware device and an operation module, wherein the operation module runs the sound detection method.
The sound detection system further comprises a station-side server, and the operation module can be arranged on the station-side server; the machine-end hardware device may include an edge hardware acquisition system.
According to the sound detection method and system based on target detection, a target detection algorithm is applied to the spectrogram of the sound signal to identify the specific form of the target sound on the spectrogram; the sound requires no noise reduction, the method is robust against various environmental noises without producing misjudgments, and the generalization of the model is improved. No retraining is needed for target sounds that appear in different frequency bands or with different forms: the trained model generalizes to target sounds of the same type that occur in different frequency bands, at different sound pressure levels, and with slightly different spectral forms, and can be applied to any target sound that conforms to a definite spectral form characteristic.
Drawings
Fig. 1 is a schematic flow chart of a sound detection method based on object detection according to the present application.
Fig. 2 is a schematic diagram of different forms of thunder samples identified in an embodiment of the present application.
FIG. 3 is a schematic diagram of whistle samples of different frequency bands identified in an embodiment of the present application.
FIG. 4 is a schematic illustration of different forms of whistle samples identified in an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more apparent, embodiments of the present application will be described in detail below with reference to the accompanying drawings. It should be noted that the embodiments and features of the embodiments in the present application may be arbitrarily combined with each other without conflict.
In the present application, sound signals, with or without the aerodynamic wind-sweeping noise of the turbine, are collected over a period by audio equipment, and the target sounds within them are identified through further analysis of the signal. The sound detection system based on target detection comprises a sound sensor for collecting sounds of the wind turbine and its environment, a machine-end hardware device, and application software running on a station-side server. The machine-end hardware device may further include an edge hardware acquisition system for acquiring operational data and/or environmental data of the system components.
The method applies to any target sound that has a definite shape characteristic in the time-frequency domain of the audio spectrogram. For example, a whistle takes an S-like shape: its width, height, and exact shape differ between turbines, but the S-shaped characteristic is retained overall, so the method applies to it. Likewise, thunder is the sound generated by a random lightning discharge; its spectrum takes several different forms depending on the distance of the strike, but its overall form is consistent — after a strong onset, the frequency range of the sound narrows from wide to narrow over a certain time — so it too is a valid target sound. A tilted blade leading-edge protective film produces a whistle-like sound, and identified thunder can indicate lightning-strike damage before the blade fails seriously. Each target sound has its own specific form on the spectrogram — the whistle is roughly S-shaped, thunder roughly triangular — so by converting the spectrum into an image, the target sound can be identified and located with an image target-detection method.
The sound detection method based on target detection comprises a training process and a prediction process, and is shown in fig. 1. Wherein, the training process comprises the following steps:
(1.1) collecting a plurality of groups of historical data with target sounds;
(1.2) performing quality screening on the data; the screening method and criteria may differ between target sounds and may be carried out manually or with a machine learning method, mainly ensuring that the spectral form of the target sound is not completely masked by noise;
(1.3) performing spectrum conversion on the data to obtain a spectrogram, using, but not limited to, the short-time Fourier transform (STFT), the mel spectrogram, etc.;
(1.4) converting the spectrogram into pictures and storing them; if the sound signal is long, it can be stored as several small pictures using a sliding window;
(1.5) marking the position of the target sound on the spectrogram with a target detection labeling tool;
(1.6) training a target detection model on the labeled picture data; models include, but are not limited to, YOLO, SSD, R-CNN, AttentionNet, etc. Depending on the size of the historical data set, training may start from weights pre-trained on a public data set.
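The spectrum-conversion and picture-generation steps above can be sketched as follows. The FFT size, hop length, and dB normalization range are illustrative assumptions, not values fixed by the application, which only requires some spectrum conversion such as an STFT or mel spectrogram.

```python
import numpy as np
from scipy.signal import stft

def audio_to_spectrogram_image(signal, sample_rate, n_fft=1024, hop=256):
    """Convert a 1-D audio signal to an 8-bit grayscale spectrogram picture.

    The window length, hop, and -80 dB floor are illustrative choices.
    Returns the image plus the frequency and time axes, which are later
    needed to map detected boxes back to a frequency band and time period.
    """
    freqs, times, spec = stft(signal, fs=sample_rate,
                              nperseg=n_fft, noverlap=n_fft - hop)
    # Magnitude in dB, clipped to a fixed dynamic range.
    mag_db = 20.0 * np.log10(np.abs(spec) + 1e-10)
    mag_db = np.clip(mag_db, -80.0, 0.0)
    # Normalize to 0..255 and flip so low frequencies sit at the bottom,
    # matching the usual spectrogram orientation of the labeled pictures.
    img = ((mag_db + 80.0) / 80.0 * 255.0).astype(np.uint8)
    return np.flipud(img), freqs, times

# Example: 2 s of a 1 kHz tone sampled at 16 kHz.
sr = 16000
t = np.arange(2 * sr) / sr
image, freqs, times = audio_to_spectrogram_image(np.sin(2 * np.pi * 1000 * t), sr)
```

A long recording would then be cut into fixed-width slices of `image` along the time axis (the sliding window of step (1.4)) before labeling and training.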
The prediction process comprises the following steps:
(2.1) performing data quality screening and spectrum conversion on the collected audio signal to be detected, with the same specific method as in training; if the sampling rate of the prediction data differs from that of the training data, it can be unified by resampling;
(2.2) converting the spectrum of the data to be detected into a picture, keeping the format and size of the generated picture consistent with the training pictures; during prediction the picture may be stored, or held directly in memory;
(2.3) predicting on the generated picture with the trained target detection model, which outputs the number, probability, and bounding-box position of the detected targets;
(2.4) generating a related detection factor from the model's result, which may map whether the model identified the target sound and the probability / maximum probability / frequency of its occurrence into a factor; the specific mapping is designed according to the purpose of the detection, and when the audio signal is windowed into several pictures, the recognition results of all pictures are considered jointly in the mapping;
(2.5) when the model identifies that a picture contains the form of the target sound, calculating the frequency band and time period of the target sound from the position of the output bounding box in the spectrogram and outputting them as the result.
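The mapping in step (2.5) from a detected box back to a frequency band and time period is simple bookkeeping over the picture's axes. The sketch below assumes pixel coordinates with row 0 at the top of the image (the highest frequency); this layout is an assumption about how the picture was rendered, not something fixed by the application.

```python
def box_to_band_and_period(box, img_height, img_width, max_freq_hz, duration_s):
    """Map a detector bounding box (in pixels) on a spectrogram picture
    back to a frequency band (Hz) and time period (s).

    box: (x_min, y_min, x_max, y_max), with row 0 at the TOP of the
    image, i.e. the highest frequency -- an assumed, typical layout.
    """
    x_min, y_min, x_max, y_max = box
    # The horizontal axis is time.
    t_start = x_min / img_width * duration_s
    t_end = x_max / img_width * duration_s
    # The vertical axis is frequency, inverted because row 0 is the top.
    f_high = (1.0 - y_min / img_height) * max_freq_hz
    f_low = (1.0 - y_max / img_height) * max_freq_hz
    return (f_low, f_high), (t_start, t_end)

# A box on a 512x400-pixel picture of a 10 s clip spanning 0-8000 Hz.
band, period = box_to_band_and_period((100, 128, 300, 384), 512, 400, 8000.0, 10.0)
```

Here the example box maps to the 2000-6000 Hz band between 2.5 s and 7.5 s, which is exactly the "frequency band and time period" the method outputs as its result.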
Examples
The labeled thunder and whistle data were used to train a thunder recognition model and a whistle recognition model, respectively, and a brand-new wind farm was monitored with these models. Thunder samples of different forms were recognized, as shown in fig. 2, and whistle samples of different frequency bands and different forms were recognized, as shown in figs. 3 and 4. In this embodiment, training was completed with only 20 labeled pictures, using weights pre-trained on a public data set as the model's initial weights; the model used is the YOLO v3 target detector. Each of the two models was verified on at least 5 wind farms for several months: without retraining, recognition accuracy above 95% and a false alarm rate below 3% were achieved.
Although the embodiments disclosed in the present application are described above, the descriptions are only for the convenience of understanding the present application, and are not intended to limit the present application. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the disclosure as defined by the appended claims.
Claims (7)
1. A sound detection method based on target detection comprises a training process and a prediction process, and is characterized in that: the training process comprises the following steps:
(1.1) collecting a plurality of groups of historical data with target sounds and carrying out quality screening on the data;
(1.2) carrying out spectrum conversion on the data to obtain a spectrogram;
(1.3) converting the spectrogram into a picture and storing the picture;
(1.4) marking the position of the target sound on the spectrogram by using a target detection marking tool;
(1.5) training the labeled picture data by using a target detection model;
the prediction process comprises the following steps:
(2.1) carrying out data quality screening and spectrum conversion on the collected audio signal to be detected;
(2.2) converting the frequency spectrum of the data to be detected into a picture;
(2.3) predicting the generated picture by using the trained target detection model;
and (2.4) when the model identifies that the picture contains the target sound, calculating the frequency band and the time period of the target sound and outputting the frequency band and the time period as a result.
2. The sound detection method according to claim 1, characterized in that: in the step (2.3), after prediction, the model outputs the number, probability and frame position of the detection target.
3. The sound detection method according to claim 1 or 2, characterized in that: after the step (2.3), the method can further comprise the following steps: and generating a detection factor according to the detection result of the model.
4. The sound detection method according to claim 2 or 3, characterized in that: whether the model identifies the target sound, and the probability or frequency of its occurrence, are mapped into the detection factor.
5. A sound detection system based on target detection, comprising a sound sensor, a machine-end hardware device and an operation module, characterized in that: the operation module runs the sound detection method according to any one of claims 1 to 4.
6. The sound detection system of claim 5, wherein: the sound detection system further comprises a station-side server, and the operation module is arranged on the station-side server.
7. The sound detection system according to claim 5 or 6, characterized in that: the machine-end hardware device comprises an edge hardware acquisition system.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011480987.3A (published as CN112735448A) | 2020-12-15 | 2020-12-15 | Sound detection method and system based on target detection |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN112735448A (en) | 2021-04-30 |
Family
ID=75602388
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202011480987.3A (pending) | Sound detection method and system based on target detection | 2020-12-15 | 2020-12-15 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN112735448A (en) |
Citations (6)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102494894A * | 2011-11-17 | 2012-06-13 | 高丙团 | Audio monitoring and fault diagnosis system for wind generating set and audio monitoring and fault diagnosis method for same |
| CN104677623A * | 2015-03-16 | 2015-06-03 | 西安交通大学 | On-site acoustic diagnosis method and monitoring system for wind turbine blade failure |
| US10504504B1 * | 2018-12-07 | 2019-12-10 | Vocalid, Inc. | Image-based approaches to classifying audio data |
| US20200184991A1 * | 2018-12-05 | 2020-06-11 | Pascal Cleve | Sound class identification using a neural network |
| CN111306010A * | 2020-04-17 | 2020-06-19 | 北京天泽智云科技有限公司 | Method and system for detecting lightning damage of fan blade |
| CN111306008A * | 2019-12-31 | 2020-06-19 | 远景智能国际私人投资有限公司 | Fan blade detection method, device, equipment and storage medium |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20210430 |