CN111341343A - Online updating system and method for abnormal sound detection

Info

Publication number: CN111341343A
Application number: CN202010135422.5A
Authority: CN
Inventors: 王旺旺
Original assignee: Espressif Systems Shanghai Co Ltd
Current assignee: Espressif Systems Shanghai Co Ltd
Priority date: 2020-03-02
Filing date: 2020-03-02
Publication date: 2020-06-26
Anticipated expiration: 2040-03-02
Also published as: CN111341343B

Abstract

The invention provides an online updating system for abnormal sound detection, comprising an embedded end system and a server end system. The embedded end system collects the working audio of the sound source to be detected and feeds it to a neural network module for offline abnormal sound classification. When the offline classification result is an unidentified abnormality, the unknown abnormality reporting module reports the unknown abnormal audio. The server end system groups the unknown abnormal audio by equipment model, performs data cleaning and clustering, adjusts the network structure of the neural network module, generates a new training set and a new verification set, and then trains the abnormal sound detection model for that equipment model. After training is finished, it sends the trained model to the embedded end systems that reported the unidentified abnormal audio, so as to update their abnormal sound detection models. The invention also includes a method for upgrading an abnormal sound detection system by means of the online updating system. The invention achieves dynamic updating of the abnormal sound detection model and can adapt to a changing abnormal sound diagnosis environment.

Description

Online updating system and method for abnormal sound detection

Technical Field

The invention relates to the field of embedded equipment, in particular to an online updating system and method for abnormal sound detection.

Background

Sound is a convenient, effective and fast way to convey information. At present, detecting abnormal vehicle operation, diagnosing faults in mechanical equipment such as compressors and motors, detecting abnormal sounds in a room, detecting a child's crying and the like rely mainly on human judgment and depend heavily on subjective experience, so localization errors are large and costs are high.

In recent years, some abnormal sound detection methods based on deep learning have appeared. They perform well in practical applications, but also have some shortcomings:

1. The system is complex and computationally expensive, relying on a complex computing unit or even a GPU (Graphics Processing Unit); for equipment such as air conditioners and compressors, such an abnormal sound detection system is difficult and costly to deploy.

2. A large detection system must be deployed on a server: once the sound source to be detected is networked, its working data are transmitted to the server, and the detection result is sent back to the equipment over the network. This has the following problems:

Because the abnormal sound detection system depends on the network environment, it cannot work when the network is disconnected.

Because the working sound of sources such as air conditioners and compressors spans a wide frequency range, a sampling frequency of 48 kHz is required, so the volume of transmitted data is large and puts heavy pressure on the network.

When real-time audio is transmitted over a network, frames are easily dropped; when frames are dropped from an audio stream, its spectral characteristics may change as a result, causing abnormal sound detection to fail.

Deploying the detection system in an offline environment also presents problems: the sound sources of large machinery are structurally complex and can exhibit many kinds of abnormality. When deep learning is used for abnormal sound detection and an unknown abnormal sound appears that the detection system has not learned, the abnormality cannot be located. The abnormality types therefore cannot be kept up to date, resulting in poor recognition of abnormal sounds.

Disclosure of Invention

The invention aims to provide an online updating system and method for abnormal sound detection, which mainly solve the problems in the prior art, can continuously keep the advantage of small dependence of an offline abnormal sound detection system on a network, and simultaneously increase the capability of the offline abnormal sound detection system for rapidly learning and identifying new unknown abnormal sounds, and improve the reliability of abnormal sound detection of equipment.

In order to achieve the above object, the technical solution adopted by the present invention is to provide an embedded end system of an online update system for abnormal sound detection, which operates at a sound source end to be detected, and is characterized by comprising an equipment sound acquisition module, an equipment sound audio feature extraction module, a neural network module and an unknown abnormality reporting module;

the equipment sound acquisition module converts the sound emitted by the sound source to be detected into an audio digital signal and then transmits the audio digital signal to the equipment sound audio feature extraction module; the device sound audio characteristic extraction module processes the audio digital signal on a frequency domain to obtain audio frequency samples which are used as the input of the neural network module;

the neural network module processes the audio frequency samples to complete abnormal sound classification, and the output abnormality types comprise N kinds of abnormality, unidentified abnormality, and no abnormality; the number of abnormality types is determined by an abnormal sound detection model; the network structure of the neural network module is determined according to the number of abnormality types and is dynamically variable; the abnormal sound detection model also determines the working parameters of the neural network module;

when the abnormal sound detection result diagnosed by the neural network module is the unidentified abnormality, the unknown abnormality reporting module collects the MAC address and the model of the sound source to be detected to form header information, and the header information and the corresponding audio digital signal are combined into abnormal audio and then reported in a wireless transmission mode.

Further, the embedded end system is also adapted to receive the trained abnormal sound detection model and complete the update of the abnormal sound detection model.

The technical scheme adopted by the invention also provides a server side system of the online updating system for abnormal sound detection, which is characterized in that the server side system comprises an online learning module, a model training module and a model updating module;

the online learning module receives reported abnormal audio, performs data processing on the abnormal audio, adjusts the network structure of the neural network module corresponding to the abnormal sound detection model, and divides the abnormal audio into a training set and a verification set; the abnormal audio comprises the MAC address and the model of the sound source to be detected and an audio digital signal corresponding to the unidentified abnormality;

the model training module performs model training on the abnormal sound detection model to obtain a trained abnormal sound detection model;

and the model updating module issues the trained abnormal sound detection model to the embedded end system, and updates the abnormal sound detection model in the embedded end system.

Further, the online learning module classifies the abnormal audio according to the model of the sound source to be tested to obtain equipment-associated abnormal audio, and then establishes the training set and the verification set by using the equipment-associated abnormal audio for each sound source to be tested.

Further, the data processing of the equipment-associated abnormal audio by the online learning module comprises data cleaning and abnormal category clustering; the data processing obtains M clustering categories as M new abnormal categories;

according to the M new abnormal categories and the N known abnormal categories, the online learning module adjusts the classification dimension of the abnormal sound detection model to N + M + 2, comprising: N + M kinds of abnormality, unidentified abnormality, and no abnormality.

Further, the process of data cleaning and abnormal category clustering specifically includes:

step 101, performing VAD (voice activity detection) judgment on the equipment-associated abnormal audio, and deleting silent audio to obtain an abnormal audio set;

step 102, dividing the abnormal audio set into abnormal sound sampling segments, each frame being J seconds long;

step 103, clustering the abnormal sound sampling segments by using the DBSCAN algorithm, and discarding the abnormal sound sampling segments marked as noise, to obtain the M clustering categories as the M new abnormal categories.
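
Steps 101 and 102 can be sketched as follows. This is a minimal illustration only: the patent does not specify the VAD method, so a simple RMS energy threshold stands in for it, and the function names, window length and threshold are hypothetical.

```python
import numpy as np

def drop_silence(signal, sample_rate, win_s=0.02, threshold=1e-3):
    """Energy-based stand-in for VAD (assumption): keep only the short
    windows whose RMS level exceeds `threshold`."""
    win = int(round(win_s * sample_rate))
    n_win = len(signal) // win
    windows = signal[: n_win * win].reshape(n_win, win)
    rms = np.sqrt(np.mean(windows ** 2, axis=1))
    return windows[rms > threshold].reshape(-1)

def split_into_frames(signal, sample_rate, frame_s):
    """Cut the signal into frames of `frame_s` seconds (J in the text),
    dropping the incomplete tail."""
    frame_len = int(round(frame_s * sample_rate))
    n_frames = len(signal) // frame_len
    return signal[: n_frames * frame_len].reshape(n_frames, frame_len)
```

With 25 s of tone followed by 5 s of silence and J = 10, the silent part is removed and two complete 10 s frames remain; the trailing 5 s of tone is discarded.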

Further, the online learning module extracts L minutes of audio from the abnormal sound sampling segments as the verification set of the trained abnormal sound detection model and adds it to the original verification set to form an extended verification set, and adds the remaining abnormal sound sampling segments, excluding those used for verification, to the original training set to form an extended training set.

Further, the process of performing the model training on the abnormal sound detection model by the model training module includes:

step 201, setting the number of data iterations to 0; dividing the training set into a plurality of batches in units of K seconds, and discarding any remainder shorter than K seconds;

step 202, taking one batch from the training set;

step 203, adding one to the iteration number;

step 204, solving the frequency domain characteristics of the batch data by using fast Fourier transform;

step 205, performing a round of training by using the frequency domain characteristics of the batch data;

step 206, judging whether the whole training set is traversed according to the iteration times; if the traversal is completed, obtaining a candidate abnormal sound detection model, entering step 207, and if the traversal is not completed, skipping to step 202;

step 207, calculating a loss value of the candidate abnormal sound detection model on the verification set;

step 208, if the loss value is the minimum value obtained during the model training and no longer decreases compared with the previous loss value, recording the current candidate abnormal sound detection model as the optimal abnormal sound detection model, and skipping to step 210; otherwise, jumping to step 209;

step 209, resetting the iteration count to zero, adding one to the epoch count, and skipping to step 202;

and step 210, finishing training.

Further, the model updating module issues the trained abnormal sound detection model and completes the update of the abnormal sound detection model according to the MAC address and the model of the sound source to be detected recorded in the header information, and does not update the abnormal sound detection model of any embedded end system that has not reported abnormal audio.

The technical scheme adopted by the invention also provides an online updating method, which is suitable for an embedded end system of the online updating system and is characterized in that the online updating method comprises the following steps:

step 301, converting the sound emitted by the sound source to be tested into the audio digital signal;

step 302, processing the audio digital signal in a frequency domain to obtain the audio frequency sample;

step 303, processing the audio frequency samples to complete the abnormal sound classification, wherein the output abnormality types comprise N kinds of abnormality, unidentified abnormality, and no abnormality; the number of abnormality types is determined by the abnormal sound detection model;

and step 304, when the abnormal sound detection result is that the abnormal sound is not identified, the unknown abnormal reporting module collects the MAC address and the model of the sound source to be detected to form the header information, and the header information and the corresponding audio digital signal are combined into the abnormal audio and then are reported in a wireless transmission mode.

step 301B, converting the sound emitted by the sound source to be tested into the audio digital signal;

step 302B, processing the audio digital signal in a frequency domain to obtain the audio frequency sample;

step 303B, processing the audio frequency samples to complete the abnormal sound classification, wherein the output abnormality types comprise N kinds of abnormality, unidentified abnormality, and no abnormality; the number of abnormality types is determined by the abnormal sound detection model;

step 304B, when the abnormal sound detection result is that the abnormal sound is not identified, the unknown abnormal reporting module collects the MAC address and the model of the sound source to be detected, constitutes the header information, and reports the header information and the corresponding audio digital signal after being combined into the abnormal audio through a wireless transmission mode;

and step 305B, receiving the trained abnormal sound detection model, and completing the update of the abnormal sound detection model.

The technical scheme adopted by the invention also provides an online updating method, which is suitable for a server end system of an online updating system and is characterized in that the online updating method comprises the following steps:

step 401, receiving the abnormal audio, where the abnormal audio includes the MAC address and the model of the sound source to be detected, and the audio digital signal corresponding to the unidentified abnormality;

step 402, classifying the abnormal audio according to the model of the sound source to be detected to obtain the equipment-associated abnormal audio;

step 403, performing data cleaning and abnormal category clustering on the abnormal audio associated with the equipment, and adjusting the structure of the abnormal sound detection model according to a clustering result;

step 404, dividing the abnormal audio into the training set and the verification set;

step 405, performing model training on the abnormal sound detection model by using the training set and the verification set to obtain the trained abnormal sound detection model;

and step 406, identifying, from the MAC address in the header information, the sound source to be detected that reported the equipment-associated abnormal audio, and updating the trained abnormal sound detection model into the embedded end system of the online updating system at that sound source.

In view of the above technical features, the present invention has the following advantages:

1. The invention still operates primarily as an offline abnormal sound detection system; it depends little on the network, performs well and works reliably.

2. The invention can complete the dynamic update of the neural model of the off-line abnormal sound detection system by utilizing the on-line update system, and is suitable for the working environment of abnormal sound diagnosis.

3. The invention achieves targeted updating by attaching equipment information to the reported abnormal audio, namely: after the model is updated, it can be sent only to the equipment that reported (and only the update for the reported abnormality is issued), which saves network resources.

4. The invention applies a clustering operation to the framed data to determine the new abnormal sound types L^{n+1}, L^{n+2}, ..., L^{n+m} corresponding to the equipment, thereby distinguishing different abnormal sound types and making it possible to process a large number of unknown abnormalities online.

5. The invention uses the FFT algorithm in place of the MFCC algorithm, preserving the frequency-domain characteristics of abnormal sounds to the greatest extent. The mel filters used in the MFCC algorithm are optimized for human hearing: when two sounds of different loudness reach the human ear, the louder frequency components mask the quieter ones, making them less noticeable. The sound sources in the abnormal sound detection task, however, are not human voices and match neither the characteristics of human speech production nor those of human hearing.
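
To illustrate the point about FFT features, the following sketch computes a plain magnitude spectrum with NumPy. It is an assumption-laden illustration rather than the patent's feature extractor: the Hann window and the function name are choices made here. The spectrum keeps uniformly spaced frequency bins up to half the sampling rate, whereas a mel filterbank would merge the high-frequency bins into a few wide bands.

```python
import numpy as np

def fft_magnitude(frame, sample_rate):
    """Return (bin frequencies in Hz, magnitude spectrum) for one real frame.
    The Hann window is an illustrative choice, not taken from the patent."""
    window = np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(frame * window))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    return freqs, spectrum

# Example: a 15 kHz tone sampled at 48 kHz for one second.
sample_rate = 48000
t = np.arange(sample_rate) / sample_rate
freqs, spectrum = fft_magnitude(np.sin(2 * np.pi * 15000 * t), sample_rate)
peak_hz = freqs[np.argmax(spectrum)]  # strongest bin lies at the tone frequency
```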

Drawings

FIG. 1 is a schematic diagram of a system architecture of one embodiment of the present invention;

FIG. 2 is a flow chart of the online learning module performing data cleaning and abnormal category clustering according to one embodiment of the present invention;

FIG. 3 is a flowchart of the operation of the model training module of one embodiment of the present invention;

FIG. 4 is a flowchart illustrating the updating of the embedded terminal when the abnormal sound detection system is updated by the online updating system according to an embodiment of the present invention;

FIG. 5 is a flowchart of the update on the server side when the online updating system updates the abnormal sound detection system according to an embodiment of the present invention.

In the figures: 1 - equipment sound collection module; 2 - audio feature extraction and classification system; 3 - unknown abnormality reporting module; 4 - wireless network; 5 - server end system.

Detailed Description

The invention will be further illustrated with reference to specific embodiments. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.

Referring to fig. 1, an online update system for abnormal sound detection according to the present embodiment includes an equipment sound collection module 1, an audio feature extraction and classification system 2, an unknown abnormality reporting module 3, a wireless network 4, and a server-side system 5.

The device sound collection module 1, the audio feature extraction and classification system 2 and the unknown anomaly reporting module 3 form an embedded end system of the online updating system.

The audio feature extraction and classification system 2 comprises a device sound audio feature extraction module and a neural network module.

The equipment sound acquisition module 1 converts the sound emitted by the sound source to be detected into an audio digital signal and then transmits the audio digital signal to the equipment sound audio feature extraction module; the equipment sound audio feature extraction module processes the collected audio digital signal in the frequency domain to obtain audio frequency samples, which are used as the input of the neural network module. The neural network module further processes the audio frequency samples, completes the abnormal sound classification, and finally outputs N + 2 types, namely: N kinds of abnormality, unidentified abnormality, and no abnormality. The number of abnormality types is determined by an abnormal sound detection model; the network structure of the neural network module is determined according to the number of abnormality types and is dynamically variable; the abnormal sound detection model also determines the working parameters of the neural network module.

In the early stage of deployment, the abnormal sound detection model is trained on a large amount of known abnormal audio, which ensures that the resulting model basically covers the currently known abnormal sound types. This prototype model is then deployed to the audio feature extraction and classification system 2 of the sound source to be detected.

When a sound source to be detected enters actual operation, the equipment sound collection module 1 collects the sound generated while the equipment runs, and the audio feature extraction and classification system 2 begins detection. When the detection result is an unidentified abnormality, the unknown abnormality reporting module 3 connects to the server end system 5 through the wireless network 4 (for example, by a wireless transmission mode such as Wi-Fi or Bluetooth) and sends the collected unknown abnormal audio to the server end system 5. To ensure that the audio source is labeled accurately, device information such as the MAC address and product model of the device is prepended to the first audio frame as audio header information during transmission, so that the data can be classified and labeled on the server end system 5.
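
The report format described above can be sketched as follows. The patent states only that the MAC address and model precede the first audio frame; the byte layout (`HEADER_FORMAT`), the 16-byte model field, the length field and the function names below are all hypothetical.

```python
import struct

# Hypothetical layout: 6-byte MAC, 16-byte NUL-padded model string,
# 4-byte payload length, all in network byte order.
HEADER_FORMAT = "!6s16sI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)

def build_report(mac: str, model: str, audio: bytes) -> bytes:
    """Prepend the header information to the raw audio bytes."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    header = struct.pack(HEADER_FORMAT, mac_bytes, model.encode("ascii"), len(audio))
    return header + audio

def parse_report(packet: bytes):
    """Server-side counterpart: recover (mac, model, audio) from a report."""
    mac_bytes, model_bytes, length = struct.unpack(HEADER_FORMAT, packet[:HEADER_SIZE])
    mac = ":".join(f"{b:02x}" for b in mac_bytes)
    model = model_bytes.rstrip(b"\x00").decode("ascii")
    return mac, model, packet[HEADER_SIZE:HEADER_SIZE + length]
```

Carrying the payload length in the header is a convenience of this sketch; it lets the server separate one report from the next on a stream transport.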

The server-side system 5 includes an online learning module, a model training module, and a model updating module. After the online learning module of the server end system 5 receives the unknown abnormal audio, it stores the unknown abnormal audio information, then cleans and clusters the data by equipment model to generate abnormal sound sample information, obtaining a sample library and a training library, and adjusts the network structure of the neural network module according to the result of the data cleaning and clustering. The model training module uses the sample library and the training library to train the neural network separately for each equipment model; after training is completed, the latest abnormal sound detection model for each equipment model is obtained.

After the latest models for the different equipment models are obtained, the model updating module issues the corresponding model according to the device MAC addresses and models in the header information recorded at upload time. That is, updates are not pushed to all devices: devices that have not reported data are not updated. The latest model is transmitted to the embedded end system through the wireless network 4 (wireless transmission modes such as Wi-Fi or Bluetooth) to update the abnormal sound detection model in the audio feature extraction and classification system 2, so that the audio feature extraction and classification system 2 supports detection of the new abnormal sounds.
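
The targeted update can be pictured with a small sketch. The data shapes and the name `plan_updates` are assumptions made for illustration: reports are taken to be (MAC, model) pairs read from the header information, and trained models are keyed by equipment model.

```python
def plan_updates(reports, trained_models):
    """Map each reporting device's MAC address to the model blob trained for
    its equipment model. Devices that never reported are simply absent."""
    plan = {}
    for mac, model in reports:
        if model in trained_models:
            plan[mac] = trained_models[model]
    return plan

# Two AC-100 units reported; a CP-7 unit reported but no CP-7 model was retrained.
plan = plan_updates(
    [("aa:01", "AC-100"), ("aa:02", "AC-100"), ("bb:01", "CP-7")],
    {"AC-100": b"model-ac100-v2"},
)
```

Here `plan` contains entries only for the two AC-100 devices, so nothing is sent to devices without reports or without a retrained model.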

Referring to FIG. 2, after receiving unknown abnormal audio, the online learning module in the online updating system for abnormal sound detection of this embodiment first reads the header information of the abnormal audio to classify it by equipment model, and divides the data by model into X_1, X_2, ..., X_N, where N is the number of all equipment models.

Once the abnormal audio of each equipment model has been obtained, data cleaning and abnormal category clustering are performed first, as follows:
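
Grouping the reports by equipment model might look like the sketch below. The tuple layout (MAC, model, audio) and the function name are assumptions; each resulting list corresponds to one per-model set in the text.

```python
from collections import defaultdict

def group_by_model(reports):
    """Group reported audio by equipment model.
    `reports` is an iterable of (mac, model, audio) tuples (assumed shape)."""
    groups = defaultdict(list)
    for mac, model, audio in reports:
        groups[model].append(audio)
    return dict(groups)

groups = group_by_model([
    ("aa:01", "AC-100", b"\x01"),
    ("bb:01", "CP-7", b"\x02"),
    ("aa:02", "AC-100", b"\x03"),
])
```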

step 101, performing VAD judgment on the equipment-associated abnormal audio X_N and deleting silent audio to obtain an abnormal audio set X_N';

step 102, framing: dividing X_N' into small segments x_t of 10 s per frame, which serve as the abnormal sound sampling segments;

step 103, clustering the abnormal sound sampling segments by using the DBSCAN algorithm, which comprises the following steps:

step 103-1, selecting a data point from X_N' and taking the ε-neighborhood of the point as the region to be examined;

step 103-2, if the ε-neighborhood of the point contains enough points, starting the clustering process and marking the current data point as the first point of cluster 1; otherwise, marking it as noise;

step 103-3, for the first point of the new cluster 1, the points within distance ε of it belong to the same cluster, that is, the points in the ε-neighborhood of the point are assigned to cluster 1; the same procedure is then repeated for every new point just added to cluster 1;

step 103-4, repeating steps 103-2 and 103-3 until every point in X_N' has been assigned to a cluster or marked as noise. At this point, the abnormal sounds have been divided into new abnormal sound sets X'_{N_1}, X'_{N_2}, ..., X'_{N_m}, where M is the total number of cluster categories (so m = M), i.e., the total number of new abnormality types is M.
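
Steps 103-1 to 103-4 can be sketched as a compact DBSCAN in NumPy. This is an illustration under assumptions: each sampling segment is taken to be already reduced to a feature vector (the patent does not say which features are clustered), Euclidean distance is used, and a point counts as its own neighbor.

```python
import numpy as np

NOISE = -1
UNVISITED = -2

def dbscan(points, eps, min_pts):
    """Label each row of `points` with a cluster id (0, 1, ...) or NOISE."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    labels = np.full(n, UNVISITED)
    # Pairwise distances; O(n^2) memory, acceptable for a sketch.
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    cluster = 0
    for i in range(n):
        if labels[i] != UNVISITED:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = NOISE            # may be claimed later as a border point
            continue
        labels[i] = cluster              # first point of a new cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # noise reachable from a core point
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])   # expand from newly added core points
        cluster += 1
    return labels

pts = [[0, 0], [0, 0.1], [0.1, 0], [5, 5], [5, 5.1], [5.1, 5], [20, 20]]
labels = dbscan(pts, eps=0.5, min_pts=3)
m_new = int(labels.max()) + 1   # number of clusters, M in the text
```

In this toy input the two tight groups become clusters 0 and 1, and the isolated point is marked as noise and would be discarded.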

After clustering is completed, the network structure of the fully connected and classification layers is modified: layers 7 to 9 of the model are changed according to the latest classification, the number of abnormality categories is adjusted to N + M, and the classification dimension of the output becomes N + M + 2: N + M kinds of abnormality, unidentified abnormality, and no abnormality.
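
As a rough picture of the dimension change from N + 2 to N + M + 2, the sketch below widens only a final dense layer. Several things here are assumptions not stated in the patent: that existing weights are kept, that the output columns are ordered as known abnormalities, then unidentified, then no abnormality, and that new columns start from small random values. The patent itself modifies layers 7 to 9 and retrains.

```python
import numpy as np

def expand_output_layer(weights, bias, m_new, rng=None):
    """Widen a dense output layer from N + 2 to N + M + 2 classes.
    weights: (hidden, N + 2); bias: (N + 2,). Assumed column order:
    N known abnormalities, 'unidentified', 'no abnormality'."""
    rng = np.random.default_rng(0) if rng is None else rng
    hidden, old_out = weights.shape
    n_known = old_out - 2
    new_w = rng.normal(0.0, 0.01, size=(hidden, m_new))  # columns for new classes
    new_b = np.zeros(m_new)
    w = np.concatenate([weights[:, :n_known], new_w, weights[:, n_known:]], axis=1)
    b = np.concatenate([bias[:n_known], new_b, bias[n_known:]])
    return w, b
```

For N = 2 and M = 3, a layer with 4 outputs becomes one with 7 outputs, with the original columns preserved at both ends.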

After the unknown abnormal audio sets X'_{N_1}, X'_{N_2}, ..., X'_{N_m} are obtained, their abnormality codes are recorded as L^{n+1}, L^{n+2}, ..., L^{n+m}, where n is the number of previously known abnormal sound types. Then, from the clustered abnormal sound set X'_{N_m}, 30 minutes of audio are extracted as the verification set for L^{n+m} and added to the verification set of the original model. The remainder of X'_{N_1}, X'_{N_2}, ..., X'_{N_m}, excluding the verification data, is added to the training set of the original model.
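
The split described above can be sketched as follows, assuming 10 s segments so that 30 minutes corresponds to 180 segments per new class. Taking the first segments of each cluster for verification, and the function and parameter names, are choices made for this illustration.

```python
def extend_sets(train_set, val_set, clusters, first_new_label,
                segment_s=10, val_minutes=30):
    """Append labeled segments from each new cluster to the existing sets.
    The first `val_minutes` of each cluster go to the verification set,
    the rest to the training set (selection order is an assumption)."""
    n_val = (val_minutes * 60) // segment_s
    for offset, segments in enumerate(clusters):
        label = first_new_label + offset
        val_set.extend((seg, label) for seg in segments[:n_val])
        train_set.extend((seg, label) for seg in segments[n_val:])
    return train_set, val_set

# Two new clusters with 200 and 250 segments; labels continue after n = 5 known types.
train, val = extend_sets([], [], [list(range(200)), list(range(250))], first_new_label=6)
```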

Referring to fig. 3, the process of training the new model is the same as the process of training the original model, the model network is trained by using the training set, and then the training result is verified by using the verification set. The original model uses only the original training set and the original validation set. In the process of training the new model, the reported abnormal audio is used for expanding the original training set and the original verification set, and then training is started. The specific training process comprises the following steps:

step 201, setting the number of data iterations to 0; dividing the training set into a plurality of batches in units of K seconds (K = 300), and discarding any remainder shorter than K seconds;

step 202, taking a batch from a training set;

step 203, adding one to the iteration number;

step 204, solving FFT characteristics of batch data;

step 205, performing a round of training by using the FFT characteristic of the batch data;

step 206, judging from the iteration count whether the whole training set has been traversed; if so, a candidate abnormal sound detection model is obtained and the process proceeds to step 207; otherwise, the process returns to step 202;

step 207, calculating the loss value of the candidate abnormal sound detection model on the verification set;

step 208, if the loss value is the minimum value obtained during the model training and no longer decreases compared with the previous loss value, recording the current candidate abnormal sound detection model as the optimal abnormal sound detection model, and skipping to step 210; otherwise, jumping to step 209;

step 209, resetting the iteration times to zero, adding one to the epoch times, skipping to step 202, and starting the training iteration of the next epoch;

and step 210, finishing training.
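
The training flow can be sketched in NumPy as below. This is a simplified stand-in, not the patent's network: a single softmax layer replaces the neural network, batches are counted in rows rather than K seconds of audio, and the stopping rule is interpreted as patience-based early stopping on the verification loss. All function names and hyperparameters are hypothetical.

```python
import numpy as np

def fft_features(batch):
    """Magnitude spectrum of each row, scaled by frame length (step 204)."""
    return np.abs(np.fft.rfft(batch, axis=1)) / batch.shape[1]

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss_and_grad(w, x, y):
    """Cross-entropy loss of a linear softmax classifier and its gradient."""
    p = softmax(x @ w)
    n = len(y)
    loss = -np.log(p[np.arange(n), y] + 1e-12).mean()
    p[np.arange(n), y] -= 1.0
    return loss, x.T @ p / n

def train(train_x, train_y, val_x, val_y, n_classes,
          batch_size=4, lr=0.5, patience=3, max_epochs=50):
    val_f = fft_features(val_x)
    w = np.zeros((val_f.shape[1], n_classes))
    best_w, best_loss, stale = w.copy(), np.inf, 0
    n_batches = len(train_x) // batch_size          # step 201: drop the remainder
    for epoch in range(max_epochs):
        for it in range(n_batches):                 # steps 202, 203 and 206
            sl = slice(it * batch_size, (it + 1) * batch_size)
            feats = fft_features(train_x[sl])       # step 204
            _, grad = loss_and_grad(w, feats, train_y[sl])
            w -= lr * grad                          # step 205
        val_loss, _ = loss_and_grad(w, val_f, val_y)    # step 207
        if val_loss < best_loss:                    # step 208: keep the best model
            best_w, best_loss, stale = w.copy(), val_loss, 0
        else:
            stale += 1
            if stale >= patience:                   # loss stopped improving
                break                               # step 210
    return best_w, best_loss
```

The returned weights are those with the lowest verification loss seen, which corresponds to the "optimal abnormal sound detection model" recorded in step 208.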

The present embodiment further includes a method for updating an abnormal-sound detection system using an online updating system for abnormal-sound detection, wherein:

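Referring to FIG. 4, the steps on the embedded end include:

step 301, converting the sound emitted by the sound source to be detected into an audio digital signal;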

step 302, processing the audio digital signal in a frequency domain to obtain audio frequency samples;

step 303, processing the audio frequency samples to complete abnormal sound classification, wherein the output abnormality types comprise N kinds of abnormality, unidentified abnormality, and no abnormality; the number of abnormality types is determined by an abnormal sound detection model;

step 304, when the abnormal sound detection result is an unidentified abnormality, the unknown abnormality reporting module collects the MAC address and the model of the sound source to be detected to form header information, combines the header information and the corresponding audio digital signal into abnormal audio, and reports it by wireless transmission;

and step 305, receiving the trained abnormal sound detection model, and completing the update of the abnormal sound detection model.

Referring to FIG. 5, the steps on the server side include:

step 401, receiving abnormal audio, wherein the abnormal audio comprises an MAC address and a model of a sound source to be detected and an audio digital signal corresponding to an unidentified abnormality;

step 402, classifying the abnormal audio according to the model of the sound source to be detected to obtain equipment-associated abnormal audio;

step 403, performing data cleaning and abnormal category clustering on the abnormal audio associated with the equipment, and adjusting the structure of the abnormal sound detection model according to the clustering result;

step 404, dividing the abnormal audio into a training set and a verification set;

step 405, performing model training on the abnormal sound detection model by using the training set and the verification set to obtain a trained abnormal sound detection model;

and step 406, identifying, from the MAC address in the header information, the sound source to be detected that reported the equipment-associated abnormal audio, and updating the trained abnormal sound detection model into the embedded end system of the online updating system at that sound source.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. An embedded end system of an online updating system for abnormal sound detection runs at a sound source end to be detected, and is characterized by comprising an equipment sound acquisition module, an equipment sound audio feature extraction module, a neural network module and an unknown abnormal reporting module;

2. The embedded end system of the online update system of claim 1, further adapted to receive a trained abnormal sound detection model and complete the update of the abnormal sound detection model.

3. A server-side system of an online update system for abnormal sound detection, the server-side system comprising an online learning module, a model training module, and a model update module;

4. The server-side system of the online update system of claim 3, wherein the online learning module classifies the abnormal audio according to the model of the sound source to be tested to obtain device-associated abnormal audio, and then establishes the training set and the verification set by using the device-associated abnormal audio for each sound source to be tested.

5. The server-side system of the online update system of claim 4, wherein the data processing of the device-associated abnormal audio by the online learning module comprises data cleaning and abnormal category clustering; the data processing obtains M clustering categories as M new abnormal categories;

6. The server-side system of an online update system according to claim 5, wherein the process of data cleansing and abnormal category clustering specifically comprises:

7. The server-side system of the online update system of claim 6, wherein the online learning module extracts audio of L minutes from the abnormal sound sampling segments as the verification set of the trained abnormal sound detection model, adds the audio to an original verification set to form an extended verification set, and adds the abnormal sound sampling segments without the verification set to an original training set to form an extended training set.

8. The server-side system of the online update system of claim 3, wherein the process by which the model training module performs the model training on the abnormal sound detection model comprises:

step 202, taking one batch from the training set;

step 203, adding one to the iteration number;

and step 210, finishing training.

9. The server-side system of an online update system of claim 4, wherein the model update module issues the trained abnormal sound detection model and completes the update of the abnormal sound detection model according to the MAC address and the model of the sound source to be detected recorded in the header information, and does not update the abnormal sound detection model of any embedded-side system that has not reported abnormal audio.

10. An online update method applied to the embedded end system of the online update system as claimed in claim 1, wherein the online update method comprises:

11. An online update method applied to the embedded end system of the online update system as claimed in claim 2, wherein the online update method comprises:

12. An online updating method applied to a server-side system of the online updating system according to claim 3, wherein the online updating method comprises: