US20230030911A1 - Abnormal sound detection method and apparatus - Google Patents

Abnormal sound detection method and apparatus

Info

Publication number
US20230030911A1
Authority
US
United States
Prior art keywords
abnormal sound, sound detection, spectrogram, sound signal, signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US17/460,338
Other versions
US11579012B1 (en)
Inventor
Tairong Chen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wistron Corp
Original Assignee
Wistron Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Wistron Corp filed Critical Wistron Corp
Assigned to WISTRON CORPORATION. Assignment of assignors interest (see document for details). Assignors: CHEN, TAIRONG
Publication of US20230030911A1 publication Critical patent/US20230030911A1/en
Application granted granted Critical
Publication of US11579012B1 publication Critical patent/US11579012B1/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01H - MEASUREMENT OF MECHANICAL VIBRATIONS OR ULTRASONIC, SONIC OR INFRASONIC WAVES
    • G01H17/00 - Measuring mechanical vibrations or ultrasonic, sonic or infrasonic waves, not provided for in the preceding groups
    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01H - MEASUREMENT OF MECHANICAL VIBRATIONS OR ULTRASONIC, SONIC OR INFRASONIC WAVES
    • G01H3/00 - Measuring characteristics of vibrations by using a detector in a fluid
    • G01H3/04 - Frequency
    • G01H3/08 - Analysing frequencies present in complex vibrations, e.g. comparing harmonics present
    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01M - TESTING STATIC OR DYNAMIC BALANCE OF MACHINES OR STRUCTURES; TESTING OF STRUCTURES OR APPARATUS, NOT OTHERWISE PROVIDED FOR
    • G01M99/00 - Subject matter not provided for in other groups of this subclass
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/03 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/27 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the analysis technique
    • G10L25/30 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the analysis technique using neural networks
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/78 - Detection of presence or absence of voice signals

Definitions

  • the disclosure relates to a sound detection technology, and more particularly to an abnormal sound detection method and apparatus.
  • assembly defects adversely affect performance of sound-related electronic products such as speakers or microphones.
  • assembly defects may lead to electrical noise or mechanical abnormal sounds or vibrations.
  • Assembly defects are usually detected by experienced listeners at the ends of production lines, and such detection requires the application of log-swept sine chirps to speakers and the use of human auditory detection to analyze whether response signals thereof are normal.
  • results detected by human hearing evaluation vary with subjective factors such as the listeners' age, mood changes, and hearing fatigue, and this method is likely to cause occupational injuries to the listeners.
  • the disclosure provides an abnormal sound detection method and apparatus, which may detect a defect category corresponding to an abnormal sound signal by image recognition.
  • the abnormal sound detection method of the disclosure includes receiving an abnormal sound signal, converting the abnormal sound signal into a spectrogram, and executing image recognition on the spectrogram for obtaining a defect category corresponding to the abnormal sound signal.
  • the step of executing the image recognition on the spectrogram includes inputting the spectrogram to a classification model, which is a neural network model, to obtain multiple probability values respectively corresponding to multiple specified labels, and using the specified label in correspondence with the greatest one among the probability values as the defect category.
  • the method further includes determining whether the defect category is consistent with a comparison result, and inputting the abnormal sound signal to a training dataset for retraining the classification model through the training dataset if the defect category is not consistent with the comparison result.
  • the method further includes the following steps. Whether the greatest one among the probability values is greater than a confidence index corresponding thereto is determined.
  • the specified label in correspondence with the greatest one among the probability values is used as the defect category in response to the greatest one among the probability values being greater than the confidence index corresponding thereto.
  • the abnormal sound signal is input to a training dataset for retraining the classification model through the training dataset in response to the greatest one among the plurality of probability values not being greater than the confidence index corresponding thereto.
  • the step of inputting the spectrogram to the classification model includes dividing the spectrogram into multiple sub-spectrograms according to time sequence of the spectrogram for inputting the sub-spectrograms to the classification model.
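  • The time-sequence division above may be sketched as follows. This is an illustrative sketch only; the array shape, segment count, and function name are assumptions and are not part of the disclosure.

```python
import numpy as np

def split_spectrogram(spectrogram, num_segments):
    """Divide a (time x frequency) spectrogram into sub-spectrograms
    along the time axis, preserving the time sequence."""
    # np.array_split tolerates a time axis that is not an exact multiple.
    return np.array_split(spectrogram, num_segments, axis=0)

# Example: 100 time frames x 64 frequency bins, divided into 4 sub-spectrograms
# that can be fed to the classification model in time order.
spec = np.random.rand(100, 64)
subs = split_spectrogram(spec, 4)
```

Stacking the sub-spectrograms back along the time axis reproduces the original spectrogram, so the division loses no information.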
  • the classification model includes a bidirectional long short-term memory (BLSTM) layer, a max pooling layer, a flatten layer, and a fully connected layer.
  • the step of converting the abnormal sound signal into the spectrogram includes executing fast Fourier transform on the abnormal sound signal for generating the spectrogram.
  • the step of receiving the abnormal sound signal includes receiving the abnormal sound signal from a sound detection model.
  • the sound detection model is for detecting whether an audio signal has an abnormal sound, and regards the audio signal as the abnormal sound signal when determining that the audio signal has the abnormal sound.
  • before receiving the abnormal sound signal from the sound detection model, the method further includes receiving an audio signal from a recording device through the sound detection model.
  • the recording device is disposed on a target in a silent box or is disposed in the silent box for recording a sound emitted in the silent box to output the audio signal.
  • the abnormal sound detection apparatus of the disclosure includes a receiver, which is configured to receive an abnormal sound signal, and a processor, which is coupled to the receiver and is configured to convert the abnormal sound signal into a spectrogram and execute image recognition on the spectrogram for obtaining a defect category corresponding to the abnormal sound signal.
  • the disclosure establishes an abnormal sound detection architecture based on deep learning (DL) and classifies abnormal sound signals of various malfunctions through this architecture, thereby reducing the number of machines returned for re-tests and providing relevant information for reference when repairing machines to speed up the repair progress.
  • FIG. 1 is a block diagram of an abnormal sound detection apparatus according to an embodiment of the disclosure.
  • FIG. 2 is a schematic diagram of a system for detecting a target according to an embodiment of the disclosure.
  • FIG. 3 is a flowchart of an abnormal sound detection method according to an embodiment of the disclosure.
  • FIG. 4A and FIG. 4B are schematic diagrams of abnormal sound signal spectrograms according to an embodiment of the disclosure.
  • FIG. 5 is an architecture diagram of a classification model according to an embodiment of the disclosure.
  • FIG. 1 is a block diagram of an abnormal sound detection apparatus according to an embodiment of the disclosure.
  • an abnormal sound detection apparatus 100 includes a processor 110, a storage 120, and a receiver 130.
  • the processor 110 is coupled to the storage 120 and the receiver 130.
  • the abnormal sound detection apparatus 100 is for analyzing an abnormal sound signal N which is received, so as to obtain a defect category corresponding to the abnormal sound signal N.
  • a sound detection model may be disposed in the abnormal sound detection apparatus 100 .
  • the sound detection model is software or a module for determining whether an audio signal is normal or abnormal, and the audio signal determined to be abnormal is the abnormal sound signal.
  • the storage 120 includes a database 121 and a classification model 122.
  • the database 121 stores a training dataset.
  • the training dataset includes abnormal sound signals of multiple known defect categories collected in advance (serving as a comparison result). These known abnormal sound signals are used to train the classification model 122.
  • the classification model 122 is, for example, a neural network (NN) model including multiple layers, and this NN model is trained through deep learning.
  • the concept of deep learning is to inform the NN model of input-output relationships through a great amount of known data, thereby adjusting parameters such as weight, bias, and the like in the NN model.
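  • The parameter-adjustment idea above can be illustrated with a deliberately tiny example: a single linear neuron whose weight and bias are fitted to known input-output pairs by gradient descent. This generic sketch is not the disclosure's model; the data, learning rate, and iteration count are arbitrary choices for illustration.

```python
import numpy as np

# Known input-output relationship the model should learn: y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1

weight, bias, lr = 0.0, 0.0, 0.05
for _ in range(2000):
    pred = weight * x + bias          # forward pass
    err = pred - y
    # Gradients of the mean squared error with respect to weight and bias.
    weight -= lr * 2 * np.mean(err * x)
    bias -= lr * 2 * np.mean(err)
# weight and bias converge toward 2 and 1, the parameters of the known data.
```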
  • the processor 110 is, for example, a central processing unit (CPU), a graphics processing unit (GPU), another programmable microprocessor, a digital signal processor (DSP), a programmable controller, an application specific integrated circuit (ASIC), a programmable logic device (PLD), or another similar apparatus.
  • the storage 120 is, for example, any type of fixed or mobile random access memory, read-only memory, flash memory, secure digital card, hard disk, or other similar apparatuses or a combination thereof.
  • the storage 120 stores multiple program code fragments, and the program code fragments are executed by the processor 110 after being installed for executing the abnormal sound detection method.
  • the receiver 130 is, for example, a communication port configured to be connected to a network card interface, a transmission line, or the like to receive the abnormal sound signal N.
  • FIG. 2 is a schematic diagram of a system for detecting a target according to an embodiment of the disclosure.
  • this system includes a silent box 210, a sound detection model 220, and the classification model 122.
  • the sound detection model 220 may be disposed in the abnormal sound detection apparatus 100.
  • the sound detection model 220 may be disposed in an apparatus different from the abnormal sound detection apparatus 100.
  • a target T, a speaker 211 (for example, a loudspeaker), and a recording device 212 (for example, a microphone) are disposed in the silent box 210.
  • the sound detection model 220 receives an audio signal from the recording device 212 and determines whether the audio signal is normal or abnormal, and the audio signal determined to be abnormal is the abnormal sound signal.
  • the recording device 212 is for recording sound emitted in the silent box 210 to output the audio signal.
  • the target T is placed in the silent box 210 for testing, which may avoid environmental interference.
  • the silent box 210 may perform transmission with the sound detection model 220 through a wired or wireless transmission method.
  • the sound detection model 220 transmits a test signal to the speaker 211 of the silent box 210 through a wired or wireless transmission method for playing the test signal by the speaker 211, and records the sound emitted from the silent box 210 by the recording device 212 of the silent box 210 for outputting the audio signal.
  • the recording device 212 is disposed in the silent box 210 (the recording device 212 is not disposed on the target T), and the target T disposed with the speaker 211 is placed in the silent box 210.
  • the target T may be detected through the following method: detecting overall stability of the speaker 211 on the target T and whether vibration of the speaker 211 affects resonance of a housing or elements of the test target T and generates noise.
  • the sound detection model 220 outputs the test signal to the speaker 211 disposed on the target T through a wireless or wired transmission method so that the speaker 211 plays the test signal (sweep signal) of a specific frequency range.
  • the specific frequency range is generally set within 20 Hz to 20 kHz for scanning the resonance of the target T in this frequency range.
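  • A log-swept sine chirp covering 20 Hz to 20 kHz can be generated as follows; the duration and sample rate are assumptions for illustration, not values specified by the disclosure.

```python
import numpy as np

def log_sweep(f0=20.0, f1=20000.0, duration=1.0, fs=48000):
    """Log-swept sine chirp from f0 to f1 Hz, a common speaker test signal."""
    t = np.arange(int(duration * fs)) / fs
    k = np.log(f1 / f0)
    # Phase of an exponential sweep: instantaneous frequency f0 * exp(t*k/T),
    # which equals f0 at t = 0 and f1 at t = duration.
    phase = 2 * np.pi * f0 * duration / k * (np.exp(t * k / duration) - 1.0)
    return np.sin(phase)

sweep = log_sweep()  # 1 second at 48 kHz, sweeping 20 Hz to 20 kHz
```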
  • the recording device 212 in the silent box 210 receives (records) the sound emitted in the silent box 210 , including the sound emitted by the speaker 211 on the target T as well as the sweep and resonance sound emitted by the test target T.
  • the audio signal recorded by the recording device 212 is transmitted to the sound detection model 220 through a wired or wireless transmission method for the sound detection model 220 to determine whether the audio signal has the abnormal sound. If the audio signal has the abnormal sound, the audio signal is regarded as the abnormal sound signal and classified by the classification model 122. In this way, the abnormal sound signal may be classified according to factors such as which component or structure causes the speaker 211 to generate an abnormal resonance sound when playing a sweep signal.
  • the speaker 211 is disposed in the silent box 210 (the speaker 211 is not disposed on the target T), and the target T disposed with the recording device 212 is placed in the silent box 210 .
  • the target T may be detected through the following method: detecting the reception stability of the recording device 212 on the target T. That is, the speaker 211 (for example, an artificial mouth) disposed in the silent box 210 emits a test signal (sweep sound), and the recording device 212 on the target T receives (records) the sound emitted in the silent box 210 for outputting an audio signal.
  • the recorded audio signal is transmitted to the sound detection model 220 by a wireless or wired transmission method for the sound detection model 220 to determine whether the audio signal has abnormal sound.
  • FIG. 3 is a flowchart of an abnormal sound detection method according to an embodiment of the disclosure.
  • the processor 110 receives the abnormal sound signal N through the receiver 130.
  • in step S310, the processor 110 converts the abnormal sound signal into a spectrogram.
  • the processor 110 executes fast Fourier transform (FFT) on the abnormal sound signal N to generate the spectrogram.
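  • A minimal frame-wise FFT spectrogram, in the spirit of the conversion step above, may be sketched like this; the frame length, hop size, and window choice are illustrative assumptions rather than parameters stated in the disclosure.

```python
import numpy as np

def spectrogram(signal, frame_len=512, hop=256):
    """Magnitude spectrogram: fast Fourier transform of windowed frames."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft keeps only the non-negative frequency bins of each frame.
    return np.abs(np.fft.rfft(frames, axis=1))  # shape: (time, frequency)

x = np.sin(2 * np.pi * 1000 * np.arange(48000) / 48000)  # 1 kHz tone, 48 kHz
spec = spectrogram(x)  # a pure tone concentrates energy in one frequency bin
```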
  • the reason why the abnormal sound signal N is converted to the spectrogram is that the abnormal sound and the test signal have time continuity during resonance. Therefore, converting time domain signals to the spectrogram allows abnormal sound features to show time continuity and energy clustering in the spectrogram, retaining subtle features during conversion, so as to facilitate subsequent defect detection of the test target T by using computer vision technologies.
  • the factor of “not equipped with foam gaskets” causes a loudspeaker to vibrate and affect the housing of the target T, resulting in a vibrating sound due to resonance of the housing.
  • the factor of “poor quality of imported materials” refers to poor quality of the speaker 211 .
  • the resonance sound of the foreign body also causes abnormal sound generation.
  • FIG. 4A and FIG. 4B are schematic diagrams of abnormal sound signal spectrograms according to an embodiment of the disclosure.
  • in FIG. 4A and FIG. 4B, the horizontal axis represents frequency, and the vertical axis represents power ratio.
  • FIG. 4A and FIG. 4B show the spectrograms of abnormal sound signals caused by different defect categories.
  • in step S315, image recognition is executed on the spectrogram through the classification model 122 for obtaining the defect category corresponding to the abnormal sound signal N. That is, the spectrogram is input to the classification model 122 for obtaining multiple probability values respectively corresponding to multiple specified labels.
  • the specified label in correspondence with the greatest one among the probability values is used as the defect category.
  • for example, with six specified labels, the classification model 122 may be used to obtain six probability values corresponding to the six specified labels. The sum of the six probability values is 1. The highest probability value among the six probability values is selected, and the specified label corresponding to this highest probability value is the finally obtained defect category.
  • the classification model 122 may further include a human hearing weight to adjust the probability value corresponding to the output specified label so that an output result is closer to a determination result of a human ear.
  • the abnormal sound signal N is also provided to a relevant engineer for manual classification to obtain a comparison result. Therefore, the defect category obtained from the classification model 122 may be compared with the comparison result. If the two do not match, this abnormal sound signal N and the comparison result are input to the training dataset for retraining the classification model 122 through the training dataset.
  • a confidence index corresponding to each specified label may also be set. After obtaining the highest probability value, the highest probability value is further compared with the confidence index corresponding thereto. If the highest probability value is not greater than the confidence index corresponding thereto, it means that the defect category corresponding thereto is not one of the existing six labels. Therefore, the abnormal sound signal N is transmitted to the relevant engineer for manual identification to obtain the defect category corresponding to the abnormal sound signal N, and the abnormal sound signal N and the defect category corresponding thereto (not yet in the training dataset) are added (input) to the training dataset to retrain the classification model 122.
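  • The selection-and-threshold flow described above may be sketched as follows. The label names, confidence values, and function signature are hypothetical placeholders, not categories taken from the disclosure.

```python
import numpy as np

# Hypothetical defect labels and per-label confidence indexes (illustrative).
LABELS = ["no_foam_gasket", "poor_material", "foreign_body",
          "loose_screw", "housing_crack", "wiring_noise"]
CONFIDENCE = {label: 0.6 for label in LABELS}

def classify(probabilities, training_dataset):
    """Pick the highest-probability label; if it does not exceed its
    confidence index, route the sample to the training dataset instead."""
    best = int(np.argmax(probabilities))
    label = LABELS[best]
    if probabilities[best] > CONFIDENCE[label]:
        return label                          # defect category found
    training_dataset.append(probabilities)    # kept for retraining
    return None                               # needs manual identification

pending = []
confident = classify([0.05, 0.70, 0.10, 0.05, 0.05, 0.05], pending)
uncertain = classify([0.30, 0.20, 0.20, 0.10, 0.10, 0.10], pending)
```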
  • FIG. 5 is an architecture diagram of a classification model according to an embodiment of the disclosure.
  • the classification model 122 includes a bidirectional long short-term memory (BLSTM) layer 505, a max pooling layer 510, a flatten layer 515, and a fully connected layer 520.
  • the BLSTM layer 505 obtains feature data h_t^blstm by combining the hidden states of a forward long short-term memory (LSTM) pass and a backward LSTM pass over the input sequence. (The detailed equations appear as images in the original publication and are omitted here.)
  • the feature data h_t^blstm retrieved from the BLSTM layer 505 is simplified through the max pooling layer 510 to obtain more important feature information.
  • the max pooling layer 510 calculates an output on each pooling window by selecting the maximum value among the values in the pooling window. (The corresponding equation appears as an image in the original publication.)
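  • A max pooling window operating as described can be sketched in a few lines; the window size and the one-dimensional layout are simplifying assumptions for illustration.

```python
import numpy as np

def max_pool_1d(features, window=2):
    """Select the maximum value within each non-overlapping pooling window."""
    n = len(features) // window
    return features[: n * window].reshape(n, window).max(axis=1)

pooled = max_pool_1d(np.array([0.1, 0.9, 0.4, 0.3, 0.8, 0.2]))
# Each pair of values collapses to its maximum: 0.9, 0.4, 0.8.
```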
  • the flatten layer 515 is for flattening the feature data output by the max pooling layer 510 .
  • multi-dimensional feature data is transformed into a one-dimensional matrix.
  • the flattened feature data is input to the fully connected layer 520, and after weight calculations, the probability values corresponding to multiple labels 525-1 to 525-M are obtained for the spectrogram IM.
  • the sum of the probability values of the labels 525-1 to 525-M is 1.
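  • The property that the label probabilities are non-negative and sum to 1 follows from applying a softmax after the fully connected layer's weight calculation, as sketched below; the feature size, number of labels M, and random weights are placeholders for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def fully_connected_softmax(flat_features, weights, bias):
    """Weight calculation followed by softmax, yielding probabilities."""
    logits = flat_features @ weights + bias
    logits -= logits.max()                 # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()

M = 6                                       # e.g., six labels 525-1 to 525-M
flat = rng.standard_normal(32)              # flattened feature data
probs = fully_connected_softmax(flat, rng.standard_normal((32, M)), np.zeros(M))
```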
  • the labels 525-1 to 525-M respectively have confidence indexes T1 to TM corresponding thereto. After the probability values of the labels 525-1 to 525-M are obtained, the highest probability value is taken out.
  • assuming that the probability value of the label 525-1 is the highest, whether it is higher than the confidence index T1 corresponding thereto is determined. If the probability value of the label 525-1 is higher than the confidence index T1, then the defect category of the abnormal sound signal N is the label 525-1. If the probability value of the label 525-1 is not higher than the confidence index T1, then the abnormal sound signal N is sent to the relevant engineer for manual identification to obtain the defect category in correspondence with the abnormal sound signal N, and this abnormal sound signal N along with the corresponding defect category (not yet in the training dataset) are added to the training dataset to retrain the classification model 122.
  • the abnormal sound signal classification in the above embodiments may shorten repair time, provide more accurate defect detection than subjective determination by a human ear, and reduce relevant occupational injuries.
  • the abnormal sound signal may be directly analyzed through the classification model to know which type of failure causes the abnormal sound signal. In this way, a device that did not pass a test may be repaired in a single repair pass, and after analysis, elements and mechanisms that often fail may be identified for improvements to increase the yield.

Abstract

An abnormal sound detection method and apparatus are provided. First, an abnormal sound signal is received. Next, the abnormal sound signal is converted into a spectrogram. Afterwards, image recognition is performed on the spectrogram for obtaining a defect category corresponding to the abnormal sound signal.

Description

    CROSS-REFERENCE TO RELATED APPLICATION
  • This application claims the priority benefit of Taiwan application serial no. 110125758, filed on Jul. 13, 2021. The entirety of the above-mentioned patent application is hereby incorporated by reference herein and made a part of this specification.
  • BACKGROUND Technical Field
  • The disclosure relates to a sound detection technology, and more particularly to an abnormal sound detection method and apparatus.
  • Description of Related Art
  • Generally speaking, problems such as assembly defects adversely affect performance of sound-related electronic products such as speakers or microphones. For example, assembly defects may lead to electrical noise or mechanical abnormal sounds or vibrations. Assembly defects are usually detected by experienced listeners at the ends of production lines, and such detection requires the application of log-swept sine chirps to speakers and the use of human auditory detection to analyze whether response signals thereof are normal. However, results detected by human hearing evaluation vary with subjective factors such as the listeners' age, mood changes, and hearing fatigue, and this method is likely to cause occupational injuries to the listeners.
  • In addition, existing models only determine whether an abnormal sound signal exists, without classifying the abnormal sound signal. Therefore, defects of a target remain unknown, which leads to a significant increase in repair time.
  • SUMMARY
  • The disclosure provides an abnormal sound detection method and apparatus, which may detect a defect category corresponding to an abnormal sound signal by image recognition.
  • The abnormal sound detection method of the disclosure includes receiving an abnormal sound signal, converting the abnormal sound signal into a spectrogram, and executing image recognition on the spectrogram for obtaining a defect category corresponding to the abnormal sound signal.
  • In an embodiment of the disclosure, the step of executing the image recognition on the spectrogram includes inputting the spectrogram to a classification model, which is a neural network model, to obtain multiple probability values respectively corresponding to multiple specified labels, and using the specified label in correspondence with the greatest one among the probability values as the defect category.
  • In an embodiment of the disclosure, after obtaining the defect category corresponding to the abnormal sound signal, the method further includes determining whether the defect category is consistent with a comparison result, and inputting the abnormal sound signal to a training dataset for retraining the classification model through the training dataset if the defect category is not consistent with the comparison result.
  • In an embodiment of the disclosure, after obtaining the probability values respectively corresponding to the specified labels, the method further includes the following steps. Whether the greatest one among the probability values is greater than a confidence index corresponding thereto is determined. The specified label in correspondence with the greatest one among the probability values is used as the defect category in response to the greatest one among the probability values being greater than the confidence index corresponding thereto. The abnormal sound signal is input to a training dataset for retraining the classification model through the training dataset in response to the greatest one among the plurality of probability values not being greater than the confidence index corresponding thereto.
  • In an embodiment of the disclosure, the step of inputting the spectrogram to the classification model includes dividing the spectrogram into multiple sub-spectrograms according to time sequence of the spectrogram for inputting the sub-spectrograms to the classification model.
  • In an embodiment of the disclosure, the classification model includes a bidirectional long short-term memory (BLSTM) layer, a max pooling layer, a flatten layer, and a fully connected layer.
  • In an embodiment of the disclosure, the step of converting the abnormal sound signal into the spectrogram includes executing fast Fourier transform on the abnormal sound signal for generating the spectrogram.
  • In an embodiment of the disclosure, the step of receiving the abnormal sound signal includes receiving the abnormal sound signal from a sound detection model. The sound detection model is for detecting whether an audio signal has an abnormal sound, and regards the audio signal as the abnormal sound signal when determining that the audio signal has the abnormal sound.
  • In an embodiment of the disclosure, before receiving the abnormal sound signal from the sound detection model, the method further includes receiving an audio signal from a recording device through the sound detection model. The recording device is disposed on a target in a silent box or is disposed in the silent box for recording a sound emitted in the silent box to output the audio signal.
  • The abnormal sound detection apparatus of the disclosure includes a receiver, which is configured to receive an abnormal sound signal, and a processor, which is coupled to the receiver and is configured to convert the abnormal sound signal into a spectrogram and execute image recognition on the spectrogram for obtaining a defect category corresponding to the abnormal sound signal.
  • Based on the above, the disclosure establishes an abnormal sound detection architecture based on deep learning (DL) and classifies abnormal sound signals of various malfunctions through this architecture, thereby reducing the number of machines returned for re-tests and providing relevant information for reference when repairing machines to speed up the repair progress.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a block diagram of an abnormal sound detection apparatus according to an embodiment of the disclosure.
  • FIG. 2 is a schematic diagram of a system for detecting a target according to an embodiment of the disclosure.
  • FIG. 3 is a flowchart of an abnormal sound detection method according to an embodiment of the disclosure.
  • FIG. 4A and FIG. 4B are schematic diagrams of abnormal sound signal spectrograms according to an embodiment of the disclosure.
  • FIG. 5 is an architecture diagram of a classification model according to an embodiment of the disclosure.
  • DESCRIPTION OF THE EMBODIMENTS
  • Part of the embodiments of the disclosure will be described in detail below with accompanying drawings. For the reference numerals used in the following description, the same reference numerals appearing in different drawings will be regarded as the same or similar elements. These embodiments are only a part of the disclosure and do not disclose all possible implementations of the disclosure. More precisely, these embodiments only serve as examples of the method and apparatus within the scope of the claims of the disclosure.
  • FIG. 1 is a block diagram of an abnormal sound detection apparatus according to an embodiment of the disclosure. With reference to FIG. 1, an abnormal sound detection apparatus 100 includes a processor 110, a storage 120, and a receiver 130. The processor 110 is coupled to the storage 120 and the receiver 130. The abnormal sound detection apparatus 100 is for analyzing an abnormal sound signal N which is received, so as to obtain a defect category corresponding to the abnormal sound signal N. In an embodiment, a sound detection model may be disposed in the abnormal sound detection apparatus 100. The sound detection model is software or a module for determining whether an audio signal is normal or abnormal, and the audio signal determined to be abnormal is the abnormal sound signal.
  • The storage 120 includes a database 121 and a classification model 122. The database 121 stores a training dataset. The training dataset includes abnormal sound signals of multiple known defect categories collected in advance (serving as a comparison result). These known abnormal sound signals are used to train the classification model 122. Here, the classification model 122 is, for example, a neural network (NN) model including multiple layers, and this NN model is trained through deep learning. The concept of deep learning is to inform the NN model of input-output relationships through a large amount of known data, thereby adjusting parameters such as weights and biases in the NN model.
  • The processor 110 is, for example, a central processing unit (CPU), a graphics processing unit (GPU), another programmable microprocessor, a digital signal processor (DSP), a programmable controller, an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or other similar apparatuses.
  • The storage 120 is, for example, any type of fixed or removable random access memory, read-only memory, flash memory, secure digital card, hard disk, or other similar apparatuses or a combination thereof. The storage 120 stores multiple program code fragments, and after being installed, the program code fragments are executed by the processor 110 to perform the abnormal sound detection method.
  • The receiver 130 is, for example, a communication port for being connected to a network card interface, a transmission line, or the like to receive the abnormal sound signal N.
  • FIG. 2 is a schematic diagram of a system for detecting a target according to an embodiment of the disclosure. With reference to FIG. 2 , this system includes a silent box 210, a sound detection model 220, and the classification model 122. In an embodiment, the sound detection model 220 may be disposed in the abnormal sound detection apparatus 100. In other embodiments, the sound detection model 220 may be disposed in an apparatus different from the abnormal sound detection apparatus 100. A target T, a speaker 211 (for example, a loudspeaker), and a recording device 212 (for example, a microphone) are disposed in the silent box 210. The sound detection model 220 receives an audio signal from the recording device 212 and determines whether the audio signal is normal or abnormal, and the audio signal determined to be abnormal is the abnormal sound signal. The recording device 212 is for recording sound emitted in the silent box 210 to output the audio signal.
  • The target T is placed in the silent box 210 for testing, which avoids environmental interference. The silent box 210 may, for example, communicate with the sound detection model 220 through a wired or wireless transmission method. For example, the sound detection model 220 transmits a test signal to the speaker 211 of the silent box 210 through a wired or wireless transmission method so that the speaker 211 plays the test signal, and the recording device 212 of the silent box 210 records the sound emitted in the silent box 210 to output the audio signal.
  • In the embodiment shown in FIG. 2 , the recording device 212 is disposed in the silent box 210 (not on the target T), and the target T disposed with the speaker 211 is placed in the silent box 210. The target T may be detected by checking the overall stability of the speaker 211 on the target T and whether vibration of the speaker 211 causes the housing or elements of the target T to resonate and generate noise. Specifically, the sound detection model 220 outputs the test signal to the speaker 211 disposed on the target T through a wireless or wired transmission method so that the speaker 211 plays the test signal (a sweep signal) over a specific frequency range. The specific frequency range is generally set within 20 Hz to 20 kHz for scanning the resonance of the target T in this frequency range.
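As a rough illustration of the sweep described above, the test signal can be synthesized as a logarithmic chirp covering 20 Hz to 20 kHz; the sampling rate and duration below are illustrative assumptions, not values from the disclosure:

```python
import numpy as np

FS = 48_000        # sampling rate in Hz (assumed)
DURATION = 5.0     # sweep length in seconds (assumed)
F0, F1 = 20.0, 20_000.0   # the specific frequency range described above

t = np.linspace(0.0, DURATION, int(FS * DURATION), endpoint=False)
k = F1 / F0
# Phase of a logarithmic sweep whose instantaneous frequency is F0 * k**(t/DURATION)
phase = 2 * np.pi * F0 * DURATION / np.log(k) * (k ** (t / DURATION) - 1.0)
test_signal = np.sin(phase)
```

Playing such a signal through the speaker 211 excites resonances across the audible band one frequency at a time, which is what makes the resonance of the target T observable.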
  • Afterwards, the recording device 212 in the silent box 210 receives (records) the sound emitted in the silent box 210, including the sweep sound emitted by the speaker 211 on the target T as well as the resonance sound emitted by the target T. The audio signal recorded by the recording device 212 is transmitted to the sound detection model 220 through a wired or wireless transmission method for the sound detection model 220 to determine whether the audio signal has an abnormal sound. If so, the audio signal is regarded as the abnormal sound signal and classified by the classification model 122. In this way, the abnormal sound signal may be classified according to which component or structure resonates abnormally when the speaker 211 plays the sweep signal.
  • In addition, in other embodiments, the speaker 211 is disposed in the silent box 210 (not on the target T), and the target T disposed with the recording device 212 is placed in the silent box 210. The target T may then be detected by checking the reception stability of the recording device 212 on the target T. That is, the speaker 211 (for example, an artificial mouth) disposed in the silent box 210 emits a test signal (a sweep sound), and the recording device 212 on the target T receives (records) the sound emitted in the silent box 210 to output an audio signal. Next, the recorded audio signal is transmitted to the sound detection model 220 through a wireless or wired transmission method for the sound detection model 220 to determine whether the audio signal has an abnormal sound.
  • FIG. 3 is a flowchart of an abnormal sound detection method according to an embodiment of the disclosure. With reference to FIG. 1 to FIG. 3 , in step S305, the processor 110 receives the abnormal sound signal N through the receiver 130.
  • Next, in step S310, the processor 110 converts the abnormal sound signal into a spectrogram. Here, the processor 110 executes a fast Fourier transform (FFT) on the abnormal sound signal N to generate the spectrogram. The abnormal sound signal N is converted to the spectrogram because the abnormal sound and the test signal have time continuity during resonance. Converting the time domain signal to the spectrogram therefore allows abnormal sound features to exhibit time continuity and energy clustering in the spectrogram, so that subtle features are retained rather than lost during conversion, which facilitates subsequent defect detection of the target T using computer vision technologies.
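A minimal sketch of this time-to-spectrogram conversion applies an FFT to successive windowed frames; the frame and hop sizes here are illustrative assumptions, not the disclosure's parameters:

```python
import numpy as np

def to_spectrogram(x, n_fft=1024, hop=512):
    """Convert a 1-D time-domain signal into a log-magnitude spectrogram
    by executing an FFT on successive Hann-windowed frames."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))   # (frames, n_fft//2 + 1)
    return 20.0 * np.log10(magnitude + 1e-10)         # log-magnitude in dB

# A 1 kHz tone sampled at 16 kHz appears as a bright line at FFT bin 64
fs = 16_000
x = np.sin(2 * np.pi * 1000.0 * np.arange(fs) / fs)
S = to_spectrogram(x)
```

A sustained resonance shows up in such a spectrogram as the time-continuous, energy-clustered bright region the passage describes.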
  • There are many factors that generate an abnormal sound, such as "short circuits due to solder bridging of elements", "flex cables disposed too tight", "not equipped with foam gaskets", and "poor quality of imported materials". Among the above, the factor of "short circuits due to solder bridging of elements" refers to the problem that the speaker 211 generates a direct current sound or is muted. The factor of "flex cables disposed too tight" refers to the fact that the recording device 212 is suspended on the housing (the target T), so cables that are too short may pull the suspended recording device 212, resulting in poor sound reception or noise generation. The factor of "not equipped with foam gaskets" causes the loudspeaker to vibrate against the housing of the target T, resulting in a vibrating sound due to resonance of the housing. The factor of "poor quality of imported materials" refers to poor quality of the speaker 211. In addition, when a foreign body (such as a plastic bag) exists on the target T, the resonance sound of the foreign body also causes abnormal sound generation.
  • Generally speaking, the resonance of the target T in the specific frequency range generates harmonic features in the spectrogram. If an abnormal sound appears, harmonics in the spectrogram cluster together and present a block of high brightness. The more severe the abnormal sound, the larger and/or brighter the block; the less severe the abnormal sound, the smaller and/or dimmer the block. FIG. 4A and FIG. 4B are schematic diagrams of abnormal sound signal spectrograms according to an embodiment of the disclosure. In the spectrograms shown in FIG. 4A and FIG. 4B, the horizontal axis represents frequency, and the vertical axis represents the power ratio. FIG. 4A and FIG. 4B show the spectrograms of abnormal sound signals caused by different defect categories.
  • Afterwards, in step S315, image recognition is executed on the spectrogram through the classification model 122 to obtain the defect category corresponding to the abnormal sound signal N. That is, the spectrogram is input to the classification model 122 to obtain multiple probability values respectively corresponding to multiple specified labels, and the specified label corresponding to the greatest one among the probability values is used as the defect category. For example, in the training stage of the classification model 122, if the training dataset includes six known defect categories, then the classification model 122 finally outputs six specified labels. In the detection stage, the classification model 122 may be used to obtain six probability values corresponding to the six specified labels, and the sum of the six probability values is 1. The highest probability value among the six is selected, and the specified label corresponding to this highest probability value is the finally obtained defect category.
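The label-selection step can be sketched as follows; the six label names are hypothetical placeholders standing in for whatever defect categories the training dataset actually contains:

```python
import numpy as np

# Hypothetical label names (assumed, not from the disclosure)
LABELS = ["solder_bridge", "tight_flex_cable", "no_foam_gasket",
          "poor_material", "foreign_body", "other"]

def pick_defect_category(logits):
    """Normalize the model outputs into probabilities that sum to 1,
    then use the specified label with the greatest probability value."""
    exp = np.exp(logits - np.max(logits))   # numerically stable softmax
    probs = exp / exp.sum()
    return LABELS[int(np.argmax(probs))], probs
```

The softmax normalization is what guarantees the probability values sum to 1, as the passage states.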
  • The classification model 122 may further include a human hearing weight to adjust the probability value corresponding to the output specified label so that an output result is closer to a determination result of a human ear.
  • In addition, in order to further verify the classification model 122, while the abnormal sound signal N is transmitted to the classification model 122 for classification, the abnormal sound signal N is also provided to a relevant engineer for manual classification to obtain a comparison result. The defect category obtained from the classification model 122 may then be compared with the comparison result. If the two do not match, this abnormal sound signal N and the comparison result are input to the training dataset for retraining the classification model 122 through the training dataset.
  • In addition, a confidence index corresponding to each specified label may also be set. After the highest probability value is obtained, it is further compared with the corresponding confidence index. If the highest probability value is not greater than the corresponding confidence index, it means that the actual defect category is likely not one of the existing specified labels. Therefore, the abnormal sound signal N is transmitted to the relevant engineer for manual identification to obtain the defect category corresponding to the abnormal sound signal N, and the abnormal sound signal N and the corresponding defect category (not yet in the training dataset) are added to the training dataset to retrain the classification model 122.
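The confidence-index check can be sketched as follows; the label name and threshold value are illustrative assumptions:

```python
def route_by_confidence(label, probability, confidence_index):
    """Accept the predicted defect category only when its probability
    exceeds the confidence index of that label; otherwise route the
    signal to manual identification (and later to retraining)."""
    if probability > confidence_index[label]:
        return label
    return "manual_identification"

# Assumed per-label threshold for one hypothetical label
CONFIDENCE_INDEX = {"no_foam_gasket": 0.7}
```

Signals routed to manual identification would then be labeled by the engineer and appended to the training dataset, closing the retraining loop described above.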
  • FIG. 5 is an architecture diagram of a classification model according to an embodiment of the disclosure. With reference to FIG. 5 , the classification model 122 includes a bidirectional long short-term memory (BLSTM) layer 505, a max pooling layer 510, a flatten layer 515, and a fully connected layer 520.
  • After receiving the abnormal sound signal N, the processor 110 converts the abnormal sound signal N into a spectrogram IM. Next, according to time sequence of the spectrogram IM, the spectrogram IM is divided into multiple sub-spectrograms. For example, the spectrogram IM is divided into multiple sub-spectrograms f1 to fT from low frequency to high frequency, and these sub-spectrograms f1 to fT are input to the BLSTM layer 505.
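The division into sub-spectrograms can be sketched with a simple split along the frame axis; the axis choice and the value of T below are assumptions for illustration:

```python
import numpy as np

def split_spectrogram(spec, T):
    """Divide a spectrogram into T sub-spectrograms f_1 .. f_T along
    its first axis, to be fed to the BLSTM layer in sequence."""
    return np.array_split(spec, T, axis=0)
```

`np.array_split` (unlike `np.split`) tolerates a length that is not a multiple of T, which keeps the sketch robust to arbitrary spectrogram sizes.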
  • The BLSTM layer 505 obtains feature data according to the following equations:

  • $\overrightarrow{h}_t = \mathrm{LSTM}(f_t, \overrightarrow{h}_{t-1})$

  • $\overleftarrow{h}_t = \mathrm{LSTM}(f_t, \overleftarrow{h}_{t+1})$

  • $h_t^{blstm} = \overrightarrow{h}_t + \overleftarrow{h}_t$

  • The BLSTM layer 505 uses two long short-term memory (LSTM) models to perform forward-pass and backward-pass calculations on the sub-spectrograms $f_t$ ($t = 1$ to $T$), obtaining the forward feature data $\overrightarrow{h}_t$ and the backward feature data $\overleftarrow{h}_t$, and then obtains the feature data $h_t^{blstm}$ by summing the two.
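The forward pass, backward pass, and elementwise sum described above can be sketched with a toy NumPy implementation; the gate layout and the tiny dimensions in the example are illustrative assumptions, not the disclosure's actual parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; W, U, b stack the input/forget/candidate/output gates."""
    z = W @ x + U @ h + b
    H = h.shape[0]
    i, f = sigmoid(z[:H]), sigmoid(z[H:2 * H])
    g, o = np.tanh(z[2 * H:3 * H]), sigmoid(z[3 * H:4 * H])
    c_new = f * c + i * g
    return o * np.tanh(c_new), c_new

def blstm(frames, fw, bw, H):
    """Run a forward pass and a backward pass over the frames f_t, then
    sum the two hidden states elementwise, matching h_t^blstm above."""
    T = len(frames)
    h_fw, h_bw = [None] * T, [None] * T
    h, c = np.zeros(H), np.zeros(H)
    for t in range(T):                       # forward pass
        h, c = lstm_step(frames[t], h, c, *fw)
        h_fw[t] = h
    h, c = np.zeros(H), np.zeros(H)
    for t in reversed(range(T)):             # backward pass
        h, c = lstm_step(frames[t], h, c, *bw)
        h_bw[t] = h
    return [h_fw[t] + h_bw[t] for t in range(T)]

# Toy example: 5 frames of dimension 4, hidden size 3, small random weights
rng = np.random.default_rng(0)
H, D, T = 3, 4, 5
mk = lambda: (rng.standard_normal((4 * H, D)) * 0.1,
              rng.standard_normal((4 * H, H)) * 0.1,
              np.zeros(4 * H))
features = blstm([rng.standard_normal(D) for _ in range(T)], mk(), mk(), H)
```

Summing (rather than concatenating) the two directions keeps the feature dimension at H, which is what the equation $h_t^{blstm} = \overrightarrow{h}_t + \overleftarrow{h}_t$ implies.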
  • Afterwards, the feature data $h_t^{blstm}$ extracted by the BLSTM layer 505 is simplified through the max pooling layer 510 to retain the more important feature information. The max pooling layer 510 computes an output on each pooling window by selecting the maximum among the values in that window, according to the following equation:

  • $f_t(S) = \mathrm{MAX}(h_t^{blstm})$
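A minimal sketch of the pooling step, assuming non-overlapping windows and discarding any ragged tail:

```python
import numpy as np

def max_pool(features, window):
    """Keep only the maximum value inside each pooling window,
    simplifying the feature sequence from the BLSTM layer."""
    feats = np.asarray(features)
    n = (len(feats) // window) * window      # drop the ragged tail, if any
    return feats[:n].reshape(-1, window, *feats.shape[1:]).max(axis=1)
```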
  • The flatten layer 515 flattens the feature data output by the max pooling layer 510; for example, multi-dimensional feature data is transformed into a one-dimensional vector. Finally, the flattened feature data is input to the fully connected layer 520, and after weight calculations, the probability values corresponding to the multiple labels 525-1 to 525-M are obtained for the spectrogram IM. Here, the sum of the probability values of the labels 525-1 to 525-M is 1, and the labels 525-1 to 525-M respectively have corresponding confidence indexes T1 to TM. After the probability values of the labels 525-1 to 525-M are obtained, the highest probability value is selected. Assuming that the label 525-1 has the highest probability value, it is further determined whether the probability value of the label 525-1 is higher than its confidence index T1. If so, the defect category of the abnormal sound signal N is the label 525-1. If not, the abnormal sound signal N is sent to the relevant engineer for manual identification to obtain the defect category corresponding to the abnormal sound signal N, and this abnormal sound signal N along with the corresponding defect category (not yet in the training dataset) are added to the training dataset to retrain the classification model 122.
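The last two layers can be sketched as a flatten followed by a single weight matrix and a softmax; the zero-initialized weights below are placeholders (an assumption for illustration), so every label receives an equal probability:

```python
import numpy as np

def flatten_and_classify(pooled, W, b):
    """Flatten the pooled feature map into a one-dimensional vector,
    apply the fully connected layer, and return M probabilities that
    sum to 1 (one per label 525-1 .. 525-M)."""
    v = np.asarray(pooled).ravel()        # flatten layer 515
    logits = W @ v + b                    # fully connected layer 520
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                # softmax: probabilities sum to 1

M = 6                                     # assumed number of labels
pooled = np.ones((3, 4))                  # stand-in pooled feature map
probs = flatten_and_classify(pooled, np.zeros((M, 12)), np.zeros(M))
```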
  • In summary, the abnormal sound signal classification in the above embodiments may shorten repair time, provide more accurate defect detection than subjective determination by a human ear, and reduce relevant occupational injuries. In addition, the abnormal sound signal may be analyzed directly through the classification model to know which type of failure causes the abnormal sound signal. In this way, a device that did not pass a test may be repaired in a single pass, and after analysis, elements and mechanisms that often fail may be identified for improvement to increase the yield.

Claims (18)

What is claimed is:
1. An abnormal sound detection method, comprising:
receiving an abnormal sound signal;
converting the abnormal sound signal into a spectrogram; and
executing image recognition on the spectrogram for obtaining a defect category corresponding to the abnormal sound signal.
2. The abnormal sound detection method according to claim 1, wherein a step of executing the image recognition on the spectrogram comprises:
inputting the spectrogram to a classification model for obtaining a plurality of probability values respectively corresponding to a plurality of specified labels, wherein the classification model is a neural network model; and
using the specified label in correspondence with a greatest one among the plurality of probability values as the defect category.
3. The abnormal sound detection method according to claim 2, after obtaining the defect category in correspondence with the abnormal sound signal, further comprising:
determining whether the defect category is consistent with a comparison result; and
inputting the abnormal sound signal to a training dataset for retraining the classification model through the training dataset in response to the defect category not being consistent with the comparison result.
4. The abnormal sound detection method according to claim 2, after obtaining the plurality of probability values respectively corresponding to the plurality of specified labels, further comprising:
determining whether the greatest one among the plurality of probability values is greater than a confidence index;
using the specified label in correspondence with the greatest one among the plurality of probability values as the defect category in response to the greatest one among the plurality of probability values being greater than the confidence index; and
inputting the abnormal sound signal to a training dataset for retraining the classification model through the training dataset in response to the greatest one among the plurality of probability values not being greater than the confidence index.
5. The abnormal sound detection method according to claim 2, wherein a step of inputting the spectrogram to the classification model comprises:
dividing the spectrogram into a plurality of sub-spectrograms according to time sequence of the spectrogram for inputting the plurality of sub-spectrograms to the classification model.
6. The abnormal sound detection method according to claim 2, wherein the classification model comprises a bidirectional long short-term memory layer, a max pooling layer, a flatten layer, and a fully connected layer.
7. The abnormal sound detection method according to claim 1, wherein a step of converting the abnormal sound signal into the spectrogram comprises:
executing fast Fourier transform on the abnormal sound signal for generating the spectrogram.
8. The abnormal sound detection method according to claim 1, wherein a step of receiving the abnormal sound signal comprises:
receiving the abnormal sound signal from a sound detection model,
wherein the sound detection model is for detecting whether an audio signal has an abnormal sound, and regards the audio signal as the abnormal sound signal in response to determining that the audio signal has the abnormal sound.
9. The abnormal sound detection method according to claim 8, before a step of receiving the abnormal sound signal from the sound detection model, further comprising:
receiving the audio signal from a recording device through the sound detection model,
wherein the recording device is disposed on a target in a silent box or is disposed in the silent box for recording a sound emitted in the silent box to output the audio signal.
10. An abnormal sound detection apparatus, comprising:
a receiver, configured to receive an abnormal sound signal; and
a processor, coupled to the receiver, and configured to: convert the abnormal sound signal into a spectrogram, and execute an image recognition on the spectrogram for obtaining a defect category corresponding to the abnormal sound signal.
11. The abnormal sound detection apparatus according to claim 10, wherein the processor is configured to:
input the spectrogram to a classification model for obtaining a plurality of probability values respectively corresponding to a plurality of specified labels, wherein the classification model is a neural network model; and
use the specified label in correspondence with a greatest one among the plurality of probability values as the defect category.
12. The abnormal sound detection apparatus according to claim 11, wherein the processor is configured to:
determine whether the defect category is consistent with a comparison result; and
input the abnormal sound signal to a training dataset for retraining the classification model through the training dataset in response to the defect category not being consistent with the comparison result.
13. The abnormal sound detection apparatus according to claim 11, wherein the processor is configured to:
determine whether the greatest one among the plurality of probability values is greater than a confidence index;
use the specified label in correspondence with the greatest one among the plurality of probability values as the defect category in response to the greatest one among the plurality of probability values being greater than the confidence index; and
input the abnormal sound signal to a training dataset for retraining the classification model through the training dataset in response to the greatest one among the plurality of probability values not being greater than the confidence index.
14. The abnormal sound detection apparatus according to claim 11, wherein the processor is configured to:
divide the spectrogram into a plurality of sub-spectrograms according to time sequence of the spectrogram for inputting the plurality of sub-spectrograms to the classification model.
15. The abnormal sound detection apparatus according to claim 11, wherein the classification model comprises a bidirectional long short-term memory layer, a max pooling layer, a flatten layer, and a fully connected layer.
16. The abnormal sound detection apparatus according to claim 11, wherein the processor is configured to:
execute fast Fourier transform on the abnormal sound signal for generating the spectrogram.
17. The abnormal sound detection apparatus according to claim 11, wherein the receiver is configured to:
receive the abnormal sound signal from a sound detection model,
wherein the sound detection model is for detecting whether an audio signal has an abnormal sound, and regards the audio signal as the abnormal sound signal in response to determining that the audio signal has the abnormal sound.
18. The abnormal sound detection apparatus according to claim 17, further comprising:
the sound detection model, receiving the audio signal from a recording device,
wherein the recording device is disposed on a target in a silent box or is disposed in the silent box for recording a sound emitted in the silent box to output the audio signal.
US17/460,338 2021-07-13 2021-08-30 Abnormal sound detection method and apparatus Active US11579012B1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
TW110125758 2021-07-13
TW110125758A TWI774472B (en) 2021-07-13 2021-07-13 Abnormal sound detection method and apparatus

Publications (2)

Publication Number Publication Date
US20230030911A1 true US20230030911A1 (en) 2023-02-02
US11579012B1 US11579012B1 (en) 2023-02-14
