WO2023054313A1

WO2023054313A1 - Abnormal sound determination method, abnormal sound determination program, and abnormal sound determination system

Info

Publication number: WO2023054313A1
Application number: PCT/JP2022/035838
Authority: WO
Inventors: 智之和田; 高史佐々; 隆士道川; 幸紀国本; 将宏重田
Original assignee: 国立研究開発法人理化学研究所; 株式会社トプコン
Priority date: 2021-09-29
Filing date: 2022-09-27
Publication date: 2023-04-06
Also published as: JP2023049213A

Abstract

According to the present invention, an abnormal sound determination method includes an acquisition step for acquiring sound data, a filter processing step for adjusting the sound data in accordance with a person's ability to hear, and a data generation step for generating adjusted sound data as input data for machine learning.

Description

Abnormal Sound Judgment Method, Abnormal Sound Judgment Program, and Abnormal Sound Judgment System

The present disclosure relates to an abnormal sound determination method, an abnormal sound determination program, and an abnormal sound determination system.

Conventionally, as an inspection method for structures, a method has been proposed in which concrete is hammered and abnormalities in the structure, such as lifting or corrosion of the concrete, are detected from the hammering sound. For example, Non-Patent Document 1 discloses an anomaly detection system that enables digitization, collection, storage, and analysis of hammering sounds of hammers and the like.

The system of Non-Patent Document 1 automatically determines the presence or absence of abnormalities in structures. Depending on the abnormality mode of the object, however, human inspection skill may still be needed to determine an abnormality. As the number of inspectors has declined in recent years, opportunities to pass on the skills of skilled inspectors have also declined, and there is a problem that other inspectors cannot be sufficiently trained.

In view of the above points, the present disclosure aims to provide an abnormal sound determination method, an abnormal sound determination program, and an abnormal sound determination system that can be used for inspection skill training.

In order to achieve the above object, the abnormal sound determination method according to the present disclosure includes an acquisition step of acquiring sound data, a filtering step of adjusting the sound data according to human hearing ability, and a data generation step of generating the adjusted sound data as input data for machine learning.

In order to achieve the above object, the abnormal sound determination program according to the present disclosure causes a computer to execute an acquisition step of acquiring sound data, a filtering step of adjusting the sound data according to human hearing ability, and a data generation step of generating the adjusted sound data as input data for machine learning.

In order to achieve the above object, the abnormal sound determination system according to the present disclosure includes an input unit that acquires sound data, and a processing unit that adjusts the sound data according to human hearing ability and generates the adjusted sound data as input data for machine learning.

According to the present disclosure, it is possible to provide an abnormal sound determination method, an abnormal sound determination program, and an abnormal sound determination system that can be used for inspection skill training.

FIG. 1 is a diagram showing an abnormal sound determination system according to an embodiment of the present disclosure. FIG. 2 is a diagram showing an outline of processing in the abnormal sound determination system. FIG. 3 is a flowchart showing processing of an abnormal sound determination program. FIG. 4 is a diagram showing sound data. FIG. 5 is a diagram showing primary divided data transformed into the frequency domain. FIG. 6 is a diagram showing frequency weighting characteristics. FIG. 7 is a schematic diagram showing a 1/3 octave band. FIG. 8 is a diagram showing time weighting characteristics. FIG. 9 is a diagram showing the configuration of image data.

Hereinafter, embodiments of the present disclosure will be described based on the drawings. The abnormal sound determination system 1 shown in FIG. 1 has an input unit 11, an output unit 12, a control unit 13, and a storage unit 14. The input unit 11 may acquire the sound data 21 from the outside via wired or wireless communication means, or may function as a sensor (sound collecting means) such as a microphone, a vibration pickup, or a vibration acceleration pickup that collects sound from the outside and converts the sound (vibration) into an electrical signal.

The output unit 12 can display the determination result 3 determined by the abnormal sound determination program 23 described later, for example, on a display, or can output data about the determination result 3 to another device (for example, another display unit or a speaker) by wire or wirelessly.

The control unit 13 is a circuit such as a CPU (Central Processing Unit), MPU (Micro Processing Unit), ASIC (Application Specific Integrated Circuit), or FPGA (Field Programmable Gate Array), and functions as a processing unit that runs various programs. The control unit 13 controls operations of the input unit 11, the output unit 12, and the storage unit 14. The control unit 13 can also execute various programs such as the abnormal sound determination program 23.

The storage unit 14 is a storage device such as an HDD (Hard Disk Drive), an SSD (Solid State Drive), an optical disk, or a semiconductor memory. The storage unit 14 stores sound data 21, image data 22, and an abnormal sound determination program 23. The sound data 21 is data acquired by the input unit 11. The sound data 21 can include abnormal data that contains abnormal sounds and non-abnormal data that does not contain abnormal sounds. The plurality of sound data 21 input to the abnormal sound determination program 23 can include one or both of abnormal data and non-abnormal data, depending on the application and function of the abnormal sound determination system 1. The abnormal sound determination program 23 includes an input data generator 231 and a determination unit 232. The input data generator 231 generates image data 22 based on the sound data 21. The image data 22 is used as input data for the determination unit 232.

The determination unit 232 receives the image data 22 as input and determines whether or not the original sound data 21 contains an abnormal sound. Further, the determination unit 232 can perform machine learning using the input image data 22 as teacher data. When the image data 22 is used as teacher data, the image data 22 can include a label indicating whether or not the original sound data 21 contains an abnormal sound. The abnormal sound determination program 23 can be stored in a computer-readable storage medium (for example, a storage device such as the storage unit 14). The storage medium may be an HDD, an SSD, a flash memory, or the like.

In addition, the abnormal sound determination system 1 may be composed of one device, or may be composed of a plurality of devices. For example, some of the input unit 11, the output unit 12, the control unit 13, and the storage unit 14 may be arranged in a plurality of different devices. Further, in the abnormal sound determination system 1, a plurality of devices including the input unit 11, the output unit 12, the control unit 13, and the storage unit 14 may be connected so as to be able to communicate with each other by wire or wirelessly, and may be configured to be able to execute the processing described herein. The abnormal sound determination system 1 also includes a computer (not shown) for controlling the abnormal sound determination system 1.

FIG. 2 is a diagram showing an outline of an abnormal sound determination method in the abnormal sound determination system 1. The sound data 21 acquired by the abnormal sound determination system 1 is converted into intermediate data (primary divided data 21n, secondary divided data 21nm) by the input data generator 231 (see FIG. 1). Image data 22 is then generated based on the intermediate data. A plurality of image data 22 can be generated, one corresponding to each sound data 21. When the determination unit 232 determines whether the image data 22 indicates an abnormality, the abnormal sound determination system 1 outputs the determination result 3 for the image data 22 (in other words, for the original sound data 21).

Next, the abnormal sound determination method of this embodiment will be described. FIG. 3 is a flowchart showing processing in the input data generator 231 of the abnormal sound determination program 23. In step S01, the control unit 13 acquires the sound data 21 via the input unit 11 (acquisition step). The sound data 21 includes, for example, a plurality of hammering sound waveforms 210 (sound waveforms) recorded when performing a hammering test on concrete such as the inner wall of a tunnel (see FIG. 2). When the sound data 21 includes a plurality of hammering sound waveforms 210, the control unit 13 can extract one of the hammering sound waveforms 210 from the sound data 21.

FIG. 4 is a diagram showing the hammering sound waveform 210 of the sound data 21. The vertical axis of FIG. 4 indicates sound pressure [Pa], and the horizontal axis indicates time [s]. The dashed lines in FIG. 4 indicate timings spaced 50 ms apart, measured from timing T0 = 0 [s].

In the filtering process from step S02 to step S07, the control unit 13 adjusts the sound data 21 according to human hearing ability.

In step S02, as the time adjustment, the control unit 13 performs time division processing that divides the sound data 21 every 50 ms relative to timing T0. The control unit 13 divides the sound data 21 into a plurality of predetermined time sections that include the sound onset timing (T0). In the present embodiment, primary divided data 211 to 218 (hereinafter collectively referred to as primary divided data 21n, where n = 1 to 8) are generated as intermediate data. Note that the section before -100 ms and the section after +300 ms are excluded from the primary divided data 21n. In the hammering sound waveform 210 of FIG. 4, when represented in the time domain, sound pressure values are observed mainly over section (3) and section (4).
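The time division of step S02 can be illustrated with a minimal sketch. It assumes the waveform is a NumPy array with a known sample rate and that the onset index for T0 has already been found; the function name `split_into_segments`, its parameters, and the synthetic test burst are hypothetical and are not part of the disclosure.

```python
import numpy as np

def split_into_segments(waveform, sample_rate, onset_index,
                        start_ms=-100, stop_ms=300, step_ms=50):
    """Cut a waveform into equal-length time sections around the onset T0.

    With the defaults this yields 8 sections of 50 ms covering
    -100 ms to +300 ms, matching sections (1) to (8) in the text.
    """
    step = int(round(sample_rate * step_ms / 1000))
    first = onset_index + int(round(sample_rate * start_ms / 1000))
    n_sections = (stop_ms - start_ms) // step_ms
    segments = []
    for n in range(n_sections):
        lo = first + n * step
        hi = lo + step
        if lo < 0 or hi > len(waveform):
            raise ValueError("waveform does not cover the requested window")
        segments.append(np.asarray(waveform[lo:hi], dtype=float))
    return segments

# Synthetic stand-in for a hammering sound: silence, then a decaying 1 kHz burst.
fs = 48_000
onset = int(0.5 * fs)
signal = np.zeros(fs)
t_rel = np.arange(fs - onset) / fs          # time measured from T0
signal[onset:] = np.exp(-t_rel / 0.02) * np.sin(2 * np.pi * 1000 * t_rel)

segments = split_into_segments(signal, fs, onset)
```

In this synthetic case the first two sections are silent and the energy appears from the third section onward, which mirrors the observation that the waveform of FIG. 4 occupies mainly sections (3) and (4).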

In step S03, the control unit 13 converts each of the primary divided data 211 to 218 divided into sections (1) to (8) into frequency domain data. FIG. 5 is a diagram showing primary divided data 211 to 218 transformed into the frequency domain. The control unit 13 can transform the primary divided data 211 to 218 in the time domain into data in the frequency domain (frequency spectrum) using Fourier transform.
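As an illustration of step S03, the following sketch transforms one 50 ms section into a one-sided frequency spectrum with NumPy's real FFT. The helper name `to_spectrum`, the 48 kHz sample rate, and the 1 kHz test tone are assumptions chosen for the example; the disclosure only states that a Fourier transform is used.

```python
import numpy as np

def to_spectrum(segment, sample_rate):
    """Return (frequencies in Hz, complex one-sided spectrum) of one time section."""
    spectrum = np.fft.rfft(segment)
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / sample_rate)
    return freqs, spectrum

fs = 48_000
n = 2_400                                  # 50 ms at 48 kHz
t = np.arange(n) / fs
segment = np.sin(2 * np.pi * 1000 * t)     # 1 kHz test tone
freqs, spectrum = to_spectrum(segment, fs)
peak_hz = freqs[np.argmax(np.abs(spectrum))]
```

A 50 ms section gives a frequency resolution of 20 Hz, which is one reason the lowest bands of a later band split contain very few spectral bins.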

In step S04, the control unit 13 uses the frequency weighting characteristic F1 shown in FIG. 6 to weight the primary divided data 211 to 218 converted into the frequency domain, and adjust the sound pressure level.

When the sound data 21 is collected by a microphone or the like, the sound data 21 may include sounds in a frequency range that cannot be heard by humans. For example, in the case of a hammering sound waveform 210 recorded while an inspector conducts a hammering test, if sound data 21 containing sounds in a frequency range that humans cannot hear is used as teacher data or data to be determined (collectively, "input data"), it is difficult for the determination unit 232 to reproduce the judgment skill of a skilled inspector. Therefore, the primary divided data 211 to 218 are adjusted by the frequency weighting characteristic F1.

The frequency weighting characteristic F1 in FIG. 6 is a sound pressure adjustment matched to the human ear, and indicates that hearing sensitivity is low in the low frequency region and the high frequency region. Therefore, as the adjustment of the sound pressure level, the control unit 13 relatively decreases the magnitude of the sound pressure level for each frequency of the sound data 21 in the low frequency region and the high frequency region, so that the data can be weighted and adjusted to follow human hearing sensitivity.
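The disclosure does not name the curve behind F1. As a minimal sketch, the code below assumes F1 resembles the standard A-weighting curve used in sound level meters, which likewise attenuates low and high frequencies relative to the mid range. The function names are hypothetical, and the choice of A-weighting is an assumption, not something the source confirms.

```python
import numpy as np

def a_weighting_db(freqs_hz):
    """A-weighting gain in dB (assumed stand-in for characteristic F1)."""
    f2 = np.asarray(freqs_hz, dtype=float) ** 2
    num = (12194.0 ** 2) * f2 ** 2
    den = ((f2 + 20.6 ** 2)
           * np.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    with np.errstate(divide="ignore"):     # 0 Hz maps to -inf dB (gain 0)
        return 20.0 * np.log10(num / den) + 2.00

def apply_weighting(spectrum, freqs_hz):
    """Scale each spectral bin by the linear gain of the weighting curve."""
    gain = 10.0 ** (a_weighting_db(freqs_hz) / 20.0)
    return spectrum * gain

gain_1k = a_weighting_db(1000.0)     # close to 0 dB by construction
gain_100 = a_weighting_db(100.0)     # strongly attenuated
gain_20k = a_weighting_db(20000.0)   # attenuated
```

Any other curve that tracks hearing sensitivity could be substituted by replacing `a_weighting_db`; the rest of the sketch only needs a gain in dB per frequency.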

In step S05, the control unit 13 extracts data in the audible band from the primary divided data 211 to 218 (sound data 21) using a 1/3 octave bandpass filter and, as the frequency band adjustment, divides the primary divided data 211 to 218 (sound data 21) into a plurality of frequency bands. The control unit 13 applies a bandpass filter with a frequency interval of 1/3 octave to each of the primary divided data 211 to 218 and divides the frequencies into a plurality of bands (32 sections in this embodiment), thereby generating, as intermediate data, secondary divided data 21nm (n = 1 to 8, m = 1 to 32, where n denotes the time section and m denotes the frequency section). Note that one octave is a frequency interval over which the frequency doubles.

FIG. 7 is a diagram showing each section [1] to [32] of the 1/3 octave bandpass filter. The lowest frequency section [1] is located near the boundary between the audible range and the inaudible range on the low frequency side. Also, the section [32] with the highest frequency is located near the boundary between the human audible range and the inaudible range on the high frequency side.
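The disclosure does not list the centre frequencies of sections [1] to [32]. As a sketch under that caveat, the code below assumes base-2 1/3 octave bands referenced to 1 kHz, with band indices k = -18 to 13; this gives 32 bands running from about 16 Hz to about 20 kHz, which is consistent with the lowest and highest sections sitting near the edges of the audible range. The function name and the index range are assumptions.

```python
import numpy as np

def third_octave_bands(k_min=-18, k_max=13, ref_hz=1000.0):
    """Centre, lower-edge and upper-edge frequencies of 1/3 octave bands."""
    k = np.arange(k_min, k_max + 1)
    centers = ref_hz * 2.0 ** (k / 3.0)
    lower = centers * 2.0 ** (-1.0 / 6.0)
    upper = centers * 2.0 ** (1.0 / 6.0)
    return centers, lower, upper

centers, lower, upper = third_octave_bands()
```

Three consecutive bands span exactly one octave, so `centers[3]` is twice `centers[0]`, and each band's upper edge coincides with the next band's lower edge.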

Note that the control unit 13 is not limited to the 1/3 octave bandpass filter. It may apply a bandpass filter with a frequency interval of 1/N octave (for example, N is a natural number of 24 or more) to each of the primary divided data 211 to 218 and generate secondary divided data 21nm (n = 1 to 8, m = 1 to 32, where n denotes the time section and m denotes the frequency section) by dividing the frequencies into a plurality of sections.

In step S06, the control unit 13 converts the secondary divided data 21nm into time domain data. The control unit 13 can convert the frequency-domain secondary divided data 21nm into time-domain data (a time waveform) using an inverse Fourier transform.
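One simple way to picture the band split followed by the return to the time domain is to zero every spectral bin outside a band and apply the inverse real FFT. The sketch below does this for a single band; the ideal (brick-wall) mask, the function name `band_limited_signal`, and the band edges of roughly 891 Hz to 1122 Hz are assumptions, since the source does not specify the filter design.

```python
import numpy as np

def band_limited_signal(segment, sample_rate, f_low, f_high):
    """Keep only bins in [f_low, f_high) and return the time-domain result."""
    spectrum = np.fft.rfft(segment)
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / sample_rate)
    mask = (freqs >= f_low) & (freqs < f_high)
    return np.fft.irfft(spectrum * mask, n=len(segment))

fs = 48_000
t = np.arange(2_400) / fs                               # one 50 ms section
segment = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 4000 * t)
band_1k = band_limited_signal(segment, fs, 891.0, 1122.0)
```

Here the 4 kHz component is removed and only the 1 kHz component survives. A practical implementation might prefer a filter with a smoother response, because a brick-wall mask can introduce ringing on real signals.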

In step S07, the control unit 13 uses the time weighting characteristic F2 shown in FIG. 8 to weight the secondary divided data 21nm converted into the time domain and adjust the sound pressure level (time weighting process). The sound pressure of a sound changes over extremely short times. The time weighting characteristic F2 is a characteristic (so-called fast characteristic) approximated to the time response of the human ear, and has a rising time constant τ=125 ms. The slope from the rising timing (0 s) of the time weighting characteristic F2 is 34.7 dB/s. Note that the characteristic used for the time weighting process is not limited to the time weighting characteristic F2.
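The two figures quoted for F2 can be related under one common reading of the fast characteristic: an exponential average of the squared sound pressure with time constant τ. Under that assumption, the level of the averaged signal changes at 10·log10(e)/τ dB per second, which for τ = 125 ms is about 34.7 dB/s. The sketch below is illustrative only; the function names and the one-pole implementation are assumptions, not the disclosed method.

```python
import math
import numpy as np

def fast_time_weighting(pressure, sample_rate, tau=0.125):
    """Exponentially averaged squared pressure (one-pole filter, time constant tau)."""
    alpha = 1.0 - math.exp(-1.0 / (sample_rate * tau))
    out = np.empty(len(pressure))
    acc = 0.0
    for i, p in enumerate(pressure):
        acc += alpha * (p * p - acc)       # move a fraction alpha toward p^2
        out[i] = acc
    return out

def level_slope_db_per_s(tau=0.125):
    """Rate of level change of an exponential with time constant tau, in dB/s."""
    return 10.0 * math.log10(math.e) / tau

slope = level_slope_db_per_s()             # about 34.7 dB/s for tau = 125 ms
step_response = fast_time_weighting(np.ones(48_000), 48_000)
```

After one second of constant input the averaged value has come within exp(-8) of its final value, which shows how quickly the fast characteristic settles compared with the 50 ms sections used earlier.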

In step S08, the control unit 13 generates the image data 22 shown in FIG. 9 from the secondary divided data 21nm obtained by the filtering processes in steps S02 to S07 (data generation step). The image data 22 is generated from the plurality of adjusted secondary divided data 21nm (sound data 21). The image data 22 is used as input data that can be read by the determination unit 232 of the abnormal sound determination program 23.

In the image data 22, the sound pressures (sound pressure levels) corresponding to each frequency band section [1] to [32] and each time section (1) to (8) are arranged as grayscale pixel values in two mutually orthogonal directions of frequency and time. The pixel value (grayscale gradation) of each cell 221 indicates the sound pressure level. As the sound pressure level used as the pixel value of the image data 22, an average value or an integrated value of the sound pressure of the secondary divided data 21nm divided by frequency and time may be used. In FIG. 9, for the sake of explanation, the pixel values are simplified to three levels. The magnitude (brightness) of the pixel values increases in the order of cell 221a, cell 221b, and cell 221c. Accordingly, the sound pressure level increases in the order of cell 221a, cell 221b, and cell 221c.

The abnormal sound determination system 1 generates one image data 22 corresponding to one sound data 21 input to the input unit 11 (specifically, sound data including one hammering sound waveform 210). A plurality of corresponding image data 22 are generated from the plurality of sound data 21.
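The mapping from band and time levels to grayscale pixels can be sketched as follows. The sketch assumes a 32 x 8 array of sound pressure levels in dB with rows as frequency sections and columns as time sections, and it scales the levels linearly to 8-bit values; the orientation, the scaling rule, and the function name `levels_to_image` are assumptions, and the input here is random placeholder data rather than measured levels.

```python
import numpy as np

def levels_to_image(levels_db, floor_db=None, ceil_db=None):
    """Map a (bands x time sections) array of levels to 8-bit grayscale pixels."""
    levels = np.asarray(levels_db, dtype=float)
    lo = levels.min() if floor_db is None else floor_db
    hi = levels.max() if ceil_db is None else ceil_db
    if hi <= lo:                           # flat input: return an all-black image
        return np.zeros(levels.shape, dtype=np.uint8)
    scaled = np.clip((levels - lo) / (hi - lo), 0.0, 1.0)
    return np.round(scaled * 255).astype(np.uint8)

# Placeholder levels standing in for the averaged sound pressure level per cell.
rng = np.random.default_rng(0)
levels = rng.uniform(20.0, 90.0, size=(32, 8))
image = levels_to_image(levels)
```

Passing fixed `floor_db` and `ceil_db` values instead of per-image minima and maxima would keep brightness comparable across different recordings, which may matter when the images are used as teacher data.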

The image data 22 generated after adjusting the sound data 21 can be input to the determination unit 232 as teacher data for machine learning. When the image data 22 is used as teacher data, the teacher data can include a label indicating whether or not the original sound data 21 corresponding to the image data 22 contains an abnormal sound.

Also, the image data 22 can be input to the trained determination unit 232 as data to be determined. The control unit 13 can cause the determination unit 232 to determine whether or not the input image data 22 includes an abnormal sound. After that, the control unit 13 can output, via the output unit 12, the determination result 3 of the determination unit 232 regarding the presence or absence of an abnormality in the image data 22 (sound data 21).

As described above, the abnormal sound determination method using the abnormal sound determination system 1 can include a determination step of inputting the image data 22, which is the input data, into the machine learning program (determination unit 232) as teacher data or data to be determined.

In this embodiment, the determination unit 232 is trained on the sound data 21 adjusted according to human hearing ability, and the determination unit 232 then determines whether or not other sound data 21 contains an abnormal sound. Therefore, for example, in the field of hammering sound inspection, a trainee can compare his or her own judgment after listening to the sound data 21 with the determination result 3 obtained by the abnormal sound determination program 23 for the same sound data 21. By checking that judgment against a result that approximates the judgment of a skilled inspector, the trainee can improve his or her inspection skill.

In addition, by using the abnormal sound determination system 1 (abnormal sound determination program 23) at the actual inspection work site, an inspector can perform hammering inspection work while referring to the determination result 3, which is equivalent to that of a skilled inspector, without being accompanied by another inspector. Therefore, it is possible to improve the inspection skill of the inspector while performing the work at the site.

It should be noted that the abnormal sound determination method of the present embodiment may include a visualization step of creating a basis image in which the basis for determining the presence or absence of an abnormality is visualized by Grad-CAM. Grad-CAM is a technique for visualizing which part of an image a machine learning model relies on for its judgment, by focusing on the features extracted by the last convolutional layer of a convolutional neural network. For example, a basis image with the same number of rows and columns as the image data 22 can be created and displayed, in which cells that contribute strongly to the determination are colored on a color gradient. Such an image lets other inspectors grasp objectively which frequencies and timings of the sound a skilled inspector mainly relies on when determining the presence or absence of an abnormality. Therefore, even a trainee who cannot obtain sufficient opportunities for skill training by a person can learn, through training with the abnormal sound determination program 23, which region of the sound should be identified in order to make an accurate determination.

As described above, the abnormal sound determination method executable in the abnormal sound determination system 1 according to the embodiment of the present disclosure includes the acquisition step (S01) of acquiring the sound data 21, the filtering step (S02 to S07) of adjusting the sound data 21 according to human hearing ability, and the data generation step (S08) of generating the adjusted sound data 21 as input data for machine learning. As a result, even when the machine-learned determination program (e.g., determination unit 232) determines whether the input data is abnormal, it is possible to obtain a determination result similar to the abnormality determination performed by a skilled inspector. Thus, the abnormal sound determination method, abnormal sound determination program, and abnormal sound determination system 1 that can be used for inspection skill training can be configured.

In addition, since the image data 22 generated as input data is adjusted so that the audible range is included and the inaudible range is excluded, extra data that has little relevance to listening skill is omitted, and the increase in the amount of data can be suppressed.
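The core Grad-CAM computation can be sketched independently of any deep learning framework. Given the feature maps of the last convolutional layer and the gradients of the class score with respect to those maps, each channel is weighted by its spatially averaged gradient, the weighted maps are summed, and negative values are clipped. The arrays below are tiny hand-made placeholders, not outputs of a trained network, and the function name `grad_cam` is hypothetical.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heat map from arrays of shape (channels, height, width)."""
    weights = gradients.mean(axis=(1, 2))              # one weight per channel
    cam = np.tensordot(weights, activations, axes=1)   # weighted sum over channels
    cam = np.maximum(cam, 0.0)                         # keep positive evidence only
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Placeholder feature maps: channel 0 supports the class, channel 1 opposes it.
activations = np.zeros((2, 4, 4))
activations[0, 1, 2] = 1.0
activations[1, 3, 3] = 1.0
gradients = np.stack([np.ones((4, 4)), -np.ones((4, 4))])

heat = grad_cam(activations, gradients)
```

To obtain a basis image with the same rows and columns as the 32 x 8 input, the heat map would still need to be resized to that grid and mapped to a color gradient; that step is omitted here.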

Although the description of the embodiment of the present disclosure is finished above, the aspect of the present disclosure is not limited to this embodiment.

For example, in the filtering step of the present embodiment, a configuration that adjusts all of the sound pressure level, the frequency band, and the time has been described, but one or more (some or all) of the sound pressure level, the frequency band, and the time may be adjusted for the sound data 21.

In addition, in the present embodiment, the sound data 21 includes the hammering sound waveform 210 of the hammering test for concrete, but the sound data 21 is not limited to this. It may include one or more sound waveforms (corresponding to the hammering sound waveform 210 shown in FIG. 4) of mechanical sound, vibration sound, hammering sound, or noise. As the sound data 21, a sound waveform containing or not containing an abnormal sound can be used. Abnormal sounds may include tire sounds, vehicle failure sounds, road surface abnormal sounds, and the like.

In addition, the sound data 21 is not limited to data acquired by inspection, and may be data acquired by any other means.

Also, the order of the processing of steps S02 to S07 of the abnormal sound determination method shown in FIG. 3 is an example, and the order may be changed as appropriate.

1 abnormal sound determination system
3 determination result
11 input unit
12 output unit
13 control unit
14 storage unit
21 sound data
21n (211 to 218) primary divided data
21nm secondary divided data
22 image data
23 abnormal sound determination program
210 hammering sound waveform
221 (221a to 221c) cell
231 input data generator
232 determination unit
F1 frequency weighting characteristic
F2 time weighting characteristic
T0 timing
τ time constant

Claims

1. An abnormal sound determination method comprising:
an acquisition step of acquiring sound data;
a filtering step of adjusting the sound data according to human hearing ability; and
a data generation step of generating the adjusted sound data as input data for machine learning.

2. The abnormal sound determination method according to claim 1, wherein the filtering step adjusts one or more of sound pressure level, frequency band, and time for the sound data.

3. The abnormal sound determination method according to claim 2, wherein in the filtering step, all of the sound pressure level, the frequency band, and the time are adjusted.

4. The abnormal sound determination method according to claim 3, wherein the filtering step includes:
as the adjustment of the sound pressure level, weighting and adjusting the magnitude of the sound pressure level for each frequency of the sound data according to human hearing sensitivity;
as the adjustment of the frequency band, extracting audible band data from the sound data with a 1/3 octave bandpass filter and dividing the sound data into a plurality of bands; and
as the adjustment of the time, dividing the sound data into a plurality of predetermined time domain sections including a sound onset timing.

5. The abnormal sound determination method according to claim 4, wherein the input data is image data in which the sound pressure levels corresponding to the frequency band and the time are arranged as grayscale pixel values in two mutually orthogonal directions of the frequency band and the time.

6. The abnormal sound determination method according to any one of claims 1 to 5, wherein the sound data includes abnormal data including abnormal sounds and non-abnormal data not including the abnormal sounds, the method further comprising a determination step of inputting the input data to a machine learning program as teacher data or data to be determined.

7. The abnormal sound determination method according to any one of claims 1 to 6, wherein the sound data includes any one of mechanical sound, vibration sound, and hammering sound.

8. The abnormal sound determination method according to any one of claims 1 to 7, including a visualization step of creating a basis image in which the basis for determining the presence or absence of an abnormality is visualized by Grad-CAM.

9. An abnormal sound determination program for causing a computer to execute:
an acquisition step of acquiring sound data;
a filtering step of adjusting the sound data according to human hearing ability; and
a data generation step of generating the adjusted sound data as input data for machine learning.

10. An abnormal sound determination system comprising:
an input unit for acquiring sound data; and
a processing unit that adjusts the sound data according to human hearing ability and generates the adjusted sound data as input data for machine learning.