WO2020026829A1

WO2020026829A1 - Sound data processing method, sound data processing device, and program

Info

Publication number: WO2020026829A1
Application number: PCT/JP2019/028229
Authority: WO
Inventors: 亮太藤井
Original assignee: パナソニックＩｐマネジメント株式会社
Priority date: 2018-07-31
Filing date: 2019-07-18
Publication date: 2020-02-06
Also published as: JPWO2020026829A1; US20210304786A1; JP7407382B2; US11830518B2; US20240046953A1

Abstract

Provided is a sound data processing method in a sound data processing device having a processing unit for acquiring input of sound data for a target and processing the sound data, the method having: a step for using acquired normal-sound data for the target to generate simulated abnormal-sound data as simulated abnormal sounds of the target; and a step for performing machine learning using the acquired normal-sound data and the generated simulated abnormal-sound data as sound data for learning, and generating a learning model for performing abnormal-sound sensing by determining an abnormal sound in the sound data for the target.

Description

Sound data processing method, sound data processing device, and program

The present disclosure relates to a sound data processing method, a sound data processing device, and a program for performing processing related to machine learning of target sound data.

2. Description of the Related Art In various facilities, a system that collects sound of a target object or a target space, analyzes acquired sound data to detect an abnormality, monitor the operation status of equipment, and determine whether a product is good or not has conventionally been used. ing. In this type of system, for example, there is a device that detects an abnormality in sound data of a target object and determines a failure when an abnormal sound is generated. In recent years, various studies have been made to determine abnormal sounds using machine learning processing based on a statistical method in order to detect abnormalities in the acquired sound data.

For example, Patent Literature 1 discloses an apparatus that detects abnormal sound of a machine using learning data of a given mechanical sound during a normal operation. The device of Patent Document 1 separates an input frequency-domain signal into two or more types of signals having different sound properties, extracts a predetermined acoustic feature amount for each of the two or more types of signals, and extracts Calculate the degree of abnormality of each of two or more types of signals using the normal model of the acoustic features and the two or more types of signals that have been learned in advance, and use the integrated degree of abnormality that integrates these degrees of abnormality. This is to determine whether the signal in the frequency domain is abnormal.

Japanese Patent Application Publication No. 2017-090606

When performing machine learning, it is important to generate a more suitable learning model and improve the accuracy of the determination result. In order to generate a suitable learning model, a large amount of data and data having appropriate characteristics are required as learning data. However, it may be difficult to appropriately acquire a large amount of sound data and sound data having appropriate characteristics as learning data so as to adapt to classification determination such as abnormality detection of target sound data.

The present disclosure has been devised in view of the conventional situation described above, and has a sound data processing method capable of generating a suitable learning model using appropriate learning data when performing machine learning of sound data. It is an object to provide a method, a sound data processing device and a program.

The present disclosure is a sound data processing method in a sound data processing device having a processing unit that inputs and acquires target sound data and performs processing of the sound data, wherein the acquired target normal sound data is used. Generating simulated abnormal sound data to be simulated abnormal sound of the target, and performing machine learning using the acquired normal sound data and the generated simulated abnormal sound data as sound data for learning; Generating a learning model for determining an abnormal sound in the target sound data and performing abnormal sound detection.

Further, the present disclosure is a sound data processing apparatus having a processing unit that inputs and acquires target sound data and performs processing of the sound data, wherein the processing unit converts the acquired normal sound data of the target. Using, a simulated abnormal sound generation unit that generates simulated abnormal sound data that becomes a simulated abnormal sound of the target, using the acquired normal sound data and the generated simulated abnormal sound data as sound data for learning. And a machine learning unit that performs machine learning and determines an abnormal sound of the target sound data to generate a learning model for performing abnormal sound detection.

Further, the present disclosure relates to a sound data processing device which is a computer, acquiring sound data of a target, and using the acquired normal sound data of the target, a simulated abnormal sound that is a simulated abnormal sound of the target. Generating data, performing machine learning using the acquired normal sound data and the generated simulated abnormal sound data as sound data for learning, determining abnormal sounds in the target sound data, And a step of generating a learning model for performing the detection.

The present disclosure is a sound data processing method in a sound data processing device having a processing unit that inputs and acquires target sound data and performs processing of the sound data, wherein the target sound data is obtained based on the acquired target sound data. Generating similar sound data that is a similar sound similar to the sound data; performing machine learning using the acquired target sound data and the generated similar sound data as sound data for learning; Generating a learning model for performing classification determination on data.

Further, the present disclosure is a sound data processing device having a processing unit that inputs and acquires target sound data and performs processing of the sound data, wherein the processing unit is based on the acquired target sound data, A similar environment generating unit that generates similar sound data that is a similar sound similar to the target sound data, and performs machine learning using the acquired target sound data and the generated similar sound data as sound data for learning. And a machine learning unit that generates a learning model for performing a classification determination on the target sound data.

Further, the present disclosure provides a sound data processing device, which is a computer, with a step of obtaining target sound data, and generating similar sound data that is a similar sound similar to the target sound data based on the obtained target sound data. Performing machine learning using the acquired target sound data and the generated similar sound data as sound data for learning, and generating a learning model for performing classification determination on the target sound data. And a program for executing the program.

According to the present disclosure, it is possible to generate a suitable learning model using appropriate learning data when performing machine learning of sound data.

FIG. 2 is a block diagram illustrating an example of a configuration of a sound data processing device according to the present embodiment. FIG. 2 is a block diagram showing a functional configuration at the time of learning in the sound data processing device according to the first embodiment. 5 is a flowchart illustrating processing of the similar environment generation unit according to the first embodiment. FIG. 2 is a block diagram showing a functional configuration of the sound data processing apparatus according to the present embodiment during operation. Diagram conceptually explaining abnormality determination processing of sound data using machine learning FIG. 7 is a diagram conceptually illustrating a sound data abnormality determination process according to the first embodiment. FIG. 9 is a block diagram showing a functional configuration at the time of learning in the sound data processing device according to the second embodiment. 11 is a flowchart showing processing of the normal sound processing unit according to the second embodiment. 9 is a flowchart showing processing of the abnormal sound selection unit according to the second embodiment. Flowchart showing processing of a mixing unit according to Embodiment 2. FIG. 9 is a diagram conceptually illustrating a sound data abnormality determination process according to the second embodiment. FIG. 9 is a block diagram showing a functional configuration at the time of learning in the sound data processing device according to the third embodiment. The figure which shows an example of the display screen of the user interface (UI) for selecting an inspection object 9 is a flowchart showing processing at the time of learning of the sound data processing device according to the third embodiment. The figure explaining the generation processing of the simulated abnormal sound in the abnormal type case 1 The figure explaining the generation processing of the simulated abnormal sound in the abnormal type case 2 The figure explaining the generation processing of the simulated abnormal sound in the abnormal type case 3

Hereinafter, each embodiment specifically disclosing the configuration according to the present disclosure will be described in detail with reference to the drawings as appropriate. However, an unnecessary detailed description may be omitted. For example, a detailed description of a well-known item or a redundant description of substantially the same configuration may be omitted. This is to prevent the following description from being unnecessarily redundant and to facilitate understanding by those skilled in the art. The accompanying drawings and the following description are provided to enable those skilled in the art to fully understand the present disclosure, and are not intended to limit the claimed subject matter.

(Background according to the present embodiment)
When machine learning of sound data is performed, data for learning may not be sufficiently obtained. Machine learning generally requires a lot of data for learning. In particular, the deep learning technology requires a large amount of data (tens of thousands to millions) in order to utilize the depth of the hierarchy. However, there are cases where learning data cannot be easily obtained depending on the use scene. In particular, sound data has less sample data obtained as existing data than image data, and an environment for retrieving and collecting learning data using the Internet, such as a tapping sound for equipment inspection, has not been established. For example, when performing machine learning of sound data such as a machine operating sound and a tapping sound of equipment inspection, there is a problem that a sufficient amount of learning data cannot be obtained.

方法 One way to use limited learning data effectively to obtain a large amount of learning data is to inflate the data (Data Argumentation). The inflating of data is a method of adding noise to existing learning data or performing processing such as inversion and rotation if it is an image to provide variations. However, for sound data, padding similar to image data cannot be easily applied. For example, it is conceivable that an audio waveform is subjected to STFT (Short Time Fourier Transform) processing to be converted into a spectrogram image and processed in the same manner as an image to perform data processing. It may not be possible. In other words, in machine learning of sound data, it is necessary to inflate the learning data after capturing the characteristics of the sound.

ユース In some use cases using machine learning of sound data, sound data of the target sound may not be obtained. For example, in the case of a machine operating sound, if a normal sound is recorded during operation, data collection is always possible, but an abnormal sound cannot be obtained unless an abnormal sound is recorded. To detect abnormal sounds using machine learning in a situation where it is difficult to acquire abnormal sounds, it is necessary to construct a system that detects abnormalities only with data for learning normal sounds.

As a method of detecting an abnormality using only normal sound learning data, a difference between a learned value and an evaluation value is calculated as described in Patent Document 1 and the like, and whether the difference value exceeds a predetermined threshold value There is a method of detecting an abnormality by evaluating whether or not the value is different from a normal value. However, in this method, a sound that can be detected as an abnormal sound is significantly different from a normal value.For example, in a use case where a small difference from the normal sound is an abnormal sound, it is difficult to detect the abnormal sound. is there.

In view of the background described above, in the present embodiment, a large amount of sound data, sound data having appropriate characteristics can be used as learning data, and a learning model suitable for performing machine learning of sound data is provided. An example of a system that is generated and enables appropriate evaluation during operation is shown below.

In the present embodiment, as a configuration example of a system for processing target sound data, a learning model is generated by performing machine learning using acquired sound data, and classification of sound data is determined using the generated learning model. 1 shows an example of a sound data processing device and a sound data processing method for performing an abnormality determination as an example. Here, as an example of target sound data, a case is assumed in which mechanical noise of a fan, a motor, or the like in a facility such as a data center or a factory is assumed, and abnormal sound in sound data is determined to detect abnormal noise.

(Configuration of sound data processing device)
FIG. 1 is a block diagram showing an example of a configuration of a sound data processing device according to the present embodiment. The sound data processing device is configured to include one or a plurality of microphones (microphones) 10, an AD converter 20, and

information processing devices

30 and 50. The

information processing devices

30 and 50 are configured by a computer such as a PC (Personal Computer) having a processor and a memory, and execute various types of information processing related to machine learning and the like according to the present embodiment.

The microphone 10 has a sound collection device such as a condenser microphone that inputs a sound wave generated in a target object or a target space and outputs the audio signal as an audio signal of an electric signal. The AD converter 20 converts an analog audio signal into digital sound data using a predetermined quantization bit and a sampling frequency.

(4) The information processing device 30 is connected to the AD converter 20, and inputs sound data to be collected by the microphone 10 and converted to digital data by the AD converter 20. The information processing device 30 is connected to the information processing device 50 via a communication path 40 such as a wired or wireless network or a communication line. In the illustrated example, the information processing device 30 functions as a terminal device of a local computer located at the site, the information processing device 50 functions as a server device of a remote computer located at another location, and the present invention is implemented by a plurality of information processing devices. Is executed in a distributed manner. The information processing device 50 may be a cloud computer on a network. The information processing device 30 mainly functions as a detection device that executes an abnormal sound detection process during operation using a learning model based on machine learning. The information processing device 50 mainly functions as a learning device that executes a machine learning process during learning for generating a learning model by performing machine learning. Note that the

information processing devices

30 and 50 may be configured to execute processing by one device such as a computer, or may be configured to execute processing by three or more devices such as computers. It is not limited to a simple device configuration.

The information processing apparatus 30 includes a processing unit 301, a storage unit 302, a storage unit 303, and a communication interface (communication IF) 304. The processing unit 301 includes various processing devices such as a CPU (Central Processing Unit), a DSP (Digital Signal Processor), and an FPGA (Field Programmable Gate Array), and executes processing related to sound data. The storage unit 302 has a memory device such as a RAM (Random Access Memory), is used as a working memory of the processing unit 301, and is used for temporary storage in calculations during data processing. The storage unit 302 has a memory device such as a ROM (Read Only Memory) and stores various execution programs for executing the processing of the processing unit 301 and various setting data related to processing such as machine learning. The storage unit 303 includes various storage devices such as a hard disk drive (HDD), a solid state drive (SSD), and an optical disk drive, and stores data such as target sound data and a learning model generated by machine learning. The communication interface 304 is an interface that performs wired or wireless communication, communicates with the information processing device 50 via the communication path 40, and transmits and receives data such as sound data and learning models.

The information processing device 50 includes a processing unit 501, a storage unit 502, a storage unit 503, and a communication interface (communication IF) 504. The processing unit 501 has various processing devices such as a CPU, a DSP, and an FPGA, and executes processing relating to sound data. The storage unit 502 has a memory device such as a RAM, is used as a working memory of the processing unit 501, and is used for temporary storage in calculations and the like during data processing. The storage unit 502 includes a memory device such as a ROM, and stores various execution programs for executing the processing of the processing unit 501 and various setting data related to processing such as machine learning. The storage unit 503 includes various storage devices such as an HDD, an SSD, and an optical disk drive, and includes target sound data, a learning model generated by machine learning, an abnormal sound database (abnormal sound DB), and a normal sound database (normal sound DB). , And data such as a general-purpose sound database (general-purpose sound DB). The abnormal sound database is a database that collects sound data in an abnormal state. The normal sound database is a database that collects sound data in a normal state. The general-purpose sound database is a database that collects various general-purpose sound data generated daily. The communication interface 504 is an interface that performs wired or wireless communication, communicates with the information processing device 30 via the communication path 40, and transmits and receives data such as sound data and learning models.

In the present embodiment, the target sound data collected by the microphone 10 is acquired, and the

information processing devices

30 and 50 execute the processing of the sound data. At the time of learning, machine learning of sound data is executed by the

information processing devices

30 and 50 to generate a learning model. During operation, the

information processing apparatuses

30 and 50 determine abnormalities in sound data using a learning model, and detect abnormal sounds.

Hereinafter, some embodiments of a sound data processing method and apparatus for executing processing including machine learning of sound data according to the present embodiment will be described.

(Embodiment 1)
In the first embodiment, an example will be described in which a similar sound environment of acquired sound data is created, a similar sound of the target sound data is generated, data for learning is inflated, and learning and evaluation of sound data are performed. .

FIG. 2 is a block diagram showing a functional configuration at the time of learning in the sound data processing device according to the first embodiment. The sound data processing device has a function of the similar environment generation unit 201 and the function of the machine learning unit 202 during learning of machine learning. The functions of each unit of the similar environment generation unit 201 and the machine learning unit 202 are realized by the processing of the

processing units

301 and 501 of the

information processing devices

30 and 50.

The similar environment generation unit 201 generates a similar environment of the sound data to be learned acquired in the real environment, and uses the target sound data 251 acquired as the target sound data to generate similar sound data that is sound data of a similar sound. 253 is automatically generated to inflate the learning data. The machine learning unit 202 executes machine learning such as deep learning using artificial intelligence (AI: Artificial @ Intelligent) installed in the processing unit. The machine learning unit 202 performs the machine learning process using the acquired target sound data 251, similar sound data 253 generated based on the target sound data 251, and a general-purpose sound database (general-purpose sound DB) 254, and as a learning result. Is generated. The general-purpose sound database 254 stores general-purpose sound data including various general-purpose daily sounds such as environmental sounds and human voices.

The machine learning process in the machine learning unit 202 may be performed using one or more statistical classification techniques. Statistical classification techniques include, for example, linear classifiers (linear classifiers), support vector machines (support vector machines), quadratic classifiers (quadratic classifiers), kernel density estimation (kernel estimation), decision trees (decision tree), Artificial neural networks, Bayesian techniques and / or networks, hidden Markov models, binary classifiers, binary classifiers, multi-class classifiers ) Clustering (a clustering technique), random forest (a random forest technique), logistic regression (a logistic regression technique), linear regression (a linear regression technique), gradient boosting (a gradient boosting technique), and the like. However, the statistical classification technique used is not limited to these.

FIG. 3 is a flowchart showing processing of the similar environment generation unit 201 according to the first embodiment. The similar environment generation unit 201 inputs the target sound data 251 acquired by the microphone 10 or the like as sound data for learning (S11), performs similar sound generation processing on the target sound data 251 (S12), and performs similar sound generation. The data 253 is generated. At this time, the similar environment generation unit 201 generates a plurality of sound data similar to the original sound data by changing the frequency characteristics, sound volume, sound quality, and the like of the sound data using the filter 211, the sound volume change parameter 212, and the like. . That is, the similar environment generation unit 201 generates the similar sound data 253 by changing at least one of the frequency characteristics and the volume of the target sound data 251.

The filter 211 is a filter that changes the frequency characteristics of sound data, such as a low-pass filter (LPF) and a high-pass filter (HPF). The volume change parameter 212 is a parameter that changes the volume of the sound data, such as the volume of the entire frequency band of the sound data or the volume of a predetermined frequency band for emphasizing or reducing a specific frequency. The similar environment generation unit 201 creates various variations related to the original sound data by the above processing, and automatically generates a plurality of similar sound data 253. The similar environment generating unit 201 has means for inflating learning data by a plurality of different approaches, selects an appropriate inflating means according to the pattern of the target sound data, and adds learning sound data. It can also be generated.

Next, the similar environment generation unit 201 determines whether or not learning contradiction has occurred in the generated similar sound data 253 (S13). The learning inconsistency is determined, for example, by determining the degree of coincidence of frequencies of a plurality of generated sound data. If there is a sound data for learning that has a different label but a frequency that matches, a learning contradiction occurs. Judge. Subsequently, the similar environment generation unit 201 discards the learning-contradictory sound data (S14). As a result, from the generated similar sound data 253, sound data of different labels with the same frequency are removed, and learning inconsistency in the learning sound data is eliminated. In this manner, the similar environment generation unit 201 generates and adds the similar sound data 253 to the target sound data 251, thereby inflating the appropriate learning sound data in accordance with the characteristics of the target sound data 251. Do. Then, the similar environment generating unit 201 outputs the learning sound data with the data inflated (S15).

The machine learning unit 202 generates a learning model 252 by performing a machine learning process using the inflated learning sound data including the target sound data 251 and the similar sound data 253.

FIG. 4 is a block diagram showing a functional configuration at the time of operation in the sound data processing device according to the present embodiment. The sound data processing device has a function of the determination unit 401 during operation using a learning model based on machine learning. The function of each unit of the determination unit 401 is realized by the processing of the

processing units

301 and 501 of the

information processing devices

30 and 50. The function of the determination unit 401 can use processing during operation using a learning model based on general machine learning.

The determination unit 401 receives test sound data 451, which is sound data to be tested, and determines whether sound data is normal or abnormal based on likelihood or the like using a learning model 452 generated by machine learning, and a determination result 453. Is output. The learning model 452 is a result of learning sound data for learning as normal and abnormal with different labeling (clustering). Therefore, the determination unit 401 calculates the normal likelihood and the abnormal likelihood for the test sound data 451 to be determined, and determines which is closer to normal or abnormal. Then, based on the determination result 453 of the test sound data 451, the determination unit 401 outputs an abnormality determination result 454 indicating whether or not the target sound data is abnormal. Abnormal sound detection of the target sound is executed based on the abnormality determination result 454.

FIG. 5 is a diagram conceptually illustrating an abnormality determination process for sound data using machine learning. 5A shows an example of sound data classification using a simple threshold, and FIG. 5B shows an example of sound data classification using a learning model using machine learning. In FIG. 5, the data classification is simply shown in a two-dimensional space for easy understanding. The sound data of each inspection sound is indicated by a circle mark, and dot hatching indicates a normal sound, and hatched hatching indicates an abnormal sound.

(5) As shown in FIG. 5A, in the classification based on the linear boundary B1 using a simple threshold, a normal sound may be erroneously classified as an abnormal sound. On the other hand, as shown in FIG. 5B, in the classification based on the boundary B2 based on the learning model of the machine learning using the neural network, the normal sound and the abnormal sound can be accurately classified, and a more reliable judgment result is obtained. can get.

FIG. 6 is a diagram conceptually illustrating the sound data abnormality determination processing according to the first embodiment. In FIG. 6, (A) shows an example of classification of sound data by a learning model without data inflating as a comparative example, and (B) shows a learning model in which data inflating is performed by generating a similar sound as in the first embodiment. Each of the examples of the classification of the sound data by each is shown. FIG. 6 simply shows the classification of data in a two-dimensional space for easy understanding. The sound data of each inspection sound is indicated by a circle mark, and dot hatching indicates a normal sound, and hatched hatching indicates an abnormal sound. The broken circle marks represent sound data of the normal sound and the abnormal sound added by the data padding.

As shown in FIG. 6A, in a learning model obtained by performing machine learning using only sound data obtained at the time of learning, the boundary B3 may not be appropriately determined because there are few variations in data. In this case, in the sound data acquired during operation, a normal sound is erroneously determined as an abnormal sound, and an error (NG) occurs in the determination result. In particular, erroneous determination is likely to occur when the distribution of the characteristics of the sound data during learning is biased and the characteristics of the sound data during operation are slightly different from the sound data during learning due to environmental changes. On the other hand, as shown in FIG. 6B, sound data of a similar sound automatically generated is added to the sound data obtained at the time of learning to inflate the data for learning, and machine learning is performed. In the resulting learning model, a more appropriate boundary B4 is determined based on a large number of learning data. In this case, normal sound and abnormal sound can be accurately classified with respect to the sound data acquired at the time of operation, and a more reliable judgment result can be obtained. Therefore, abnormal noise detection can be accurately performed.

As described above, in this embodiment, learning data is inflated by automatically generating similar sound data corresponding to sound data in a similar environment based on target sound data acquired in a real environment. Thus, even if a large amount of learning data cannot be obtained in sound data for which appropriate learning data cannot be inflated by data processing similar to that of an image, a sufficient amount of appropriate learning data can be obtained. It can be used to generate a suitable learning model for machine learning. Further, by generating a similar environment of the sound data of the real environment acquired at the time of learning, it is possible to cope with a change in the environment during operation, and to obtain a highly accurate determination result with respect to the environment change. A learning model can be generated. As a result, it is possible to improve the accuracy of the classification judgment such as the abnormal judgment result of the sound data using the learning model by the machine learning.

(Embodiment 2)
In the second embodiment, when only normal sounds can be obtained as learning data, a simulated abnormal sound is generated using the abnormal sound database, so that the abnormal sound as the target learning data is generated. An example will be described in which sound data is created and the data for learning is inflated to perform learning and evaluation of the sound data.

FIG. 7 is a block diagram showing a functional configuration at the time of learning in the sound data processing device according to the second embodiment. The sound data processing device has functions of a normal sound processing unit 601, an abnormal sound selection unit 602, a mixing unit 603, and a machine learning unit 604 when learning machine learning. Here, the normal sound processing unit 601, the abnormal sound selection unit 602, and the mixing unit 603 implement a function as a simulated abnormal sound generation unit that generates simulated abnormal sound data 653. The functions of the normal sound processing unit 601, the abnormal sound selection unit 602, the mixing unit 603, and the machine learning unit 604 are realized by the processing of the

processing units

301 and 501 of the

information processing devices

30 and 50.

The normal sound processing unit 601 performs data processing for generating a simulated abnormal sound using the normal sound data 651 obtained as sound data to be learned. The abnormal sound selection unit 602 uses the abnormal sound database (abnormal sound DB) 654 to select appropriate abnormal sound data according to the type and characteristics of the target sound data. The abnormal sound database 654 stores sound data corresponding to various abnormal sounds as sound data when an abnormality occurs. For example, in the case of a motor sound, a sound in which the number of revolutions is changed, a sound in which a member is rubbed, and the like are collected and stored in advance. The abnormal sound database 654 may store sound data indicating an abnormal state suitable for the inspection target.

The mixing unit 603 performs a mixing process between the processed normal sound data and the selected abnormal sound data to generate simulated abnormal sound data 653 that is simulated abnormal sound data, thereby inflating the learning data. I do. The machine learning unit 604 executes machine learning such as deep learning using artificial intelligence installed in the processing unit. The machine learning unit 604 performs machine learning using the acquired normal sound data 651 and the simulated abnormal sound data 653 generated based on the normal sound data 651, and generates a learning model 652 as a learning result.

The machine learning process in the machine learning unit 604 may be performed using one or more statistical classification techniques. Statistical classification techniques include, for example, linear classifiers (linear classifiers), support vector machines (support vector machines), quadratic classifiers (quadratic classifiers), kernel density estimation (kernel estimation), decision trees (decision tree), Artificial neural networks, Bayesian techniques and / or networks, hidden Markov models, binary classifiers, binary classifiers, multi-class classifiers ) Clustering (a clustering technique), random forest (a random forest technique), logistic regression (a logistic regression technique), linear regression (a linear regression technique), gradient boosting (a gradient boosting technique), and the like. However, the statistical classification technique used is not limited to these.

FIG. 8 is a flowchart showing the processing of the normal sound processing unit 601 according to the second embodiment. The normal sound processing unit 601 inputs the normal sound data 651 acquired by the microphone 10 or the like as sound data of a normal sound for learning (S21), and processes sound data for processing for mixing abnormal sounds. Perform processing. At this time, the normal sound processing unit 601 selects a filter that changes frequency characteristics, such as a low-pass filter (LPF) and a high-pass filter (HPF), based on the type of the sound data to be inspected (S22). Then, the normal sound processing unit 601 applies the selected filter, and processes the sound data by, for example, removing a specific frequency, moving a frequency, and the like (S23). Here, the sound data processing device assumes a state in which what is to be inspected is known in advance, and performs processing according to the characteristics of the sound data to be inspected. For example, processing processing such as reducing the specific frequency of the target sound of the stationary sound and removing it, and converting the pitch of a target sound whose peak frequency is 100 Hz to 200 Hz and shifting it to 200 Hz is performed. Further, the sound volume of the sound data of the target sound may be adjusted according to the characteristics of the sound data to be inspected. Then, the normal sound processing unit 601 outputs the sound data of the normal sound after the processing (S24).

To create a simulated abnormal sound that can be assumed, generate an abnormal sound by adding an abnormal sound to the normal sound, generate by subtracting the abnormal sound from the normal sound, and generate by changing some characteristics of the normal sound There are various generation methods. Therefore, the normal sound processing unit 601 processes the normal sound for mixing with the abnormal sound in order to generate a target abnormal sound in accordance with the environment of the normal sound, and processes the normal sound into an abnormal sound. , Etc. are performed. For example, the frequency of a part of the normal sound is reduced to add an abnormal sound. Alternatively, the frequency characteristic of the normal sound is changed to subtract the abnormal sound. Alternatively, when the state slightly higher than the sound in the normal state is the abnormal state, the frequency of the normal sound is shifted slightly higher. Further, in the tapping sound of the equipment inspection, when the sound that reverberates is in a normal state and the sound that does not reverberate is in an abnormal state, the filter processing is performed so as to cancel the component of the sound that reverberates from the normal sound. A pre-process for generating an abnormal sound is executed by these various data processing processes.

FIG. 9 is a flowchart showing a process performed by the abnormal sound selecting unit 602 according to the second embodiment. The abnormal sound selection unit 602 inputs the list information of the abnormal sound database 654 and the inspection target information regarding the type of the inspection target and the like (S31). The abnormal sound selection unit 602 determines whether to use the abnormal sound database 654 in accordance with the characteristics of the sound data to be inspected, that is, whether to perform mixing of abnormal sounds using the sound data in the abnormal sound database 654, Alternatively, it is determined whether the processing is performed only by processing the normal sound (S32). Here, when the abnormal sound database 654 is not used, the abnormal sound selecting unit 602 outputs silent sound data (S33). On the other hand, when the abnormal sound database 654 is used, the abnormal sound selecting unit 602 selects the sound data of the abnormal sound suitable for mixing from the abnormal sound database 654 based on the type of the sound data to be inspected (S34). Then, the abnormal sound selection unit 602 outputs sound data of the selected abnormal sound (S35).

FIG. 10 is a flowchart showing processing of the mixing unit 603 according to Embodiment 2. The mixing unit 603 inputs the sound data of the normal sound processed by the normal sound processing unit 601 as the sound data for mixing (S41), and converts the sound data of the abnormal sound selected by the abnormal sound selection unit 602. Input (S42). Then, the mixing unit 603 mixes the sound data by performing an addition process (superimposition process) of the processed normal sound and the abnormal sound (S43). Thereby, sound data of a simulated abnormal sound is generated. At this time, the mixing unit 603 multiplies the waveforms of the normal sound and the abnormal sound, adds the processed normal sound and the abnormal sound, subtracts the abnormal sound from the processed normal sound, For example, a process is performed in which no sound is used and the processed normal sound is used as it is as an abnormal sound. Then, the mixing unit 603 outputs the generated sound data of the simulated abnormal sound (S44). In this way, the mixing unit 603 generates the simulated abnormal sound data 653 by superimposing the abnormal sound data from the abnormal sound database 654 on the normal sound data 651, and adds the simulated abnormal sound data 653 according to the characteristics of the target sound data. Inflate sound data for appropriate learning. Note that the mixing unit 603 may adjust the volume of a plurality of patterns in the addition process, generate a plurality of different simulated abnormal sound data, and give the learning data a variation.

The machine learning unit 604 generates a learning model 652 by performing a machine learning process using the expanded sound data including the target normal sound data 651 and the simulated abnormal sound data 653.

The functional configuration of the sound data processing device during operation is the same as that of the first embodiment shown in FIG. The sound data processing device has a function of the determination unit 401 during operation using a learning model based on machine learning. The determination unit 401 receives test sound data 451, which is sound data to be tested, and determines whether sound data is normal or abnormal based on likelihood or the like using a learning model 452 generated by machine learning, and a determination result 453. Is output. Then, based on the determination result 453 of the test sound data 451, the determination unit 401 outputs an abnormality determination result 454 indicating whether or not the target sound data is abnormal. Abnormal sound detection of the target sound is executed based on the abnormality determination result 454.

FIG. 11 is a diagram conceptually illustrating the sound data abnormality determination processing according to the second embodiment. In FIG. 11, (A) shows an example of classification of sound data by a learning model without data inflating as a comparative example, and (B) shows learning in which data inflating is performed by generating a simulated abnormal sound as in the second embodiment. Each example of the classification of the sound data by the model is shown. In FIG. 11, the data classification is simply shown in a two-dimensional space for easy understanding. The sound data of each inspection sound is indicated by a circle mark, and dot hatching indicates a normal sound, and hatched hatching indicates an abnormal sound. The broken circle mark represents the sound data of the extraordinary sound added by the padding.

As shown in FIG. 11A, in the learning model obtained by performing the machine learning using only the sound data of the normal sound obtained at the time of learning, there is no learning result of the abnormal sound. Whether it is determined is undefined, and the boundary B5 may not be properly determined. In this case, an abnormal sound is erroneously determined as a normal sound in the sound data acquired during operation, and an error (NG) occurs in the determination result. In particular, when an abnormal sound that is close to the characteristic of a normal sound occurs, it is difficult to determine an appropriate determination criterion by learning only the normal sound, and an erroneous determination is likely to occur. On the other hand, as shown in FIG. 11B, the sound data of the simulated abnormal sound automatically generated is added to the sound data obtained at the time of learning to inflate the learning data, and machine learning is performed. In the learning model obtained as a result, a more appropriate boundary B6 is determined in consideration of the feature of the abnormal sound. In this case, normal sound and abnormal sound can be accurately classified with respect to the sound data acquired at the time of operation, and a more reliable judgment result can be obtained. Therefore, abnormal noise detection can be accurately performed.

As described above, in the present embodiment, the simulated abnormal sound data corresponding to the simulated abnormal sound is automatically generated based on the normal sound data of the target acquired in the real environment, so that the learning data Inflate. As a result, even if the actual learning data at the time of the abnormality is not obtained, the abnormal sound can be simulated together with the normal sound, and a sufficient amount of the appropriate learning data can be used for machine learning. Can be generated. In addition, machine learning using simulated abnormal sound data makes it possible to judge abnormalities due to subtle differences even in use cases where the difference between the characteristics of normal sounds and abnormal sounds is small, for example. Accuracy can be improved. As a result, it is possible to improve the accuracy of the classification judgment such as the abnormal judgment result of the sound data using the learning model by the machine learning.

(Embodiment 3)
In the third embodiment, an example will be described in which the processing in the second embodiment is partially changed, and a simulated abnormal sound is generated according to the type of abnormality set based on target sound data. The following description focuses on the differences from the second embodiment, and a description of similar configurations and functions will be omitted.

FIG. 12 is a block diagram showing a functional configuration at the time of learning in the sound data processing device according to the third embodiment. The sound data processing device has functions of a normal sound processing unit 701, an abnormal sound selection unit 721, an abnormal sound processing unit 722, a mixing unit 703, and a machine learning unit 704 when learning machine learning. Here, the normal sound processing unit 701, the abnormal sound selection unit 721, the abnormal sound processing unit 722, and the mixing unit 703 implement a function as a simulated abnormal sound generation unit that generates the simulated abnormal sound data 753. The functions of the normal sound processing unit 701, the abnormal sound selection unit 721, the abnormal sound processing unit 722, the mixing unit 703, and the machine learning unit 704 are realized by the processing of the

processing units

301 and 501 of the

information processing devices

30 and 50. .

The normal sound processing unit 701 performs data processing for generating a simulated abnormal sound using the normal sound data 651 obtained as the sound data of the inspection target (that is, the learning target). The abnormal sound selection unit 721 uses the abnormal sound database (abnormal sound DB) 654 to select appropriate abnormal sound data according to the type and characteristics of the sound data to be inspected. The abnormal sound processing unit 722 performs data processing for generating a simulated abnormal sound using the selected abnormal sound data. The mixing unit 703 performs mixing processing of the processed normal sound data and the abnormal sound data, and generates simulated abnormal sound data 753 that is simulated abnormal sound data, thereby inflating the learning data. . The machine learning unit 704 executes machine learning such as deep learning using artificial intelligence installed in the processing unit, as in the second embodiment. The machine learning unit 704 performs machine learning using the acquired normal sound data 651 and the simulated abnormal sound data 753 generated based on the normal sound data and / or the abnormal sound data, and a learning model 752 as a learning result. Generate

In the third embodiment, the sound data processing device sets an abnormal type 756 according to the type of the sound data to be inspected, and performs a different process for each abnormal type to generate a simulated abnormal sound. The sound data processing device switches the operation of the normal sound processing unit 701, the abnormal sound selection unit 721, and the abnormal sound processing unit 722 according to the set abnormal type 756. The mode of the abnormal sound when an abnormality occurs with respect to the normal sound differs depending on the abnormal type. Generally, the abnormal type is associated with an inspection target such as a target device, a target object, and a target space. For example, each target device, such as a device including a rotating body such as a motor and a device including a driving mechanism such as a fan belt, is characterized by a characteristic of sound when an abnormality occurs. In the following, as an example of the type of the sound data of the inspection target for which the process of generating the simulated abnormal sound is performed, an example in which the abnormal type is set according to the type of the target device will be described.

The sound data processing device has a display unit including a display device such as a liquid crystal display and an organic EL (Electro-Luminescence) display. The sound data processing device has a user interface (UI) including a display screen and the like displayed on a display unit, and can receive a selection input by a user operation. Using the user interface (UI) 755, the sound data processing device accepts a selection input of a target device and sets an abnormality type 756 according to the target device. Note that the abnormality type 756 may be directly input and set by a user operation. Further, the sound data processing device may set the abnormal type 756 according to the type and characteristics of the sound data to be inspected based on the identification information of the sound data.

Examples of the abnormal type 756 include the following cases 1 to 4.
Case 1: Mixing of abnormal noise (a different sound is generated from the sound at normal time). Case 1 is an abnormality caused by, for example, a bearing abnormality of the rotating body, a fan belt abnormality, an abnormal contact of the drive system, and the like.
Case 2: fluctuation of the peak frequency (the peak frequency of the sound in the normal state rises or falls). Case 2 is an abnormality that occurs due to, for example, a change in the rotation speed of the rotating body.
Case 3: Missing peak frequency (missing peak frequency of sound at normal time). Case 3 is an abnormality that occurs due to, for example, a change in a contact portion of the drive system.
Case 4: change in volume (normal sound level increases or decreases). The case 4 is an abnormality that occurs due to, for example, an increase or decrease in friction of the rotating body or the drive system.

FIG. 13 is a diagram illustrating an example of a display screen of a user interface (UI) for selecting an inspection target. The setting screen 761 in the display screen of the user interface is provided with a target setting input unit 762 for selecting and inputting a type of a target device to be inspected by a user operation. The target setting input unit 762 has a pull-down menu display in which names of test target types such as a motor, a compressor, a belt, and an arm are displayed as a list of target devices. When the user selects and inputs a predetermined target device in the target setting input unit 762, the sound data processing device sets a target device to be inspected, and sets a sound abnormality type corresponding to the target device. By using such a user interface, it is possible to improve the operability when setting the type of abnormality or the target device.

FIG. 14 is a flowchart showing a learning process performed by the sound data processing device according to the third embodiment. The sound data processing device uses the user interface 755 to input the setting of the target device (S51), and sets the abnormality type 756 according to the target device (S52). Then, the sound data processing device switches the operation mode in the normal sound processing unit 701, the abnormal sound selection unit 721, and the abnormal sound processing unit 722 according to the abnormal type 756, and processes the normal sound and the selection and processing of the abnormal sound. (S53). At this time, peak shift, filtering, level increase / decrease, mixing level setting, and the like are executed as normal sound and / or abnormal sound processing. A specific example of the process according to the type of abnormality will be described later. Subsequently, the sound data processing device performs a mixing process between the normal sound and the abnormal sound in the mixing unit 703 (S54), and generates and outputs simulated abnormal sound data 753 (S55).

FIG. 15 is a diagram illustrating a process of generating a simulated abnormal sound in case 1 of the abnormal type. 15A shows an example of a time waveform of a normal sound, and FIG. 15B shows an example of a time waveform of an abnormal sound. The horizontal axis represents time, and the vertical axis represents volume level. (C) shows an example of a frequency characteristic of a normal sound at a predetermined time, and (D) shows an example of a frequency characteristic of an abnormal sound at a predetermined time. The horizontal axis represents frequency, and the vertical axis represents signal level. In case 1, abnormal noise is added to the normal sound when a bearing error, a fan belt error, an abnormal contact of the drive system, or the like occurs. The illustrated example is an example in which a pulse-like sound is intermittently added to a normal sound, and in the frequency characteristic of an abnormal sound, the signal level increases in all bands like white noise. In some cases, an abnormal sound component may be added only to a predetermined frequency band (for example, around 1 kHz).

In case 1, the abnormal sound selection unit 721, the abnormal sound processing unit 722, and the mixing unit 703 mainly operate to execute processing for adding an abnormal sound to a normal sound. In the sound data processing device, the abnormal sound selection unit 721 selects appropriate abnormal sound data from the abnormal sound database 654, the abnormal sound processing unit 722 performs processing of the selected abnormal sound data, and sets a mixing level. As processing for processing abnormal sound data, processing such as peak shift is performed. Then, the normal sound data and the abnormal sound data are mixed according to the mixing level set by the mixing unit 703, and the simulated abnormal sound data 753 is output. Note that the normal sound processing unit 701 may appropriately process the normal sound data before mixing with the abnormal sound data.

FIG. 16 is a diagram illustrating a process of generating a simulated abnormal sound in case 2 of the abnormal type. In FIG. 16, (A) shows an example of a time waveform of a normal sound, and (B) shows an example of a time waveform of an abnormal sound. The horizontal axis represents time, and the vertical axis represents volume level. (C) shows an example of a frequency characteristic of a normal sound at a predetermined time, and (D) shows an example of a frequency characteristic of an abnormal sound at a predetermined time. The horizontal axis represents frequency, and the vertical axis represents signal level. In case 2, when the rotation speed changes due to an abnormality of the rotating body such as the motor, the peak frequency of the sound fluctuates, and the band of the frequency component where the peak occurs moves. In the illustrated example, the normal sound has a peak in the 4 kHz band, and the abnormal sound has a peak frequency fluctuating from 4 kHz to 2 kHz, a strong peak occurs in the 2 kHz band, and the 4 kHz peak disappears.

In case 2, the normal sound processing unit 701 and the mixing unit 703 mainly operate to execute a process of shifting the peak of the normal sound. In the sound data processing device, the normal sound processing unit 701 processes the normal sound data 651, varies the peak frequency of the normal sound data, and outputs simulated abnormal sound data 753. The mixing unit 703 may mix the abnormal sound data with the normal sound data after the peak shift.

FIG. 17 is a diagram illustrating a process of generating a simulated abnormal sound in case 3 of the abnormal type. In FIG. 17, (A) shows an example of a time waveform of a normal sound, and (B) shows an example of a time waveform of an abnormal sound. The horizontal axis represents time, and the vertical axis represents volume level. (C) shows an example of a frequency characteristic of a normal sound at a predetermined time, and (D) shows an example of a frequency characteristic of an abnormal sound at a predetermined time. The horizontal axis represents frequency, and the vertical axis represents signal level. In case 3, a change in the contact portion of the drive system occurs, and when the contact state changes, such as when a specific portion newly contacts or separates, a drop occurs in the peak frequency of the sound. In the illustrated example, the normal sound has a peak in a band around 2 kHz, and the abnormal sound has no peak near 2 kHz.

In case 3, the normal sound processing unit 701 and the mixing unit 703 mainly operate to execute processing for filtering the normal sound. In the sound data processing device, the normal sound processing unit 701 processes the normal sound data 651, attenuates a predetermined frequency in the normal sound data by a filter, and outputs simulated abnormal sound data 753. Note that the mixing unit 703 may mix abnormal sound data with filtered normal sound data.

In the case 4, when the friction of the rotating body such as a motor or the driving system such as a fan belt or a gear increases or decreases, the volume level rises or falls and fluctuates. For example, friction varies due to shortage or excess of grease injected between members, and the volume of sound of the target device increases or decreases.

In case 4, the normal sound processing unit 701 and the mixing unit 703 mainly operate to execute processing for increasing or decreasing the level of the normal sound. In the sound data processing device, the normal sound processing unit 701 processes the normal sound data 651, increases or decreases the volume level of the normal sound data by changing the filter gain, and outputs simulated abnormal sound data 753. The mixing unit 703 may mix the abnormal sound data with the normal sound data after the level adjustment.

As described above, in the present embodiment, different types of abnormalities are set depending on the type of target device or the like for which the machine learning of sound data is performed, and the respective processes are performed in accordance with the abnormal types to perform simulation. Generate abnormal abnormal sound. As a result, it is possible to generate a simulated abnormal sound in an abnormal state having different characteristics for each abnormal type, and it is possible to generate simulated abnormal sound data appropriate for each aspect of the abnormal type.

(Embodiment 4)
As the fourth embodiment, a functional configuration of a mode in which at least two of the first, second, and third embodiments described above are combined may be employed. In the fourth embodiment, for example, in an environment where only the target normal sound can be acquired, the sound data of the simulated abnormal sound is generated by the functional blocks of the second embodiment shown in FIG. 7, and the learning including the simulated abnormal sound is performed. Sound data of similar sounds is further generated by the functional blocks of Embodiment 1 shown in FIG. 2 based on the sound data for learning, and machine learning is performed using the sound data for learning including the simulated abnormal sound and the similar sounds. I do. In this way, a simulated abnormal sound and a similar sound are generated, the sound data for learning is inflated, machine learning using a large amount of learning data is enabled, and more accurate abnormal sound detection can be performed. I do.

Further, as a modified example, after the machine learning by the functional configuration according to any of the first, second, third, and fourth embodiments is once performed, additional learning is performed by adding sound data for learning, thereby performing further optimization. It is also possible to generate a learning model. For example, after generating a simulated abnormal sound by the functional blocks of the second embodiment and performing machine learning, if an actual abnormal sound can be obtained, additional learning using the obtained abnormal sound is performed. In this manner, additional learning or the like using similar abnormal sounds generated and used is executed. Alternatively, after similar sounds are generated by the functional blocks of the first embodiment and machine learning is performed, additional learning using additionally acquired normal sounds and abnormal sounds is performed, and further generation and execution of the simulated abnormal sounds of the second embodiment are performed. The additional learning and the like based on the data added by generation of the similar sound in the first mode are executed.

学習 By performing the combination of the padding processes of a plurality of types of learning data as described above, it is possible to generate a learning model using more appropriate learning data. Further, by combining additional learning using the acquired learning data, it is possible to generate a learning model using more appropriate learning data. Therefore, it is possible to improve the accuracy of the classification judgment such as the abnormal judgment result of the sound data using the learning model by the machine learning.

As described above, the sound data processing method according to the present embodiment includes the sound data including the

information processing devices

30 and 50 having the processing

units

301 and 501 for inputting and acquiring target sound data and processing the sound data. A sound data processing method performed by the processing device, wherein the similar environment generation unit 201 generates similar sound data 253 that is a similar sound similar to the target sound data 251 based on the acquired target sound data 251; A step of performing machine learning using the acquired target sound data 251 and the generated similar sound data 253 as sound data for learning in the unit 202, and generating a learning model 252 for performing classification determination on the target sound data. And Thereby, even when a large amount of learning data is not obtained, by generating and using similar sound data, a suitable amount of machine learning can be performed using a sufficient amount of appropriate learning data. A learning model can be generated. In addition, a classification model operation such as abnormal sound determination can be performed by a learning model generated using a sufficient amount of learning data, and the accuracy of classification determination regarding target sound data can be improved.

In the sound data processing method according to the present embodiment, in the step of generating similar sound data, a similar environment of the target sound data 251 is generated, and at least one of the frequency characteristic and the volume of the target sound data 251 is changed. Thus, a plurality of similar sound data 253 is generated. Thus, a plurality of similar sound data similar to the target sound data can be generated based on the target sound data acquired in the real environment. Further, by using the similar sound data based on the similar environment as the data for learning, it is possible to cope with an environmental change at the time of operation, and it is possible to improve the accuracy of classification determination regarding the target sound data.

In the sound data processing method according to the present embodiment, in the step of generating similar sound data, similar sound data 253 is generated using a filter that changes the frequency characteristic of the target sound data 251. This makes it possible to generate similar sound data relating to the target sound data by changing the frequency characteristics of the target sound data.

In the sound data processing method according to the present embodiment, in the step of generating similar sound data, the similar sound data is generated using a sound volume change parameter that changes the sound volume of the entire frequency band of the target sound data 251 or the sound volume of a specific frequency band. The sound data 253 is generated. This makes it possible to generate similar sound data relating to the target sound data by changing the volume of the entire frequency band of the target sound data or the volume of a specific frequency band.

In the sound data processing method according to the present embodiment, in the step of generating similar sound data, for the plurality of generated similar sound data 253, data in which learning contradiction occurs in machine learning is discarded. Thereby, for example, data in which learning inconsistency occurs, such as sound data of different labels having the same frequency, can be removed, and appropriate machine learning can be executed.

Further, in the sound data processing method according to the present embodiment, in the step of generating a learning model, a learning model for determining an abnormal sound of the target sound data and performing abnormal sound detection as a classification determination regarding the target sound data. 252 is generated. Thereby, machine learning is performed using a sufficient amount of appropriate learning data including the target sound data acquired in the real environment and the automatically generated similar sound data, and the abnormal sound detection based on the abnormal sound determination result is performed. A learning model can be generated.

In the sound data processing method according to the present embodiment, in the step of generating a learning model, the general-purpose sound database 254 storing general-purpose sound data including general-purpose sounds together with the target sound data 251 and the similar sound data 253 is learned. Machine learning using sound data for the As a result, machine learning is performed using a sufficient amount of appropriate learning data including general-purpose sound data, a more preferable learning model can be generated, and the accuracy of classification determination regarding target sound data can be improved.

The sound data processing device according to the present embodiment is a sound data processing device including

information processing devices

30 and 50 having

processing units

301 and 501 for inputting and acquiring target sound data and processing the sound data. The

processing units

301 and 501 include a similar environment generation unit 201 that generates similar sound data 253 that is a similar sound similar to the target sound data 251 based on the obtained target sound data 251, And a machine learning unit 202 that performs machine learning using the obtained similar sound data 253 as sound data for learning, and generates a learning model 252 for performing classification determination on target sound data. As a result, even when a large amount of learning data cannot be obtained, a suitable learning model for machine learning can be generated using a sufficient amount of appropriate learning data. The accuracy of the determination can be improved.

The program according to the present embodiment includes a step of acquiring target sound data in a sound data processing device including

information processing devices

30 and 50, which are computers, and a method similar to target sound data 251 based on acquired target sound data 251. Generating similar sound data 253 as a similar sound to perform, and performing machine learning using the acquired target sound data 251 and the generated similar sound data 253 as sound data for learning, and performing classification determination on the target sound data And a step of generating a learning model 252 for performing the following.

The sound data processing method according to the present embodiment is based on sound data in a sound data processing device including

information processing devices

30 and 50 having

processing units

301 and 501 for inputting and acquiring target sound data and processing the sound data. A method of generating simulated abnormal sound data 653 that becomes a simulated abnormal sound of the target by using the obtained normal sound data 651 of the target, and a simulated abnormal sound generated by the acquired normal sound data 651 Machine learning using the data 653 as sound data for learning, generating an learning model 652 for determining abnormal sounds in the target sound data and detecting abnormal sounds. As a result, even when learning data at the time of actual abnormality is not obtained, by generating and using the simulated abnormal sound data, a sufficient amount of appropriate learning data is used for machine learning. Suitable learning model can be generated. In addition, the operation of abnormal sound determination can be performed by a learning model generated by machine learning including simulated abnormal sound data, and the accuracy of abnormal sound detection for target sound data can be improved.

In the sound data processing method according to the present embodiment, in the step of generating the simulated abnormal sound data, the normal

sound processing units

601 and 701 execute data processing of the normal sound data 651. This makes it possible to generate simulated abnormal sound data by processing the acquired normal sound data.

The sound data processing method according to the present embodiment executes at least one of peak shift, filtering, and volume change of normal sound data as data processing. Thereby, it is possible to generate simulated abnormal sound data corresponding to each abnormal state such as a fluctuation in a peak frequency of a normal sound, a lack of a peak frequency, a change in volume, and the like.

Further, in the sound data processing method according to the present embodiment, in the step of generating simulated abnormal sound data, mixing is performed by using the normal sound data 651 and the abnormal sound data selected from the abnormal sound database 654 stored in advance. The unit 603 performs a mixing process of the normal sound data and the abnormal sound data to generate the simulated abnormal sound data 653. As a result, it is possible to generate simulated abnormal sound data by adding the normal sound data acquired in the real environment and the abnormal sound data prepared in advance and performing a mixing process.

Further, in the sound data processing method according to the present embodiment, in the step of generating the simulated abnormal sound data, the normal sound processing unit 601 performs normal sound data and abnormal sound data for performing the mixing process in the mixing unit 603. At least one of the data processing processes. This makes it possible to process the normal sound data acquired in the real environment and generate sound data for mixing for generating simulated abnormal sound data.

In the sound data processing method of the present embodiment, in the data processing, at least one of the removal of the specific frequency and the frequency shift in the normal sound data 651 is performed using a filter. As a result, the frequency characteristics of the normal sound data can be changed and processed to generate simulated abnormal sound data, or to generate data for mixing when generating the simulated abnormal sound data.

Further, in the sound data processing method according to the present embodiment, in the step of generating the simulated abnormal sound data, the abnormal type 756 is set, and according to the abnormal type 756, only the normal sound data or the normal sound data and the abnormal sound data are set. To generate simulated abnormal sound data. At this time, the abnormal type 756 may be set based on the type of the target sound data. Thereby, it is possible to generate the simulated abnormal sound in the abnormal state for each abnormal type, and to generate the simulated abnormal sound data appropriate for each aspect of the abnormal type.

In the sound data processing method according to the present embodiment, in the step of generating the simulated abnormal sound data, the abnormal sound selecting unit 602 causes the abnormal sound data from the abnormal sound database 654 for performing the mixing process in the mixing unit 603. Is performed. This makes it possible to generate mixing sound data for generating simulated abnormal sound data from the abnormal sound database stored in advance.

In the sound data processing method according to the present embodiment, in the selection process, suitable abnormal sound data is selected from the abnormal sound database 654 based on the type of the target sound data. This makes it possible to select abnormal sound data based on the type of target sound data, and extract appropriate mixing sound data for generating simulated abnormal sound data.

Further, in the sound data processing method according to the present embodiment, in the selection processing, it is determined whether or not the abnormal sound database 654 is used in accordance with the characteristics of the target sound data. Output data. Thus, when simulated abnormal sound data is generated from processed normal sound data without using the abnormal sound database, the appropriate simulated abnormal sound is output by outputting silent sound data for mixing as abnormal sound data. Data can be generated.

information processing devices

30 and 50 having

processing units

301 and 501 use the acquired target normal sound data 651 to generate a simulated abnormal sound data 653 that is a simulated abnormal sound of the target (the normal sound processing unit 601 and the abnormal sound processing unit 601). Using the sound selection unit 602, the mixing unit 603), the acquired normal sound data 651 and the generated simulated abnormal sound data 653 as sound data for learning, machine learning is performed, and an abnormal sound of the target sound data is determined. And a machine learning unit 604 that generates a learning model 652 for performing abnormal noise detection. As a result, a suitable learning model for machine learning can be generated using a sufficient amount of appropriate learning data, even when actual learning data at the time of abnormality is not obtained. It is possible to improve the accuracy of data abnormality detection.

The program according to the present embodiment includes a step of acquiring target sound data in a sound data processing apparatus including the

information processing apparatuses

30 and 50 which are computers, and simulating the target by using the obtained target normal sound data 651. Generating simulated abnormal sound data 653 that becomes a typical abnormal sound, and performing machine learning using the acquired normal sound data 651 and the generated simulated abnormal sound data 653 as sound data for learning. Generating a learning model 652 for determining abnormal sounds and detecting abnormal sounds.

Although various embodiments have been described with reference to the drawings, it goes without saying that the present invention is not limited to such examples. It is obvious that those skilled in the art can conceive various changes or modifications within the scope of the claims, and these naturally belong to the technical scope of the present invention. I understand. Further, the components in the above embodiment may be arbitrarily combined without departing from the spirit of the invention.

Further, the present disclosure supplies a program realizing the functions of the sound data processing method and the sound data processing apparatus according to the above-described embodiment to an information processing apparatus which is a computer via a network or various storage media, and The program that is read and executed by the processor of the processing device, and the recording medium on which the program is stored may be applicable.

This application is based on Japanese Patent Applications (No. 2018-144436 and No. 2018-144337) filed on Jul. 31, 2018, the contents of which are incorporated herein by reference. .

The present disclosure is useful as a sound data processing method, a sound data processing device, and a program that can generate a suitable learning model using appropriate learning data when performing machine learning of data.

Reference Signs List 10 microphone 20 AD converter 30 information processing device (terminal device)
40 communication path 50 information processing device (server device)
201 Similar

environment generation unit

202, 604 Machine learning unit 251

Target sound data

252, 452, 652 Learning model 253 Similar sound data 254 General-

purpose sound database

301, 501 Processing unit 302, 502

Storage unit

303, 503

Storage unit

304, 504 Communication interface 401 Judgment unit 451 Test sound data 453 Judgment result 454 Abnormal judgment result 601 Normal sound processing unit 602 Abnormal sound selection unit 603 Mixing unit 651 Normal sound data 653 Simulated abnormal sound data 654 Abnormal sound database

Claims

A sound data processing method in a sound data processing device having a processing unit for inputting and acquiring target sound data and performing processing of the sound data,
Using the acquired normal sound data of the target, generating simulated abnormal sound data to be a simulated abnormal sound of the target,
A learning model for performing machine learning using the acquired normal sound data and the generated simulated abnormal sound data as sound data for learning, and determining abnormal sounds in the target sound data to detect abnormal sounds. Generating
A sound data processing method comprising:
The sound data processing method according to claim 1,
In the step of generating the simulated abnormal sound data,
Performing data processing of the normal sound data;
Sound data processing method.
The sound data processing method according to claim 2, wherein
As the data processing, at least one of peak shift, filtering, and volume change of the normal sound data is executed.
Sound data processing method.
The sound data processing method according to claim 1,
In the step of generating the simulated abnormal sound data,
Using the normal sound data and abnormal sound data selected from an abnormal sound database held in advance,
Performing a mixing process of the normal sound data and the abnormal sound data to generate the simulated abnormal sound data;
Sound data processing method.
The sound data processing method according to claim 4, wherein
In the step of generating the simulated abnormal sound data,
For performing the mixing process, execute at least one data processing process of the normal sound data and the abnormal sound data,
Sound data processing method.
The sound data processing method according to claim 5, wherein
In the data processing,
Removal of a specific frequency in the normal sound data using a filter, performing at least one process of frequency shift,
Sound data processing method.
The sound data processing method according to any one of claims 1 to 6, wherein
In the step of generating the simulated abnormal sound data,
Set an abnormal type, according to the abnormal type, only the normal sound data, or perform processing using the normal sound data and abnormal sound data, to generate the simulated abnormal sound data,
Sound data processing method.
The sound data processing method according to claim 5, wherein:
In the step of generating the simulated abnormal sound data,
Executing a process of selecting the abnormal sound data for performing the mixing process;
Sound data processing method.
9. The sound data processing method according to claim 8, wherein
In the selection process,
Based on the type of the target sound data, select abnormal sound data that matches from the abnormal sound database,
Sound data processing method.
The sound data processing method according to claim 8, wherein:
In the selection process,
Determine the presence or absence of using the abnormal sound database according to the characteristics of the target sound data, and if the abnormal sound database is not used, output silent sound data.
Sound data processing method.
A sound data processing device having a processing unit for inputting and acquiring target sound data and performing processing of the sound data,
The processing unit includes:
Using the acquired normal sound data of the target, a simulated abnormal sound generation unit that generates simulated abnormal sound data that is a simulated abnormal sound of the target,
A learning model for performing machine learning using the acquired normal sound data and the generated simulated abnormal sound data as sound data for learning, and determining abnormal sounds in the target sound data to detect abnormal sounds. A machine learning unit that generates
A sound data processing device comprising:
Computer sound data processing device,
Obtaining sound data of the target;
Using the acquired normal sound data of the target, generating simulated abnormal sound data to be a simulated abnormal sound of the target,
A learning model for performing machine learning using the acquired normal sound data and the generated simulated abnormal sound data as sound data for learning, and determining abnormal sounds in the target sound data to detect abnormal sounds. Generating
A program for executing
A sound data processing method in a sound data processing device having a processing unit for inputting and acquiring target sound data and performing processing of the sound data,
Based on the acquired target sound data, generating similar sound data that is a similar sound similar to the target sound data,
Performing machine learning using the acquired target sound data and the generated similar sound data as sound data for learning, and generating a learning model for performing classification determination on the target sound data;
A sound data processing method comprising:
The sound data processing method according to claim 13,
In the step of generating the similar sound data,
A similar environment of the target sound data is generated, and a plurality of the similar sound data is generated by changing at least one of a frequency characteristic and a volume of the target sound data,
Sound data processing method.
The sound data processing method according to claim 14,
In the step of generating the similar sound data,
Generating the similar sound data using a filter that changes the frequency characteristic of the target sound data,
Sound data processing method.
The sound data processing method according to claim 14,
In the step of generating the similar sound data,
Generating the similar sound data using a volume change parameter that changes the volume of the entire frequency band of the target sound data, or a specific frequency band,
Sound data processing method.
The sound data processing method according to any one of claims 14 to 16, wherein
In the step of generating the similar sound data,
For the plurality of similar sound data generated, discard data that causes learning contradiction in the machine learning,
Sound data processing method.
The sound data processing method according to claim 13,
In the step of generating the learning model,
As a classification determination regarding the target sound data, to generate a learning model for performing an abnormal sound detection by determining an abnormal sound of the target sound data,
Sound data processing method.
The sound data processing method according to claim 13,
In the step of generating the learning model,
Along with the target sound data and the similar sound data, machine learning is performed using a general-purpose sound database storing general-purpose sound data including general-purpose sounds as the learning sound data,
Sound data processing method.
A sound data processing device having a processing unit for inputting and acquiring target sound data and performing processing of the sound data,
The processing unit includes:
Based on the acquired target sound data, a similar environment generation unit that generates similar sound data that is a similar sound similar to the target sound data,
A machine learning unit that performs machine learning using the acquired target sound data and the generated similar sound data as sound data for learning, and generates a learning model for performing classification determination on the target sound data,
A sound data processing device comprising:
Computer sound data processing device,
Obtaining sound data of the target;
Based on the acquired target sound data, generating similar sound data that is a similar sound similar to the target sound data,
Performing machine learning using the acquired target sound data and the generated similar sound data as sound data for learning, and generating a learning model for performing classification determination on the target sound data;
A program for executing