CN109983311B - Degraded portion estimation device, degraded portion estimation system, and degraded portion estimation method - Google Patents


Info

Publication number: CN109983311B (also published as CN109983311A)
Application number: CN201680090835.8A
Authority: CN (China)
Language: Chinese (zh)
Inventor: 阿部芳春
Applicant and assignee: Mitsubishi Electric Corp
Prior art keywords: time series, fourier transform, short, neural network
Legal status: Active (application granted)

Classifications

    • B66B 5/00 Applications of checking, fault-correcting, or safety devices in elevators
    • G01H 17/00 Measuring mechanical vibrations or ultrasonic, sonic or infrasonic waves, not provided for in the preceding groups
    • G01M 17/08 Testing of vehicles; railway vehicles
    • G01M 99/00 Subject matter not provided for in other groups of this subclass
    • G01N 29/44 Processing the detected response signal, e.g. electronic circuits specially adapted therefor
    • G01N 29/46 Processing the detected response signal by spectral analysis, e.g. Fourier analysis or wavelet analysis

Abstract

A deteriorated part estimation device (6) is provided with: a short-time Fourier transform unit (31) that acquires microphone signals (DS1, DS2, DS3) corresponding to a plurality of microphones (13_1 to 13_3) provided in the inspection target device and calculates a time series of short-time Fourier transform coefficients corresponding to each of the microphone signals (DS1, DS2, DS3); a time-series array generation unit (32) that generates a time-series array for input to the neural network using the time series of the short-time Fourier transform coefficients; a neural network unit (35) which is configured from a neural network, receives the input of the time-series array, and outputs a degradation degree corresponding to an inspection target site in the inspection target device; and a determination unit (36) that determines a degraded portion in the inspection target device using the degradation degree.

Description

Degraded portion estimation device, degraded portion estimation system, and degraded portion estimation method
Technical Field
The present invention relates to a degraded portion estimation device, a degraded portion estimation system, and a degraded portion estimation method that estimate a degraded portion in an apparatus as an inspection target.
Background
Conventionally, there has been developed a defective portion estimation device that estimates a portion of a defect (hereinafter referred to as a defective portion) in an apparatus to be inspected (hereinafter referred to as an "inspection target apparatus") based on a sound generated from the apparatus to be inspected. Also, a deterioration part estimation apparatus has been developed which estimates a part of deterioration in an inspection target apparatus (hereinafter referred to as a "deterioration part") from a sound generated from the inspection target apparatus.
For example, a failure site estimation device of patent document 1 includes: a sound collector (1) which is provided in a mobile body and collects operating sounds generated from the mobile body and devices located in the vicinity of the range of motion of the mobile body; a reference sample sequence analyzing means (2, 3, 4) for analyzing the collected normal operating sound to obtain a reference sample sequence (104); target sample sequence analyzing means (2, 3, 4) for analyzing the collected operation sound of the diagnostic object to obtain a target sample sequence (105); a variation curve generation unit (6) that obtains a variation sequence between the reference sample sequence (104) and the target sample sequence (105) and generates a variation curve (106); a shape feature extraction unit (7) that extracts shape features (107) of a variation curve (106); and a matching determination unit (9) that matches the shape feature (107) with the template and determines a failure portion caused in the moving body and the device located in the vicinity of the moving range of the moving body (see the abstract and fig. 1 of patent document 1).
Documents of the prior art
Patent document
Patent document 1: japanese patent laid-open publication No. 2014-105075
Disclosure of Invention
Problems to be solved by the invention
The failure site estimation device of patent document 1 acquires sounds generated at each site in an examination target apparatus using 1 microphone (sound collector 1) provided in a mobile body. Thus, when a plurality of locations are arranged apart from each other along the moving direction of the moving object, it is possible to recognize at which location the acquired sound is generated. However, when a plurality of locations are arranged apart from each other along a plane perpendicular to the moving direction of the moving object, it is not possible to identify at which location the acquired sound is generated. That is, the defective portion estimation device of patent document 1 has a problem that the resolution with respect to the sound arrival direction is low and the estimation accuracy of the defective portion is low.
The same problem also exists in a degraded portion estimating apparatus.
The present invention has been made to solve the above-described problems, and an object thereof is to provide a degraded portion estimation device, a degraded portion estimation system, and a degraded portion estimation method that can accurately estimate a degraded portion in an inspection target apparatus.
Means for solving the problems
The deterioration part estimation device of the present invention includes: a short-time Fourier transform unit that acquires microphone signals corresponding to a plurality of microphones provided in an inspection target device, and calculates a time series of short-time Fourier transform coefficients corresponding to the respective microphone signals; a time-series array generating unit that generates a plurality of time-series arrays for input to the neural network using the time series of the short-time fourier transform coefficients; a neural network unit configured from a neural network, receiving an input of the time-series array, and outputting a degradation degree corresponding to an inspection target portion in the inspection target device; and a determination unit that determines a degraded portion in the inspection target apparatus using the degradation degree.
Effects of the invention
According to the present invention, since the structure is as described above, it is possible to estimate the deteriorated portion in the inspection target apparatus with high accuracy.
Drawings
Fig. 1A is an explanatory diagram showing essential parts of an inspection target apparatus according to embodiment 1 of the present invention.
Fig. 1B is an explanatory diagram showing essential parts of a deterioration part estimation system according to embodiment 1 of the present invention.
Fig. 1C is a hardware configuration diagram showing essential parts of the deterioration part estimation device according to embodiment 1 of the present invention.
Fig. 2 is a functional block diagram showing essential parts of the deterioration part estimation device according to embodiment 1 of the present invention.
Fig. 3A is an explanatory diagram showing a cross-correlation spectrum generated by the cross-correlation operation unit in embodiment 1 of the present invention.
Fig. 3B is an explanatory diagram showing an autocorrelation spectrum generated by the autocorrelation calculating unit in embodiment 1 of the present invention.
Fig. 4 is an explanatory diagram showing a configuration of a neural network in the neural network unit according to embodiment 1 of the present invention.
Fig. 5 is an explanatory diagram showing a three-dimensional array in an input layer within the neural network shown in fig. 4.
Fig. 6 is an explanatory diagram showing a three-dimensional array in a convolutional layer within the neural network shown in fig. 4.
Fig. 7 is a flowchart showing the operation of the deteriorated region estimation device according to embodiment 1 of the present invention.
Fig. 8 is a flowchart showing the detailed operation of the short-time fourier transform unit according to embodiment 1 of the present invention.
Fig. 9 is a flowchart showing the detailed operation of the cross-correlation calculation unit according to embodiment 1 of the present invention.
Fig. 10 is a flowchart showing the detailed operation of the autocorrelation calculating unit in embodiment 1 of the present invention.
Fig. 11 is a flowchart showing the detailed operation of the neural network unit according to embodiment 1 of the present invention.
Fig. 12 is a flowchart showing the detailed operation of the judgment unit in embodiment 1 of the present invention.
Detailed Description
Hereinafter, embodiments for carrying out the present invention will be described in more detail with reference to the accompanying drawings.
Embodiment mode 1
Fig. 1A is an explanatory diagram showing essential parts of an inspection target apparatus according to embodiment 1 of the present invention. Fig. 1B is an explanatory diagram showing essential parts of a deterioration part estimation system according to embodiment 1 of the present invention. Fig. 1C is a hardware configuration diagram showing essential parts of the deterioration part estimation device according to embodiment 1 of the present invention. A deterioration part estimation system 100 according to embodiment 1 will be described with reference to fig. 1.
In the figure, 1 is an elevator. An elevator 1 has a hoistway 2. A car 3 and a plurality of movable members not shown are provided in the hoistway 2. The elevator 1 is installed in a building, not shown, and the car 3 is vertically movable along the hoistway 2 between the uppermost floor and the lowermost floor of the building. The elevator 1 constitutes an inspection target device.
A microphone array device 4, an audio interface device 5, and a deterioration part estimation device 6 are provided on the ceiling of the car 3. The microphone array device 4, the audio interface device 5, and the degradation portion estimation device 6 constitute essential parts of the degradation portion estimation system 100.
The microphone array device 4 has a substantially cylindrical base 11 and three microphones 13_1 to 13_3 mounted on a substantially circular mounting surface 12 of the base 11. The microphones 13_1 to 13_3 individually acquire the sounds generated by the elevator 1, and output analog signals AS1, AS2, and AS3 corresponding to the waveforms of the acquired sounds, respectively.
Here, the microphones 13_1 to 13_3 are arranged at substantially equal intervals along the circumferential portion of the mounting surface 12. That is, on the mounting surface 12, which is substantially perpendicular to the direction of movement of the car 3, the microphones 13_1 to 13_3 are disposed at positions corresponding to the vertices of a regular triangle (not shown). The spacing between the microphones 13_1 to 13_3 is thus set to the maximum value within the mounting surface 12.
The audio interface device 5 uses, for example, an analog-to-digital conversion circuit supporting multichannel input. The audio interface device 5 acquires the analog signals AS1, AS2, and AS3 output by the microphones 13_1 to 13_3 and converts the analog signals AS1, AS2, and AS3 into digital signals DS1, DS2, and DS3, respectively. The audio interface device 5 outputs the digital signals DS1, DS2, and DS3 to the degradation site estimation device 6.
Specifically, for example, the audio interface device 5 converts the analog signals AS1, AS2, AS3 into digital signals DS1, DS2, DS3 by way of linear PCM (Pulse Code Modulation). At this time, the sampling frequency is set to, for example, 48 kilohertz (kHz), and the number of quantization bits is set to, for example, 16 bits. As a result of this conversion, the digital signals DS1, DS2, and DS3 become signals each composed of a plurality of frames.
The deteriorated portion estimation device 6 is constituted by a Computer such as a Personal Computer (PC). The computer has an input-output interface 21, a processor 22 and a memory 23.
The input/output interface 21 is constituted by, for example, a USB (Universal Serial Bus) terminal, a LAN (Local Area Network) terminal, and the like. The USB terminal is connected to the audio interface device 5 so as to be able to communicate with the audio interface device. The LAN terminal is connected to an unillustrated control device for the elevator 1 through an unillustrated LAN cable so as to be communicable. The control device uses the signal output from the computer in controlling the operation of the elevator 1.
The memory 23 temporarily stores the digital signals DS1, DS2, DS3 output from the audio interface device 5. In addition, the memory 23 stores a program for causing the computer to function as the short-time fourier transform unit 31, the time-series array generating unit 32, the neural network unit 35, and the determination unit 36 shown in fig. 2. The processor 22 reads and executes the program stored in the memory 23, thereby realizing the functions of the short-time fourier transform unit 31, the time-series array generating unit 32, the neural network unit 35, and the determination unit 36 shown in fig. 2. In addition, the memory 23 appropriately stores data such as various values calculated by the program.
The processor 22 is constituted by, for example, a CPU (Central Processing Unit), a microprocessor, a microcontroller, a DSP (Digital Signal Processor), or the like. The memory 23 is constituted by, for example, a semiconductor memory such as a RAM (Random Access Memory), a ROM (Read Only Memory), a flash memory, an EPROM (Erasable Programmable Read Only Memory), or an EEPROM (Electrically Erasable Programmable Read Only Memory), a magnetic disk, a magneto-optical disk, or the like.
Fig. 2 is a functional block diagram showing essential parts of the deterioration part estimation device according to embodiment 1 of the present invention. Fig. 3A is an explanatory diagram showing a cross-correlation spectrum generated by the cross-correlation operation unit in embodiment 1 of the present invention. Fig. 3B is an explanatory diagram showing an autocorrelation spectrum generated by the autocorrelation calculating unit in embodiment 1 of the present invention. The degraded portion estimation device 6 will be described with reference to fig. 2 and 3.
The short-time fourier transform unit 31 acquires digital signals (hereinafter referred to as "microphone signals") DS1, DS2, and DS3 output from the audio interface device 5. The Short-time Fourier Transform unit 31 calculates time series of Short-time Fourier Transform coefficients corresponding to the microphone signals DS1, DS2, and DS3 by performing Short-time Fourier transforms (STFTs) on the microphone signals DS1, DS2, and DS3, respectively. That is, each time series is composed of a plurality of short-time fourier transform coefficients, and each short-time fourier transform coefficient represents a complex spectrum.
The time-series array generator 32 generates a time-series array for input to the neural network unit 35 (hereinafter, simply referred to as a "time-series array") using the time series of the short-time fourier transform coefficients calculated by the short-time fourier transform unit 31. In the example shown in fig. 2, the time-series array generator 32 includes a cross-correlation calculator 33 and an auto-correlation calculator 34.
The cross-correlation operation unit 33 generates a time-series array based on the cross-correlation spectrum between the microphone signals DS1 and DS2 by using the time-series of short-time fourier transform coefficients corresponding to the microphone signal DS1 and the time-series of short-time fourier transform coefficients corresponding to the microphone signal DS 2. The cross-correlation operation unit 33 generates a time-series array based on the time series of the cross-correlation spectrum between the microphone signals DS2 and DS3 by using the time series of the short-time fourier transform coefficients corresponding to the microphone signal DS2 and the time series of the short-time fourier transform coefficients corresponding to the microphone signal DS 3. Further, the cross-correlation operation unit 33 generates a time-series array based on the time series of the cross-correlation spectrum between the microphone signals DS3 and DS1 by using the time series of the short-time fourier transform coefficients corresponding to the microphone signal DS3 and the time series of the short-time fourier transform coefficients corresponding to the microphone signal DS 1.
Hereinafter, a time-series array based on a time series of cross-correlation spectra will be referred to as a "cross-correlation spectrogram". Each cross-correlation spectrogram is represented by a two-dimensional array in which the first dimension is set to the direction corresponding to frequency (hereinafter referred to as the "frequency direction") and the second dimension is set to the direction corresponding to frames (hereinafter referred to as the "frame direction"). Fig. 3A shows an example of a cross-correlation spectrogram. The cross-correlation spectrogram shown in Fig. 3A is represented by a two-dimensional array of (W/2+1) rows and T columns. Here, T is the number of frames constituting each of the microphone signals DS1, DS2, DS3, and W is the frame length of each frame.
The autocorrelation calculating unit 34 generates a time-series array based on a time series of autocorrelation spectra in the microphone signal DS1, using a time series of short-time fourier transform coefficients corresponding to the microphone signal DS 1. The autocorrelation calculating unit 34 generates a time-series array based on a time series of autocorrelation spectra in the microphone signal DS2, using a time series of short-time fourier transform coefficients corresponding to the microphone signal DS 2. Further, the autocorrelation calculating unit 34 generates a time-series array based on the time series of the autocorrelation spectrum in the microphone signal DS3, using the time series of the short-time fourier transform coefficients corresponding to the microphone signal DS 3.
Hereinafter, a time-series array based on a time series of autocorrelation spectra will be referred to as an "autocorrelation spectrogram". Each autocorrelation spectrogram is represented by a two-dimensional array in which the first dimension is set as the frequency direction and the second dimension is set as the frame direction. Fig. 3B shows an example of an autocorrelation spectrogram. The autocorrelation spectrogram shown in Fig. 3B is represented by a two-dimensional array of (W/2+1) rows and T columns.
The neural network unit 35 uses a so-called "neural network". The neural network in the neural network unit 35 (hereinafter, sometimes simply referred to as "neural network") is a hierarchical type, and has an input layer, an intermediate layer, and an output layer. Each layer within the neural network is made up of a plurality of cells.
The input layer receives a plurality of time-series arrays generated by the time-series array generating unit 32. Specifically, for example, the input layer receives inputs of 3 cross-correlation spectrograms generated by the cross-correlation operation unit 33 and 3 autocorrelation spectrograms generated by the autocorrelation operation unit 34.
A plurality of cells included in the output layer (hereinafter referred to as "output cells") correspond one-to-one to at least some of a plurality of sites in the elevator 1 (hereinafter referred to as "inspection target sites"). Each inspection target site corresponds to, for example, any one of the car 3, a movable member, and other various devices provided in the hoistway 2.
The neural network is trained in advance using dedicated training data such that, when a time-series array is input to the input layer, the output value (so-called "activity degree") of each output cell indicates the degree of deterioration (hereinafter referred to as the "deterioration degree") of the corresponding inspection target site. That is, in response to the input of the time-series arrays generated by the time-series array generating unit 32, the neural network unit 35 outputs the degradation degree corresponding to each of the inspection target sites.
The determination unit 36 determines whether or not the plurality of inspection target portions include a deterioration portion, that is, whether or not the elevator 1 has a deterioration portion, using the deterioration degree output from the neural network unit 35. When determining that there is a deteriorated portion, the determination unit 36 outputs a determination result indicating the deteriorated portion and a degree of deterioration corresponding to the deteriorated portion. On the other hand, when determining that there is no degraded portion, the determination unit 36 outputs a determination result indicating that fact. The determination result output by the determination unit 36 is input to a control device, not shown, for the elevator 1, for example.
The short-time fourier transform unit 31, the time-series array generation unit 32, the neural network unit 35, and the determination unit 36 constitute essential parts of the degradation portion estimation device 6.
Fig. 4 is an explanatory diagram showing a configuration of a neural network in the neural network unit according to embodiment 1 of the present invention. Fig. 5 is an explanatory diagram showing a three-dimensional array in an input layer within the neural network shown in fig. 4. Fig. 6 is an explanatory diagram showing a three-dimensional array in a convolutional layer within the neural network shown in fig. 4. The neural network in the neural network unit 35 will be described with reference to fig. 4 to 6.
As shown in fig. 4, the neural network has an input layer, an intermediate layer, and an output layer. In the example shown in fig. 4, the intermediate layer is composed of 1 convolutional layer and 2 nonlinear layers. The number of layers in the neural network is L +1, and each layer is assigned with a continuous number L of 0-L. That is, in the example shown in fig. 4, L is 4.
Each layer within the neural network is made up of a plurality of cells. In the figure, white circles represent the respective cells. Each cell included in a layer is coupled to a plurality of cells constituting at least a part of the cells included in the layer preceding that layer. Cells within the same layer are not coupled to each other. The degree of coupling between cells is represented by a load coefficient W. Furthermore, each cell has a bias B related to the coupling.
In the present specification, following so-called subscript notation, an arbitrary object is in principle expressed by a single capital letter, and an arbitrary element of the object is expressed by one or more lower-case letters (i.e., subscripts) following the capital letter. For example, in each layer of the neural network, the index indicating a cell included in that layer is denoted by k, and the index indicating a cell included in the layer preceding that layer is denoted by i. The load coefficient W indicating the degree of coupling between the cell k included in the layer l and the cell i included in the layer preceding the layer l is written "Wlik". The bias B of the cell k included in the layer l is written "Blk".
However, as a result of substituting the value of the corresponding element for a subscript, a subscript may contain a capital letter, a numeral, or a symbol. For example, in Fig. 4, the load coefficient indicating the degree of coupling between a cell k included in the output layer (l = L) and a cell i included in the nonlinear layer (l = L-1) is written "WLik". The load coefficient indicating the degree of coupling between a cell k included in the nonlinear layer (l = L-1) and a cell i included in the nonlinear layer (l = L-2) is written "WL-1ik".
The input layer has a structure in which a plurality of planes (hereinafter referred to as "two-dimensional array planes"), each having a plurality of cells arranged along mutually orthogonal first and second dimensions, are stacked, i.e., a three-dimensional array structure in which the stacking direction of the two-dimensional array planes is the third dimension. The two-dimensional array planes included in the input layer correspond one-to-one to the plurality of time-series arrays input to the neural network unit 35. That is, a corresponding time-series array is input to each two-dimensional array plane included in the input layer.
Fig. 5 shows an example of the three-dimensional array in the input layer. In the example shown in Fig. 5, the input layer has a three-dimensional array structure in which (M + COMBINATIONS(M, 2)) two-dimensional array planes of (W/2+1) rows and T columns are stacked. Here, M is the number of microphones 13_1 to 13_3 of the microphone array device 4, and COMBINATIONS(M, 2) is a function giving the number of combinations of 2 out of M. In embodiment 1, M is 3, and the number of combinations given by COMBINATIONS(M, 2) is 3. Any one of the 3 cross-correlation spectrograms generated by the cross-correlation operation unit 33 or any one of the 3 autocorrelation spectrograms generated by the autocorrelation calculating unit 34 is input to each two-dimensional array plane.
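As a small illustration (not part of the patent), the following Python sketch computes the plane count M + COMBINATIONS(M, 2) for a given number of microphones M; the function name is a placeholder introduced here.

```python
# Minimal sketch: number of two-dimensional array planes in the input layer,
# i.e. M autocorrelation spectrograms plus COMBINATIONS(M, 2) cross-correlation spectrograms.
from math import comb

def num_input_planes(M: int) -> int:
    return M + comb(M, 2)

print(num_input_planes(3))  # prints 6 for the three-microphone case of embodiment 1
```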
The convolution layer performs a convolution operation for the three-dimensional region Ω gtf within the input layer. As shown in fig. 4, the three-dimensional region Ω gtf includes a plurality of cells. Here, the subscript g is an index indicating a two-dimensional array plane on which the cells in the three-dimensional region Ω are arranged, the subscript t is an index indicating a frame corresponding to the cells in the three-dimensional region Ω, and the subscript f is an index indicating a frequency corresponding to the cells in the three-dimensional region Ω.
Fig. 6 shows an example of the three-dimensional array in the convolutional layer. In the example shown in Fig. 6, the convolutional layer has a three-dimensional array structure in which D two-dimensional array planes of ((W/2+1)/DF) rows and (T/DT) columns are stacked. That is, each two-dimensional array plane included in the convolutional layer has a shape obtained by down-sampling each two-dimensional array plane included in the input layer in the frame direction and the frequency direction. Here, DT is the down-sampling interval in the frame direction, and DF is the interval in the frequency direction. If the value of DT is 1, no thinning in the frame direction is performed; if the value of DT is d (d being an integer of 2 or more), down-sampling is performed at intervals of d-1. Similarly, if the value of DF is 1, no thinning in the frequency direction is performed; if the value of DF is d (d being an integer of 2 or more), down-sampling is performed at intervals of d-1. The two-dimensional array planes included in the convolutional layer correspond one-to-one to the features d (d = 1, ..., D) obtained by the convolution operation.
In the middle of the processing by the neural network unit 35, the array shape of the cells in the convolutional layer is converted from a three-dimensional array to a one-dimensional array (i.e., a vector). Each layer (l = 2, ..., L) after the convolutional layer has a structure based on a one-dimensional array of a plurality of cells.
The 2 nonlinear layers each have a configuration based on a one-dimensional array of a plurality of cells. Each cell included in the non-linear layer has a non-linear output function. The output function is, for example, an S-type (SIGMOID) function or a TANH function.
The output layer has a configuration based on a one-dimensional array of a plurality of output cells. As described above, the output means corresponds one-to-one to the inspection target portion in the elevator 1. The neural network is trained in advance so that the activity level of each output unit indicates the degree of deterioration in the corresponding examination target region.
Next, the operation of the degraded portion estimating device 6 will be described with reference to the flowchart of fig. 7.
During operation of the elevator 1, the audio interface device 5 continuously performs the following processing: it acquires the analog signals AS1, AS2, and AS3 output by the microphones 13_1 to 13_3, converts the analog signals AS1, AS2, and AS3 into the digital signals DS1, DS2, and DS3, and outputs the digital signals DS1, DS2, and DS3 to the degradation portion estimation device 6. The digital signals DS1, DS2, DS3 output from the audio interface device 5 are temporarily stored in the memory 23. The degraded portion estimating apparatus 6 executes the following processing of steps ST1 to ST4 on the microphone signals DS1, DS2, DS3, which are the digital signals DS1, DS2, DS3 stored in the memory 23.
First, in step ST1, the short-time fourier transform unit 31 acquires the microphone signals DS1, DS2, and DS3 stored in the memory 23. The short-time fourier transform unit 31 performs short-time fourier transforms on the microphone signals DS1, DS2, and DS3, respectively, thereby calculating time series of short-time fourier transform coefficients corresponding to the microphone signals DS1, DS2, and DS3, respectively. Each time series is composed of a plurality of short-time fourier transform coefficients, and each short-time fourier transform coefficient represents a complex spectrum.
Next, in step ST2, the time-series array generator 32 generates a plurality of time-series arrays using the time series of short-time Fourier transform coefficients calculated in step ST1 by the short-time Fourier transform unit 31. Specifically, for example, the time-series array generator 32 generates a cross-correlation spectrogram between the microphone signals DS1 and DS2, a cross-correlation spectrogram between the microphone signals DS2 and DS3, a cross-correlation spectrogram between the microphone signals DS3 and DS1, an autocorrelation spectrogram in the microphone signal DS1, an autocorrelation spectrogram in the microphone signal DS2, and an autocorrelation spectrogram in the microphone signal DS3.
Next, at step ST3, the neural network unit 35 receives the input of the plurality of time-series arrays generated at step ST2 by the time-series array generating unit 32. In response to the input of the time-series arrays, the neural network unit 35 outputs the degradation degree corresponding to each of the inspection target sites.
Next, in step ST4, the determination unit 36 determines whether or not there is a deterioration part in the elevator 1 using the deterioration degree output by the neural network unit 35 in step ST 3. When determining that there is a deteriorated portion, the determination unit 36 outputs a determination result indicating the deteriorated portion and a degree of deterioration corresponding to the deteriorated portion. On the other hand, when determining that there is no degraded portion, the determination unit 36 outputs a determination result indicating that fact.
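The overall flow of steps ST1 to ST4 can be summarized by the following hedged Python sketch; the arguments stft, make_arrays and network stand in for the processing of the short-time Fourier transform unit 31, the time-series array generating unit 32 and the neural network unit 35, and are placeholder names introduced here, not identifiers from the patent.

```python
# Hypothetical end-to-end sketch of steps ST1-ST4 (names are illustrative only).
import numpy as np

def estimate_degraded_portion(mic_signals, stft, make_arrays, network, theta):
    spectra = [stft(s) for s in mic_signals]      # ST1: STFT per microphone signal
    arrays = make_arrays(spectra)                 # ST2: auto-/cross-correlation spectrograms
    R = network(arrays)                           # ST3: degradation degree per target site
    k_star = int(np.argmax(R))                    # ST4: site with the largest degree
    if R[k_star] > theta:
        return k_star, float(R[k_star])           # degraded portion found
    return -1, 0.0                                # no degraded portion
```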
Next, details of the processing of step ST1 shown in fig. 7, that is, the processing of the short-time fourier transform unit 31 will be described with reference to the flowchart of fig. 8.
First, the short-time fourier transform unit 31 acquires the microphone signals DS1, DS2, and DS3 stored in the memory 23 (step ST 11).
Next, the short-time Fourier transform unit 31 selects any one of the microphones 13_1 to 13_3 (step ST12). In the following, m denotes the number of the selected microphone (m being an integer satisfying 0 ≤ m < M). The short-time Fourier transform unit 31 executes the following processing of steps ST13 to ST18 on the microphone signal corresponding to the selected microphone m. Here, as an example, it is assumed that the microphone 13_1 corresponding to m = 0 is selected.
The microphone signal corresponding to the microphone m is composed of T frames, and t denotes the number of each frame. The signal sequence of the microphone signal is denoted g(m, i). Here, i is a discrete-time index in the signal sequence g and is an integer satisfying 0 ≤ i < NS (NS being the number of sound collection samples). The short-time Fourier transform unit 31 performs the following processing of steps ST14 to ST18 for each frame t (step ST13).
First, for the microphone signal DS1 corresponding to the selected microphone 13_1, the short-time Fourier transform unit 31 shifts the frame position by LF points every time the frame number t increases by 1 (step ST14), and cuts out the frame at the frame position corresponding to the frame number t (step ST15). The processing of steps ST14 and ST15 is realized by, for example, the following equation (1).
frame(i) = g(m, t×LF + i)   (i = 0, 1, 2, …, W−1)   (1)
Here, W is a frame length of each frame, and LF is a movement amount of the frame. In embodiment 1, W is 1024 and LF is 512.
Next, the short-time fourier transform unit 31 multiplies the signal representing the clipped frame by a time window (step ST 16). The processing of step ST16 is realized by, for example, the following equation (2).
x(i) = frame(i) × window(i)   (i = 0, 1, 2, …, W−1)
window(i) = 0.54 − 0.46 cos(2πi/(W−1))   (2)
Here, window (i) is a function of the time window. In the example of the above equation (2), a hamming window function is used.
Next, the short-time Fourier Transform unit 31 performs Fast Fourier Transform (FFT) on the sequence x (i) obtained by multiplying the window function (step ST 17). The processing of step ST17 is realized by, for example, the following equation (3).
Zf = FFT(x[0:W])   (f = 0, 1, 2, …, W−1)   (3)
Here, Zf represents a complex spectrum, x[0:W] represents the sequence [x(0), x(1), …, x(W−1)], and FFT(x) represents the FFT operation function that performs a fast Fourier transform on the sequence x.
Next, the short-time fourier transform unit 31 separates the real part and imaginary part of the complex spectrum Zf and stores the separated parts in the memory 23 (step ST 18). The processing of step ST18 is realized by, for example, the following equation (4).
Xmtf = Re(Zf)   (0 ≤ f ≤ W/2)
Ymtf = Im(Zf)   (0 ≤ f ≤ W/2)   (4)
Here, Xmtf denotes the real part and Ymtf the imaginary part of the complex spectrum Zf for the microphone m, frame t, and frequency f. Re(c) is a function that extracts the real part a of the complex number c = a + bj, and Im(c) is a function that extracts the imaginary part b of the complex number c = a + bj. In addition, j is the imaginary unit, i.e., the square root of −1 (that is, j**2 = −1, where ** is the power operator).
In addition, as shown in equation (3) above, the input to the FFT operation is a real-valued sequence; therefore, the real part of Zf has even symmetry with respect to f (Xmtf = Xmt(W−f)), and the imaginary part of Zf has odd symmetry with respect to f (Ymtf = −Ymt(W−f)). The short-time Fourier transform unit 31 therefore omits the redundant part f > W/2 and stores only the part 0 ≤ f ≤ W/2 in the memory 23.
The short-time Fourier transform unit 31 performs the same processing of steps ST13 to ST18 described above on the microphone signals DS2 and DS3 corresponding to the remaining microphones 13_2 and 13_3 (step ST12). As a result, the memory 23 stores the time series of short-time Fourier transform coefficients, i.e., the time series of the complex spectrum Zf, corresponding to each of the microphone signals DS1, DS2, and DS3.
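A minimal NumPy sketch of the per-frame processing of steps ST13 to ST18 is shown below; it assumes a one-dimensional signal array and the values W = 1024 and LF = 512 given above, and is an illustration rather than the implementation described in the patent.

```python
# Sketch of equations (1)-(4): framing, Hamming windowing, FFT, keeping 0 <= f <= W/2.
import numpy as np

def stft_coefficients(signal: np.ndarray, W: int = 1024, LF: int = 512) -> np.ndarray:
    """Return a (T, W//2 + 1) array of complex short-time Fourier transform coefficients."""
    window = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(W) / (W - 1))  # eq. (2): Hamming window
    T = (len(signal) - W) // LF + 1
    Z = np.empty((T, W // 2 + 1), dtype=complex)
    for t in range(T):
        frame = signal[t * LF:t * LF + W]          # eq. (1): cut out frame t
        spectrum = np.fft.fft(frame * window)      # eq. (3): fast Fourier transform
        Z[t] = spectrum[:W // 2 + 1]               # eq. (4): keep only the non-redundant part
    return Z
```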
Next, the details of the processing of the cross-correlation operation unit 33 in the processing of step ST2 shown in fig. 7 will be described with reference to the flowchart of fig. 9.
First, the cross-correlation operation unit 33 selects any two of the microphones 13_1 to 13_3 (step ST21). In the following, m denotes the number of one of the two selected microphones and n denotes the number of the other; m and n are integers satisfying 0 ≤ m < n < M.
The cross-correlation operation unit 33 executes the following processing of steps ST22 to ST25 for the time series of complex spectra corresponding to the selected microphones m and n, respectively. That is, the cross-correlation operation unit 33 performs the following processes of steps ST23 to ST25 for each frame t (step ST 22).
First, the cross-correlation operation unit 33 acquires the real part Xmtf and the imaginary part Ymtf of the complex spectrum corresponding to the microphone m and the real part Xntf and the imaginary part Yntf of the complex spectrum corresponding to the microphone n from the memory 23. The cross-correlation operation unit 33 calculates a complex cross-correlation spectrum Cf by multiplying the complex spectrum corresponding to the microphone m and the complex spectrum corresponding to the microphone n (step ST 23). The processing of step ST23 is realized by, for example, the following equation (5).
Cf = (Xmtf*Xntf + Ymtf*Yntf) + (Xmtf*Yntf − Ymtf*Xntf)j   (5)
Next, the cross-correlation operation unit 33 takes the absolute value of the complex cross-correlation spectrum Cf, and converts the absolute value into a logarithm, thereby taking the decibel value (dB value). Thereby, the complex cross-correlation spectrum Cf is converted into intensity (step ST24), and the dynamic range is compressed. The processing of step ST24 is realized by, for example, the following equation (6).
Hf=10*LOG10(ABS(Cf))(0≤f≤W/2)(6)
Here, Hf is the intensity of the cross-correlation spectrum, LOG10(x) is the common logarithm function, and ABS(c) is the absolute value function, which calculates the absolute value of a complex number c = a + bj (a and b being real numbers and j the imaginary unit) as SQRT(a**2 + b**2) (SQRT being the square root function).
A region for storing the cross-correlation spectrogram Hmntf, which is the time-series array based on the time series of the cross-correlation spectrum Hf corresponding to the combination (m, n), is prepared in the memory 23. The cross-correlation operation unit 33 stores the calculated Hf at the address corresponding to the frame t in that region (step ST25). The processing of step ST25 is realized by, for example, the following equation (7).
Hmntf=Hf(0≤f≤W/2)(7)
After the end of the processing at steps ST23 to ST25 for all frames t, the cross-correlation operation unit 33 executes the processing at steps ST22 to ST25 for the next combination (m, n) (step ST 21). Finally, the cross-correlation operation unit 33 performs the processing of steps ST22 to ST25 for all combinations (m, n).
As a result of the above processing, the memory 23 stores the time series of the cross-correlation spectra between the microphone signals DS1 and DS2, the time series of the cross-correlation spectra between the microphone signals DS2 and DS3, and the time series of the cross-correlation spectra between the microphone signals DS3 and DS1, that is, the cross-correlation spectrogram Hmntf based on each time series. Each cross-correlation spectrogram Hmntf satisfies the following expression (8).
Hmntf(0≤m<n<M,0≤t<T,0≤f≤W/2)(8)
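Under the same assumptions as the STFT sketch above, the cross-correlation spectrogram of equations (5) to (7) for one microphone pair can be sketched as follows; the small constant added before the logarithm is only for numerical safety and is not part of the patent.

```python
# Sketch of equations (5)-(7): complex cross spectrum, dB-valued intensity, spectrogram Hmntf.
import numpy as np

def cross_correlation_spectrogram(Zm: np.ndarray, Zn: np.ndarray) -> np.ndarray:
    """Zm, Zn: (T, W//2 + 1) complex STFT coefficient arrays of microphones m and n."""
    C = np.conj(Zm) * Zn                           # eq. (5): (XmXn + YmYn) + (XmYn - YmXn)j
    return 10.0 * np.log10(np.abs(C) + 1e-12)      # eq. (6): intensity Hf in decibels
```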
Next, the details of the processing of the autocorrelation calculating unit 34 in the processing of step ST2 shown in fig. 7 will be described with reference to the flowchart of fig. 10.
First, the autocorrelation calculating unit 34 selects any one of the microphones 13_1 to 13_3 (step ST31). In the following, m denotes the number of the selected microphone (m being an integer satisfying 0 ≤ m < M).
The autocorrelation calculating unit 34 performs the following processing of steps ST32 to ST35 on the time series of the complex spectrum corresponding to the selected microphone m. That is, the autocorrelation calculating unit 34 performs the following processing of steps ST33 to ST35 for each frame t (step ST 32).
First, the autocorrelation calculating unit 34 acquires the real part Xmtf and the imaginary part Ymtf of the complex spectrum corresponding to the microphone m from the memory 23. The autocorrelation calculating unit 34 calculates an autocorrelation spectrum Af by multiplying the complex spectrum corresponding to the microphone m by the complex conjugate of the complex spectrum (step ST 33). The processing of step ST33 is realized by, for example, the following equation (9). In addition, since the imaginary part of the autocorrelation spectrum Af becomes 0 according to the theory of fourier transform, the calculation of the imaginary part is omitted in expression (9).
Af = Xmtf*Xmtf + Ymtf*Ymtf   (0 ≤ f ≤ W/2)   (9)
Next, the autocorrelation calculating unit 34 takes the absolute value of the autocorrelation spectrum Af, converts the absolute value into a logarithm, and thereby takes a decibel value (dB value). Thereby, the autocorrelation spectrum Af is converted into the intensity Gf (step ST34), and the dynamic range is compressed. The processing of step ST34 is realized by, for example, the following equation (10).
Gf = 10*LOG10(ABS(Af))   (0 ≤ f ≤ W/2)   (10)
A region for storing the autocorrelation spectrogram Gmtf, which is the time-series array based on the time series of autocorrelation spectra corresponding to the microphone m, is prepared in the memory 23. The autocorrelation calculating unit 34 stores the calculated Gf at the address corresponding to the frame t in that region (step ST35). The processing of step ST35 is realized by, for example, the following equation (11).
Gmtf=Gf(0≤f≤W/2)(11)
After the end of the processing at steps ST33 to ST35 for all frames t, the autocorrelation calculating unit 34 performs the processing at steps ST32 to ST35 for the next microphone m (step ST 31). Finally, the autocorrelation calculating unit 34 performs the processing of steps ST32 to ST35 for all the microphones m.
As a result of the above processing, the memory 23 stores the time series of the autocorrelation spectrum in the microphone signal DS1, the time series of the autocorrelation spectrum in the microphone signal DS2, and the time series of the autocorrelation spectrum in the microphone signal DS3, that is, the autocorrelation spectrogram Gmtf based on each time series. Each autocorrelation spectrogram Gmtf satisfies the following expression (12).
Gmtf(0≤m<M,0≤t<T,0≤f≤W/2)(12)
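The autocorrelation spectrogram of equations (9) to (11) can be sketched in the same illustrative style, again with a small constant added purely for numerical safety.

```python
# Sketch of equations (9)-(11): autocorrelation spectrum Af and dB-valued spectrogram Gmtf.
import numpy as np

def autocorrelation_spectrogram(Zm: np.ndarray) -> np.ndarray:
    """Zm: (T, W//2 + 1) complex STFT coefficient array of microphone m."""
    A = Zm.real ** 2 + Zm.imag ** 2                # eq. (9): Zm times its complex conjugate
    return 10.0 * np.log10(A + 1e-12)              # eq. (10): intensity Gf in decibels
```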
Next, the details of the processing at step ST3 shown in fig. 7, that is, the processing by the neural network unit 35, will be described with reference to figs. 4 to 6 and the flowchart of fig. 11.
First, the neural network unit 35 acquires a plurality of (3 in embodiment 1) cross-correlation spectrograms Hmntf and a plurality of (3 in embodiment 1) autocorrelation spectrograms Gmtf from the memory 23. The neural network unit 35 generates a three-dimensional array shown in fig. 5, which is a three-dimensional array in which two-dimensional array planes corresponding to the cross-correlation spectrogram Hmntf and two-dimensional array planes corresponding to the autocorrelation spectrogram Gmtf are stacked (step ST 41). The processing of step ST41 is realized by writing a value in the memory 23 according to the following expression (13), for example.
g←0
for m such that 0≤m<M:
Egtf=Gmtf(0≤f≤W/2)
g←g+1
for m such that 0≤m<M-1:
for m such that m<n<M:
Egtf=Hmntf(0≤f≤W/2)
g←g+1(13)
Here, Egtf is a three-dimensional array value corresponding to the frequency f, the frame t, and the two-dimensional array plane g. The neural network unit 35 inputs the generated three-dimensional array to an input layer of the neural network.
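A hedged sketch of the stacking of expression (13) is given below; the dictionaries G (keyed by microphone number m) and H (keyed by the pair (m, n)) are assumed to hold spectrograms of identical shape, which is an assumption of this illustration rather than a structure defined in the patent.

```python
# Sketch of expression (13): stack M autocorrelation planes and COMBINATIONS(M, 2)
# cross-correlation planes into the three-dimensional input array Egtf.
import numpy as np

def build_input_array(G: dict, H: dict, M: int) -> np.ndarray:
    planes = [G[m] for m in range(M)]                                     # planes g = 0 ... M-1
    planes += [H[(m, n)] for m in range(M - 1) for n in range(m + 1, M)]  # remaining planes g
    return np.stack(planes, axis=0)   # axis 0 is the plane index g of the input layer
```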
Next, the neural network unit 35 calculates the activity degrees of the corresponding cells in the convolutional layers from the activity degrees of the cells in the input layer (step ST 42). At this time, the convolution layer performs convolution operation for the three-dimensional region Ω gtf within the input layer. The processing of step ST42 is realized by, for example, the following equation (14).
l ← 1
Alk = OFUN(Σ(g,t,f)∈Ωgtf(k) Wlgtfk*Egtf + Blk)   (14)
Here, l is the index indicating each layer in the neural network (the convolutional layer corresponds to l = 1), k is the index indicating each cell in the convolutional layer, Alk is the activity degree of the cell k, OFUN is an output function, and Σ is a sum operation. Ωgtf(k) is the set of indices (g, t, f) indicating the cells in the input layer to which the cell k is coupled. Each (g, t, f) in the set indicates, respectively, the two-dimensional array plane in which the cell is arranged in the input layer, the frame corresponding to the cell in the input layer, and the frequency corresponding to the cell in the input layer. Wlgtfk denotes the load coefficient of the cell k for Egtf, and Blk denotes the bias of the cell k. The convolution operation is performed for all cells k contained in the convolutional layer.
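The convolution of equation (14) can be illustrated by the following sketch; the kernel size, the strides DT and DF, and the choice of tanh as the output function OFUN are assumptions made for this example, not values fixed by the patent.

```python
# Sketch of equation (14): each convolutional-layer unit sums a weighted three-dimensional
# region of the input array, adds a bias, and applies the output function OFUN.
import numpy as np

def conv_layer(E: np.ndarray, W: np.ndarray, B: np.ndarray, DT: int = 1, DF: int = 1) -> np.ndarray:
    """E: (G, T, F) input; W: (D, G, KT, KF) kernels; B: (D,) biases; returns (D, T', F')."""
    D, G, KT, KF = W.shape
    _, T, F = E.shape
    out = np.empty((D, (T - KT) // DT + 1, (F - KF) // DF + 1))
    for d in range(D):
        for ti, t in enumerate(range(0, T - KT + 1, DT)):
            for fi, f in enumerate(range(0, F - KF + 1, DF)):
                region = E[:, t:t + KT, f:f + KF]        # three-dimensional region Ωgtf(k)
                out[d, ti, fi] = np.sum(W[d] * region) + B[d]
    return np.tanh(out)                                  # OFUN (tanh assumed here)
```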
Next, the neural network unit 35 converts the array shape of the cells in the convolutional layer from a three-dimensional array to a one-dimensional array (step ST 43). This process is realized by storing the above Alk in a one-dimensional array with k as an index.
Next, the neural network unit 35 sequentially executes the following processing of step ST45 for each of the nonlinear layers l (l = 2, ..., L-1) (step ST44). That is, the neural network unit 35 calculates the activity degree of each cell in the layer l from the activity degrees of the cells in the layer p (p = l-1) (step ST45). More specifically, the neural network unit 35 multiplies the activity degree of each cell in the layer p by the corresponding load coefficient, calculates the sum of these products over the coupled cells, adds a bias to the sum, and obtains the activity degree of the corresponding cell in the layer l by applying the output function. The processing of step ST45 is realized by, for example, the following equation (15).
p←l-1
Alk=OFUN(Σi Wlik*Api+Blk)(15)
Here, Alk is the activity degree of the cell k in the layer l, OFUN is the output function, Σi is the sum operation over the cells i in the layer p to which the cell k is coupled, Api is the activity degree of the cell i in the layer p, and Blk is the bias. The output function OFUN can be a nonlinear function such as an S-type (sigmoid) function or a TANH function.
Next, the neural network unit 35 calculates the activity degrees of the corresponding cells in the output layer L based on the activity degrees of the cells in the nonlinear layer L-1, which is the layer preceding the output layer L (step ST 46). The processing of step ST46 is realized by, for example, the following equation (16).
p←L-1
ALk=Σi WLik*Api+BLk(0≤k<K)(16)
Here, ALk is the activity degree of the cell k in the layer L, Σi is the sum operation over the cells i in the layer p (p = L-1) to which the cell k is coupled, Api is the activity degree of the cell i in the layer p (p = L-1), BLk is the bias, and K is the number of inspection target sites in the elevator 1. In embodiment 1, since the neural network unit 35 calculates the degradation degree as a linear value, no output function is provided in the output layer, and nonlinear processing based on an output function is not executed.
In the following, the subscript k indicates each cell in the output layer L and indicates the part to be inspected corresponding to the cell. In the memory 23, regions for storing the degradation degrees Rk corresponding to the respective inspection target sites k are prepared. The neural network unit 35 stores the activity ALk calculated by the above equation (16) in the area of the memory 23 as the degradation Rk (step ST 47). The processing of step ST47 is realized by, for example, the following equation (17).
Rk=ALk(0≤k<K)(17)
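Equations (15) to (17) can be summarized by the following sketch of the forward pass through the nonlinear layers and the linear output layer; the weight and bias shapes and the use of tanh as the output function are assumptions of this illustration.

```python
# Sketch of equations (15)-(17): nonlinear layers followed by a linear output layer whose
# activities ALk are read off as the degradation degrees Rk.
import numpy as np

def forward_dense(a: np.ndarray, weights: list, biases: list) -> np.ndarray:
    """a: flattened convolutional-layer activities; the last weight/bias pair is the output layer."""
    for Wl, Bl in zip(weights[:-1], biases[:-1]):
        a = np.tanh(Wl @ a + Bl)                   # eq. (15): Alk = OFUN(Σi Wlik*Api + Blk)
    return weights[-1] @ a + biases[-1]            # eq. (16), (17): linear output, Rk = ALk
```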
Next, the processing of step ST4 shown in fig. 7, that is, the processing of the determination unit 36 will be described in detail with reference to the flowchart of fig. 12.
First, the determination unit 36 acquires the degradation degree Rk corresponding to each inspection target region k from the memory 23 (step ST 51).
Next, the determination unit 36 obtains the maximum degradation degree R* among the acquired degradation degrees Rk and the index k* indicating the inspection target site corresponding to the maximum degradation degree R* (step ST52). The processing of step ST52 is realized by, for example, the following equation (18).
k*=ARGMAXk Rk
R*=Rk*(18)
Here, ARGMAXk Rk is a function that returns the index k at which Rk takes its maximum value (the returned index k* satisfies Rk* ≥ Rk for every k).
Next, the determination unit 36 compares the maximum degradation degree R* with a predetermined threshold value θ (step ST53).
When the maximum degradation degree R* is equal to or less than the threshold value θ (no in step ST53), the determination unit 36 determines that there is no degraded portion (step ST54). The determination unit 36 outputs a determination result in which the index indicating the degraded portion is −1 and the degradation degree corresponding to the degraded portion is 0 (step ST55). That is, the determination result in this case indicates that the elevator 1 has no deteriorated portion.
On the other hand, when the maximum degradation degree R* exceeds the threshold value θ (yes in step ST53), the determination unit 36 determines that there is a degraded portion (step ST56). The determination unit 36 outputs a determination result indicating that the index of the degraded portion is k* and the degradation degree corresponding to the degraded portion is R* (step ST57). That is, the determination result in this case indicates the degraded portion in the elevator 1 and the degradation degree corresponding to that degraded portion.
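The determination of steps ST51 to ST57 reduces to an argmax followed by a threshold comparison, as in the following sketch (theta stands for the predetermined threshold θ; the function name is introduced only for this illustration).

```python
# Sketch of expression (18) and steps ST52-ST57: pick the site with the largest degradation
# degree and compare it with the threshold theta.
import numpy as np

def determine(R: np.ndarray, theta: float):
    k_star = int(np.argmax(R))                     # eq. (18): k* = ARGMAXk Rk
    r_star = float(R[k_star])                      # R* = Rk*
    if r_star <= theta:
        return -1, 0.0                             # no degraded portion (steps ST54, ST55)
    return k_star, r_star                          # degraded portion k* and degree R* (ST56, ST57)
```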
The time-series arrays generated by the time-series array generating unit 32 are not limited to the cross-correlation spectrogram Hmntf and the autocorrelation spectrogram Gmtf. For example, the time-series array generator 32 may generate only the autocorrelation spectrogram Gmtf without generating the cross-correlation spectrogram Hmntf. In that case, only the autocorrelation spectrogram Gmtf is input to the neural network, and therefore the number of parameters in the neural network can be reduced.
Alternatively, for example, the time-series array generator 32 may generate, instead of the cross-correlation spectrogram Hmntf and the autocorrelation spectrum Gmtf, a time-series array based on a time series of real parts of autocorrelation spectra Af corresponding to the microphone signals DS1, DS2, and DS3, and a time-series array based on a time series of imaginary parts of autocorrelation spectra Af corresponding to the microphone signals DS1, DS2, and DS 3. These time series arrays can be represented by a two-dimensional array in which the first dimension is set as the frequency direction and the second dimension is set as the frame direction, as in the cross-correlation spectrogram Hmntf and the autocorrelation spectrogram Gmtf. In this case, it is required to perform processing equivalent to intensity conversion, cross-correlation operation, and the like in the neural network. Therefore, from the viewpoint of maintaining the output accuracy of the neural network, it is preferable to increase the number of layers within the neural network.
Alternatively, for example, the time-series array generator 32 may generate, instead of the cross-correlation spectrogram Hmntf and the autocorrelation spectrum Gmtf, a time-series array based on a time series of real parts of the complex spectrum Zf corresponding to the microphone signals DS1, DS2, and DS3, and a time-series array based on an imaginary part of the complex spectrum Zf corresponding to the microphone signals DS1, DS2, and DS 3. These time series arrays can be represented by a two-dimensional array in which the first dimension is set as the frequency direction and the second dimension is set as the frame direction, as in the cross-correlation spectrogram Hmntf and the autocorrelation spectrogram Gmtf. In this case, it is required to perform processes equivalent to intensity conversion, autocorrelation operation, cross-correlation operation, and the like in the neural network. Therefore, from the viewpoint of maintaining the output accuracy of the neural network, it is preferable to increase the number of layers within the neural network.
When it is determined that there is a deteriorated portion, the determination result output from the determination unit 36 may indicate at least the deteriorated portion, or may not indicate the degree of deterioration corresponding to the deteriorated portion.
The determination unit 36 may perform a certain determination regarding the degraded portion using the degradation degree Rk output from the neural network unit 35, and the determination is not limited to the determination of the presence or absence of the degraded portion.
The microphone array device 4 may have a plurality of microphones, and the number of the microphones is not limited to 3. For example, the microphone array apparatus 4 may also have 4 microphones. In this case, the microphones may be arranged at positions corresponding to respective vertices of a square along the circumferential portion of the placement surface 12. By increasing the number of microphones, the resolution with respect to the sound arrival direction is improved, and therefore, examination target portions arranged closer to each other can be identified. In addition, as the number of microphones increases, it is required to increase the number of parameters in the neural network.
The microphone array device 4 may be provided on the car bottom of the car 3 instead of the car top of the car 3. In addition, the microphone array device 4 may be provided at any position in the hoistway 2. However, from the viewpoint of increasing the resolution in the sound arrival direction and reducing the number of microphones, it is preferable that the microphone array device 4 is provided in a moving body such as the car 3.
The inspection target device is not limited to the elevator 1. The inspection target device may be a railway vehicle, for example. In this case, the microphone array device 4, the audio interface device 5, and the deterioration part estimation device 6 may be installed in the railway vehicle. In addition, the inspection target apparatus may be an apparatus including a mobile body or an apparatus configured by a mobile body, and may be any apparatus.
As described above, the deteriorated portion estimation device 6 according to embodiment 1 includes: a short-time Fourier transform unit 31 that acquires the microphone signals DS1, DS2, and DS3 corresponding to the plurality of microphones 13₁ to 13₃ provided in the inspection target device and calculates a time series of short-time Fourier transform coefficients for each of the microphone signals DS1, DS2, and DS3; a time-series array generating unit 32 that generates time-series arrays for input to the neural network using the time series of the short-time Fourier transform coefficients; a neural network unit 35, configured by a neural network, that receives the input of the time-series arrays and outputs a degradation degree corresponding to an inspection target portion in the inspection target device; and a determination unit 36 that determines a degraded portion in the inspection target device using the degradation degree. This makes it possible to take the sound arrival direction into consideration and to improve the accuracy of estimating the degraded portion.
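For illustration only, the following sketch mimics the front-end processing summarized above (the short-time Fourier transform unit 31 together with the correlation operations) using SciPy's STFT. The formulas shown for the autocorrelation spectrogram (|Zm|^2) and the cross-correlation spectrogram (Zm times the complex conjugate of Zn) are common conventions assumed here and may differ in detail from the exact definitions of Gmtf and Hmntf in the embodiment; the sampling rate and frame length are also assumptions.

import numpy as np
from scipy.signal import stft

def correlation_spectrograms(mic_signals, fs=16000, nperseg=512):
    # mic_signals: list of 1-D arrays, one per microphone signal.
    # Returns:
    #   autos:   list of (num_freq, num_frames) arrays, |Zm|^2 per microphone
    #   crosses: dict keyed by (m, n) with m < n, Zm * conj(Zn) per pair
    spectra = []
    for x in mic_signals:
        _, _, Z = stft(x, fs=fs, nperseg=nperseg)   # Z: (num_freq, num_frames)
        spectra.append(Z)
    autos = [np.abs(Z) ** 2 for Z in spectra]
    crosses = {}
    for m in range(len(spectra)):
        for n in range(m + 1, len(spectra)):
            crosses[(m, n)] = spectra[m] * np.conj(spectra[n])
    return autos, crosses

# Example with three synthetic microphone signals of one second each.
rng = np.random.default_rng(0)
signals = [rng.standard_normal(16000) for _ in range(3)]
autos, crosses = correlation_spectrograms(signals)
print(autos[0].shape, crosses[(0, 1)].shape)

Each returned array already has the (frequency direction, frame direction) layout expected of the time-series arrays.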
Further, the inspection target apparatus includes a moving body, and the microphones 13₁ to 13₃ are provided on the moving body. This makes it possible to take the sound arrival direction into consideration and to reduce the number of microphones.
The deteriorated portion estimation system 100 according to embodiment 1 includes the plurality of microphones 13₁ to 13₃ provided in the inspection target device. The deteriorated portion estimation system 100 also includes the deteriorated portion estimation device 6, which includes: a short-time Fourier transform unit 31 that acquires the microphone signals DS1, DS2, and DS3 corresponding to the microphones 13₁ to 13₃ and calculates a time series of short-time Fourier transform coefficients for each of the microphone signals DS1, DS2, and DS3; a time-series array generating unit 32 that generates time-series arrays for input to the neural network using the time series of the short-time Fourier transform coefficients; a neural network unit 35, configured by a neural network, that receives the input of the time-series arrays and outputs a degradation degree corresponding to an inspection target portion in the inspection target device; and a determination unit 36 that determines a degraded portion in the inspection target device using the degradation degree. This makes it possible to take the sound arrival direction into consideration and to improve the accuracy of estimating the degraded portion.
Further, the deteriorated portion estimation method according to embodiment 1 includes the steps of: the short-time Fourier transform unit 31 acquiring the microphone signals DS1, DS2, and DS3 corresponding to the plurality of microphones 13₁ to 13₃ provided in the inspection target device and calculating a time series of short-time Fourier transform coefficients for each of the microphone signals DS1, DS2, and DS3 (step ST1); the time-series array generating unit 32 generating time-series arrays for input to the neural network using the time series of the short-time Fourier transform coefficients (step ST2); the neural network unit 35, configured by a neural network, receiving the input of the time-series arrays and outputting a degradation degree corresponding to the inspection target portion in the inspection target device (step ST3); and the determination unit 36 determining a degraded portion in the inspection target device using the degradation degree (step ST4). This makes it possible to take the sound arrival direction into consideration and to improve the accuracy of estimating the degraded portion.
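The following sketch, assuming PyTorch, illustrates steps ST3 and ST4 with a toy convolutional network: the stacked time-series arrays form the channel dimension of the input, the network outputs one degradation degree Rk per inspection target portion, and a simple threshold yields a determination. All layer sizes, channel counts, array counts, and the threshold value are illustrative assumptions rather than the embodiment's actual parameters.

import torch
import torch.nn as nn

class DegradationNet(nn.Module):
    # Stacked (frequency x frame) arrays enter as input channels;
    # one degradation degree per inspection target portion comes out.
    def __init__(self, num_arrays, num_sites):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(num_arrays, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)),
        )
        self.head = nn.Linear(16 * 8 * 8, num_sites)

    def forward(self, x):
        # x: (batch, num_arrays, num_freq, num_frames)
        h = self.features(x)
        return torch.sigmoid(self.head(h.flatten(1)))   # degrees in [0, 1]

# Step ST3: feed the time-series arrays to the network.
num_arrays, num_freq, num_frames, num_sites = 9, 257, 200, 5
net = DegradationNet(num_arrays, num_sites)
arrays = torch.randn(1, num_arrays, num_freq, num_frames)
degrees = net(arrays)

# Step ST4: a simple determination: report portions whose degree exceeds a threshold.
threshold = 0.8
degraded = (degrees[0] > threshold).nonzero(as_tuple=True)[0].tolist()
print(degrees[0].tolist(), degraded)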
In the present application, any component of the embodiment may be modified or omitted within the scope of the invention.
Industrial applicability
The deteriorated portion estimation device of the present invention can be used to estimate a deteriorated portion in an inspection target device such as an elevator or a railway vehicle.
Description of the reference symbols
1: an elevator; 2: a hoistway; 3: a car; 4: a microphone array device; 5: an audio interface device; 6: a degraded portion estimation device; 11: a base; 12: a placement surface; 13₁, 13₂, 13₃: microphones; 21: an input/output interface; 22: a processor; 23: a memory; 31: a short-time Fourier transform unit; 32: a time-series array generating unit; 33: a cross-correlation operation unit; 34: an autocorrelation operation unit; 35: a neural network unit; 36: a determination unit; 100: a degraded portion estimation system.

Claims (12)

1. A degraded portion estimating apparatus, wherein the degraded portion estimating apparatus has:
a short-time Fourier transform unit that acquires microphone signals corresponding to a plurality of microphones provided in an inspection target device, and calculates a time series of short-time Fourier transform coefficients corresponding to the respective microphone signals;
a time-series array generating unit that generates a plurality of time-series arrays for input to a neural network using the time series of the short-time fourier transform coefficients;
a neural network unit configured to receive an input of the plurality of time-series arrays and output a degradation degree corresponding to an inspection target portion in the inspection target device; and
a determination unit that determines a deteriorated portion in the inspection target apparatus using the degree of deterioration.
2. The degradation portion estimation device according to claim 1,
the time-series array generating unit generates the time-series array based on a time series of autocorrelation spectra in the microphone signals and the time-series array based on a time series of cross-correlation spectra between the microphone signals, using the time series of the short-time Fourier transform coefficients.
3. The degradation portion estimation device according to claim 1,
the time-series array generating unit generates the time-series array based on a time series of autocorrelation spectra in the microphone signals, using the time series of the short-time Fourier transform coefficients.
4. The degradation portion estimation device according to claim 1,
the time-series array generating unit generates the time-series array based on a time series of the real parts of the short-time Fourier transform coefficients and the time-series array based on a time series of the imaginary parts of the short-time Fourier transform coefficients, using the time series of the short-time Fourier transform coefficients.
5. The degradation portion estimation device according to claim 1,
the neural network includes an input layer that receives the inputs of the plurality of time-series arrays, and a convolution layer that performs a convolution operation on the input layer.
6. The degradation portion estimation device according to claim 5,
the input layer has a three-dimensional array structure in which two-dimensional array surfaces each composed of a plurality of cells are stacked, and the time-series array corresponding to each of the two-dimensional array surfaces is input to the input layer,
the convolution layer performs the convolution operation on a three-dimensional region within the input layer.
7. The degradation portion estimation device according to claim 1,
the microphones are arranged at positions corresponding to respective vertices of a regular polygon.
8. The degradation portion estimation device according to claim 1,
the inspection target apparatus includes a moving body, and the microphone is provided on the moving body.
9. The degradation portion estimation device according to claim 8,
the inspection target device is constituted by an elevator, the moving body is constituted by a car of the elevator, and the microphone is provided at a ceiling portion or a bottom portion of the car.
10. The degradation portion estimation device according to claim 1,
the determination unit determines whether or not a deteriorated portion exists, and outputs a determination result indicating the deteriorated portion and the degree of deterioration corresponding to the deteriorated portion when the determination unit determines that the deteriorated portion exists.
11. A degraded portion estimating system, wherein the degraded portion estimating system has:
a plurality of microphones provided to the inspection target device; and
a deteriorated portion estimation device, the deteriorated portion estimation device including: a short-time Fourier transform unit that acquires a microphone signal corresponding to each of the microphones and calculates a time series of short-time Fourier transform coefficients corresponding to each of the microphone signals; a time-series array generating unit that generates a plurality of time-series arrays for input to a neural network using the time series of the short-time Fourier transform coefficients; a neural network unit configured to receive inputs of the plurality of time-series arrays and output a degree of deterioration corresponding to a portion to be inspected in the apparatus to be inspected; and a determination unit configured to determine a deteriorated portion in the apparatus to be inspected using the degree of deterioration.
12. A degraded portion estimating method, wherein the degraded portion estimating method has the steps of:
a short-time Fourier transform unit acquiring microphone signals corresponding to a plurality of microphones provided in an inspection target device and calculating a time series of short-time Fourier transform coefficients corresponding to each of the microphone signals;
a time-series array generating unit generating a plurality of time-series arrays for input to a neural network using the time series of the short-time Fourier transform coefficients;
a neural network unit, configured by the neural network, receiving an input of the plurality of time-series arrays and outputting a degradation degree corresponding to an inspection target portion of the inspection target device; and
a determination unit determining a degraded portion in the inspection target apparatus using the degradation degree.
CN201680090835.8A 2016-11-22 2016-11-22 Degraded portion estimation device, degraded portion estimation system, and degraded portion estimation method Active CN109983311B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2016/084603 WO2018096582A1 (en) 2016-11-22 2016-11-22 Degraded portion estimation apparatus, degraded portion estimation system, and degraded portion estimation method

Publications (2)

Publication Number Publication Date
CN109983311A CN109983311A (en) 2019-07-05
CN109983311B true CN109983311B (en) 2021-03-19

Family

ID=59505258

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201680090835.8A Active CN109983311B (en) 2016-11-22 2016-11-22 Degraded portion estimation device, degraded portion estimation system, and degraded portion estimation method

Country Status (3)

Country Link
JP (1) JP6173649B1 (en)
CN (1) CN109983311B (en)
WO (1) WO2018096582A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023062442A1 (en) * 2021-10-14 2023-04-20 Wittur Holding Gmbh A computer-implemented method for training a machine learning model to detect installation errors in an elevator, in particular an elevator door, a computer-implemented method for classifying installation errors and a system thereof
CN114408694B (en) * 2022-03-04 2023-06-23 深圳市爱丰达盛科技有限公司 Elevator fault prediction system and prediction method thereof

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN202735105U (en) * 2012-07-03 2013-02-13 上海电机学院 Fault detecting device
CN103852338A (en) * 2014-03-12 2014-06-11 上海夏普电器有限公司 Abnormal noise fault detection method for air purifier

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1931169A4 (en) * 2005-09-02 2009-12-16 Japan Adv Inst Science & Tech Post filter for microphone array
JP5161645B2 (en) * 2008-04-30 2013-03-13 株式会社東芝 Time series data monitoring system
EP2439527A1 (en) * 2010-10-07 2012-04-11 Nederlandse Organisatie voor toegepast -natuurwetenschappelijk onderzoek TNO System and method for performing ultrasonic pipeline wall property measurements
JP6103899B2 (en) * 2012-11-28 2017-03-29 三菱電機株式会社 Failure location estimation device
CN103969046B (en) * 2014-05-20 2016-05-04 北京康拓红外技术股份有限公司 A kind of bearing acoustics diagnose system and method for and the coupling of wheel set bearing running-in machine
CN105451151B (en) * 2014-08-29 2018-09-21 华为技术有限公司 A kind of method and device of processing voice signal
CN108291837B (en) * 2015-12-09 2020-02-14 三菱电机株式会社 Degraded portion estimation device, degraded portion estimation method, and mobile body diagnosis system

Also Published As

Publication number Publication date
WO2018096582A1 (en) 2018-05-31
JP6173649B1 (en) 2017-08-02
CN109983311A (en) 2019-07-05
JPWO2018096582A1 (en) 2018-11-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant