WO2020054410A1 - Neural Network Device (ニューラルネットワーク装置) - Google Patents
Neural Network Device
- Publication number
- WO2020054410A1 (application PCT/JP2019/033625)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- neural network
- processing unit
- input
- layer processing
- layer
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24133—Distances to prototypes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/165—Management of the audio stream, e.g. setting of volume, audio stream path
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Definitions
- the present technology relates to a neural network device, and more particularly to a neural network device capable of improving recognition performance.
- Consider automatic recognition technologies (identification, detection, and the like) for various signals such as images and audio.
- a neural network is considered as a method for recognition (for example, see Non-Patent Document 1).
- a neural network processing device that receives a signal as input and outputs the result of recognition processing on that signal includes, for example, a convolution layer processing unit, an activation layer processing unit, a pooling layer processing unit, a convolution layer processing unit, an activation layer processing unit, a pooling layer processing unit, a convolution layer processing unit, and an activation layer processing unit, provided in this order from the input side to the output side.
- such a neural network processing device receives the data of a certain signal as input, performs data conversion through the eight components from the first convolutional layer processing unit to the last activation layer processing unit, and outputs a recognition result for the input data.
- the magnitude of the signal input to the neural network may be biased.
- a very loud signal, such as environmental sound inside a train or an aircraft, may be input to this neural network, while the office environmental sounds to be identified are mostly quiet.
- when the microphone is tapped, a signal far larger than other environmental sounds is input to the microphone; when the microphone is covered, a signal far smaller than other environmental sounds is input. Even when these are to be detected individually or simultaneously, the neural network must be constructed and trained to cope with this bias in the magnitude of the input signal.
- the present technology has been made in view of such a situation, and aims to improve recognition performance.
- the neural network device includes a nonlinear conversion layer processing unit that performs conversion using a nonlinear function having learnable parameters.
- conversion is performed by a non-linear conversion layer processing unit using a non-linear function having learnable parameters.
- FIG. 14 is a diagram illustrating a configuration example of a computer.
- the present technology aims to improve recognition performance by constructing a neural network that has, as a component, a non-linear transformation based on a learnable parameter. High performance can thus be obtained even when the size of the neural network is limited.
- the non-linear conversion is performed by, for example, one or more of a logarithmic function, a power function, an exponential function, a trigonometric function, a hyperbolic function, and other linear or non-linear functions, or by a function obtained from them by the four arithmetic operations, composition, or the like.
- the neural network of the present technology has components that correspond to the deviation of the magnitude of the input signal.
- This neural network has, as a component, a non-linear transformation using a parameter that can be learned.
- such a non-linear transform component applies an optimal scale transform to the biased magnitudes of the input signal, so that the neural network can analyze in more detail the range in which the input signal magnitudes are concentrated.
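As a concrete sketch of such a learnable scale transform, the function below uses a logarithmic form y = log(1 + p·x)/p. This is an assumed parameterization (the patent's exact relational expression appears only in its figures, which are not reproduced here), chosen to match the described behavior: zero output for negative inputs, and log-like compression whose strength is set by a learnable coefficient p.

```python
import numpy as np

def log_layer(x, p):
    """Learnable logarithmic scale transform (assumed form, not the
    patent's exact expression): y = log(1 + p*x)/p for x > 0, else 0.
    Larger p compresses large inputs more strongly, expanding the
    resolution available to small input magnitudes."""
    x = np.asarray(x, dtype=float)
    return np.where(x > 0.0, np.log1p(p * np.maximum(x, 0.0)) / p, 0.0)

# Small positive inputs change the output far faster than large ones,
# letting later layers analyze the concentrated small-signal range.
y = log_layer([0.01, 1.0, 100.0], p=4.25)
```

With p = 4.25 (a value the description later reports for one learned operation), an input of 100 maps to roughly log(426)/4.25 ≈ 1.42, while inputs below 1 retain most of their relative differences.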
- the features of a neural network to which the present technology is applied will be described using, as an example, a network that has as a component a "logarithmic layer" performing non-linear transformation using a logarithmic function.
- FIG. 1 is a diagram illustrating a configuration example of an embodiment of a neural network processing device to which the present technology is applied.
- the neural network processing device 11 shown in FIG. 1 is configured by a neural network, and includes a convolutional layer processing unit 21, an activation layer processing unit 22, a pooling layer processing unit 23, a logarithmic layer processing unit 24, a convolutional layer processing unit 25, an activation layer processing unit 26, a pooling layer processing unit 27, a convolutional layer processing unit 28, and an activation layer processing unit 29.
- that is, the neural network processing device 11 is a neural network in which a logarithmic layer, the logarithmic layer processing unit 24, has been introduced into an otherwise general configuration.
- the neural network processing device 11 performs the processing of each layer of the neural network on the supplied input data, and outputs a recognition result for a predetermined recognition target for that input data.
- the convolutional layer processing unit 21 to the activation layer processing unit 29 are each layer of the neural network.
- the convolution layer processing unit 21 performs a convolution layer process on the supplied input data, and supplies the processing result to the activation layer processing unit 22.
- the activation layer processing unit 22 performs an activation layer process on the processing result supplied from the convolution layer processing unit 21 and supplies the processing result to the pooling layer processing unit 23.
- the pooling layer processing unit 23 performs pooling layer processing on the processing result supplied from the activation layer processing unit 22, and supplies the processing result to the logarithmic layer processing unit 24.
- the logarithmic layer processing unit 24 performs a non-linear conversion process using a logarithmic function on the processing result supplied from the pooling layer processing unit 23 as a logarithmic layer process, and supplies the processing result to the convolutional layer processing unit 25.
- the convolution layer processing unit 25 performs a convolution layer process on the processing result supplied from the logarithmic layer processing unit 24 and supplies the processing result to the activation layer processing unit 26.
- the activation layer processing unit 26 performs an activation layer process on the processing result supplied from the convolutional layer processing unit 25, and supplies the processing result to the pooling layer processing unit 27.
- the pooling layer processing unit 27 performs pooling layer processing on the processing result supplied from the activation layer processing unit 26, and supplies the processing result to the convolutional layer processing unit 28.
- the convolution layer processing unit 28 performs a convolution layer process on the processing result supplied from the pooling layer processing unit 27 and supplies the processing result to the activation layer processing unit 29.
- the activation layer processing unit 29 performs an activation layer process on the processing result supplied from the convolution layer processing unit 28, and outputs the processing result as a recognition result of a recognition target for input data.
- step S11 the convolution layer processing unit 21 performs a convolution layer process on the supplied input data, and supplies the processing result to the activation layer processing unit 22.
- step S12 the activation layer processing unit 22 performs an activation layer process on the processing result supplied from the convolution layer processing unit 21, and supplies the processing result to the pooling layer processing unit 23.
- step S13 the pooling layer processing unit 23 performs pooling layer processing on the processing result supplied from the activation layer processing unit 22, and supplies the processing result to the logarithmic layer processing unit 24.
- step S14 the logarithmic layer processing unit 24 performs logarithmic layer processing on the processing result supplied from the pooling layer processing unit 23, and supplies the processing result to the convolutional layer processing unit 25.
- step S15 the convolution layer processing unit 25 performs a convolution layer process on the processing result supplied from the logarithmic layer processing unit 24, and supplies the processing result to the activation layer processing unit 26.
- step S16 the activation layer processing unit 26 performs an activation layer process on the processing result supplied from the convolution layer processing unit 25, and supplies the processing result to the pooling layer processing unit 27.
- step S17 the pooling layer processing unit 27 performs pooling layer processing on the processing result supplied from the activation layer processing unit 26, and supplies the processing result to the convolutional layer processing unit 28.
- step S18 the convolution layer processing unit 28 performs a convolution layer process on the processing result supplied from the pooling layer processing unit 27, and supplies the processing result to the activation layer processing unit 29.
- step S19 the activation layer processing unit 29 performs an activation layer process on the processing result supplied from the convolution layer processing unit 28 and outputs the processing result as the recognition result of the recognition target for the input data, and the recognition processing ends.
- as described above, the neural network processing device 11 converts the input data in each layer of the neural network and outputs the processing result as the recognition result of the recognition target. At this time, by performing nonlinear conversion processing in at least one layer, high recognition performance can be obtained even with a small-scale neural network. That is, the recognition performance can be improved.
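The recognition flow of steps S11 to S19 can be sketched end to end as follows. This is a minimal illustration, not the patent's implementation: the 1-D valid-mode convolution, ReLU activation, non-overlapping max pooling, and the logarithmic-layer expression are all assumptions standing in for unspecified layer internals.

```python
import numpy as np

def conv1d(x, w):                       # convolution layer (valid mode)
    return np.convolve(x, w, mode="valid")

def relu(x):                            # activation layer
    return np.maximum(x, 0.0)

def maxpool(x, k=2):                    # pooling layer (non-overlapping)
    n = (len(x) // k) * k
    return x[:n].reshape(-1, k).max(axis=1)

def log_layer(x, p):                    # logarithmic layer (assumed form)
    return np.where(x > 0, np.log1p(p * x) / p, 0.0)

def forward(x, w21, w25, w28, p24):
    """Mirror units 21-29 of FIG. 1: conv -> act -> pool -> log ->
    conv -> act -> pool -> conv -> act. Weight names follow the unit
    numbers; their values here are illustrative only."""
    h = maxpool(relu(conv1d(x, w21)))   # steps S11-S13
    h = log_layer(h, p24)               # step  S14
    h = maxpool(relu(conv1d(h, w25)))   # steps S15-S17
    return relu(conv1d(h, w28))         # steps S18-S19
```

For a length-32 input with length-3 kernels, the shapes shrink as 32 → 30 → 15 → 13 → 6 → 4 through the two conv/pool stages and the final convolution.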
- a neural network learning device that generates the neural network processing device 11 by learning is configured, for example, as shown in FIG. 3. Note that in FIG. 3, portions corresponding to those in FIG. 1 are denoted by the same reference numerals.
- the neural network learning device 51 generates (constructs) the neural network processing device 11 by learning based on the data of the signal input from the database 52.
- the neural network learning device 51 includes an input data selection unit 61 and a coefficient update unit 62.
- the input data selection unit 61 selects the data used for learning from the signal data recorded in the database 52 and supplies the selected data to the coefficient update unit 62 and the neural network processing device 11.
- the coefficient update unit 62 updates the coefficients used in the processing of the neural network, that is, the processing in the layers of the neural network processing device 11, in accordance with the data supplied from the input data selection unit 61 and the recognition result supplied from the neural network processing device 11, and supplies the updated coefficients to the neural network processing device 11.
- a learning system for learning the neural network processing device 11 is configured by the neural network processing device 11, the neural network learning device 51, and the database 52.
- step S41 the input data selection unit 61 selects input data to be used for learning from the signal data recorded in the database 52, and supplies the selected input data to the coefficient update unit 62 and to the convolutional layer processing unit 21 of the neural network processing device 11.
- thereafter, the processes of steps S42 to S50 are performed. These processes are the same as the processes of steps S11 to S19 described above, and therefore their description is omitted.
- that is, conversion processing (data conversion) on the data is performed by the nine components (layers) from the convolutional layer processing unit 21 on the leftmost (input) side of the neural network processing device 11 to the activation layer processing unit 29.
- data obtained by the processing in the activation layer processing unit 29 is supplied to the coefficient updating unit 62 as the recognition result of the recognition target for the input data.
- here, the convolutional layers and the logarithmic layer, that is, the convolutional layer processing units 21, 25, and 28 and the logarithmic layer processing unit 24, perform convolutional layer processing and logarithmic layer processing, that is, data conversion processing, using the coefficients supplied from the coefficient updating unit 62.
- step S51 the coefficient updating unit 62 updates the coefficient based on the input data supplied from the input data selecting unit 61 and the recognition result supplied from the activation layer processing unit 29 of the neural network processing device 11.
- step S51 the coefficients of the neural network are updated by the coefficient updating unit 62 so that the input data and the recognition result have a desired relationship, that is, a desired input / output relationship is realized.
- the coefficients used in the three convolutional layers, that is, in the convolutional layer processing of the convolutional layer processing units 21, 25, and 28, and the coefficients (parameters) used in the logarithmic layer processing of the logarithmic layer processing unit 24, are updated.
- the update of the coefficient can be performed by, for example, an error back propagation method.
- the coefficient updating unit 62 supplies the updated coefficient to each unit of the neural network processing device 11.
- the convolutional layer processing unit 21, the logarithmic layer processing unit 24, the convolutional layer processing unit 25, and the convolutional layer processing unit 28 replace their held coefficients with the coefficients newly supplied from the coefficient update unit 62, thereby updating them.
- step S52 the coefficient updating unit 62 determines whether or not a condition for terminating the learning is satisfied.
- the condition for terminating the learning may be any condition, such as an error between a desired input / output relationship and an actual input / output relationship being equal to or less than a threshold.
- step S52 If it is determined in step S52 that the condition for terminating the learning is not satisfied, the process returns to step S41, and the above-described process is repeatedly performed.
- step S52 if it is determined in step S52 that the condition for terminating the learning is satisfied, the learning process ends.
- in this way, the final neural network processing device 11 is obtained by learning, and it performs recognition processing on input data using the coefficients finally supplied from the coefficient updating unit 62 and held.
- the learning system learns the neural network processing device 11 by updating the coefficients used in the neural network processing device 11.
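As an illustration of what the coefficient update amounts to for the logarithmic-layer parameter alone, the toy loop below fits p to a desired input/output relationship by gradient descent. The patent specifies the error backpropagation method; here a central-difference numerical gradient, a squared-error loss, and the log-layer form y = log(1 + p·x)/p are all illustrative assumptions.

```python
import numpy as np

def log_layer(x, p):                    # assumed log-layer form
    return np.where(x > 0, np.log1p(p * x) / p, 0.0)

def loss(p, x, target):                 # squared error vs desired output
    return float(np.mean((log_layer(x, p) - target) ** 2))

def update_p(p, x, target, lr=0.5, eps=1e-5):
    """One coefficient-update step (step S51 analogue) for p only.
    A central-difference gradient stands in for analytic backprop."""
    g = (loss(p + eps, x, target) - loss(p - eps, x, target)) / (2 * eps)
    return p - lr * g

# Recover a "true" coefficient from desired input/output pairs:
x = np.linspace(0.1, 2.0, 20)
target = log_layer(x, 3.0)              # desired input/output relationship
p = 1.0
for _ in range(200):                    # repeat until the end condition (S52)
    p = update_p(p, x, target)
```

After the loop, p has moved from its initial value toward the coefficient that generated the targets, and the loss has decreased, which is the relationship-matching behavior step S51 describes.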
- FIG. 5 shows the input-output relational expression of the logarithmic layer (logarithmic layer processing unit 24) and its graph.
- the horizontal axis indicates the input x of the logarithmic layer
- the vertical axis indicates the output y of the logarithmic layer.
- when the input x is negative, that is, when x < 0, the logarithmic layer (the logarithmic layer processing unit 24) outputs 0 as the output y.
- when the input x is positive, the logarithmic layer outputs as the output y the value of a function whose rate of change of the output y with respect to the input x becomes smaller as the input x increases.
- the coefficient p is updated (learned) by the coefficient updating unit 62.
- the rate of change of the output y is very large, especially for small positive inputs x. Further, the logarithmic layer has a coefficient (parameter) p, and by changing the coefficient p, the relationship between the input x and the output y can be changed as shown in FIG. 5.
- since the rate of change of the output y with respect to small positive inputs x is large, the neural network having the logarithmic layer as a component, that is, the neural network processing device 11, can analyze small-magnitude signals (input data) in more detail.
- this neural network is particularly effective when the magnitude of the input signal (input data) is biased, such as when identifying the above-described office environmental sounds or the sound of a covered microphone.
- the portion indicated by arrow Q11 indicates a user interface, that is, an operation “direct tap” in which the user directly taps the microphone portion with a finger as an operation by the user.
- the portion indicated by arrow Q12 indicates, as a user interface, an operation “rub” by which the user rubs the microphone portion with a finger.
- the portion indicated by arrow Q13 indicates, as a user interface, an operation "close" in which the user covers the microphone portion with a finger. Further, the portion indicated by arrow Q14 indicates, as a user interface, an operation "blocking tap" in which the user taps the microphone portion while covering it with a finger.
- acoustic data obtained by collecting sound with a microphone is used as input data, and recognition processing is performed by a neural network to recognize each operation, that is, the sound generated when each operation is performed.
- FIG. 7 shows the result of the recognition processing.
- FIG. 7 shows the detection success rate when each operation is detected by a general neural network (DNN) and by the neural network processing device 11 into which the logarithmic layer has been introduced. The vertical axis in FIG. 7 indicates the detection success rate when each operation is detected (recognized).
- the detection success rate of the operation “direct tap” is shown in the part indicated by the arrow Q21
- the detection success rate of the operation “rub” is shown in the part indicated by the arrow Q22.
- the portion indicated by arrow Q23 indicates the detection success rate of the operation "blocking”
- the portion indicated by arrow Q24 indicates the detection success rate of the operation "blocking tap”.
- for each operation, the left side in the figure shows the detection success rate when a general neural network is used, and the right side shows the detection success rate when the neural network processing device 11 is used.
- each value is the success rate of detecting the detection target sound, that is, the recognition-target operation, when the threshold value is set so that the over-detection rate is 0.01%.
- the value of the logarithmic-layer coefficient (parameter) p learned for the operation "close" is 4.25, while the coefficients p learned for the other three operations "direct tap", "rub", and "blocking tap" are 2.34, 1.29, and 1.06, respectively.
- the effective range of the logarithmic layer is not limited to the case where the ambient sound of the office or the closing sound of the microphone is identified.
- in general, it is effective for audio signals, whose magnitudes are often converted to a logarithmic scale (such as decibel values).
- the present technology may be effective for other signals such as images.
- the present technology is similarly effective not only in a small-scale neural network but also in a large-scale neural network.
- the neural network described with reference to FIGS. 1 to 4 is an example of a neural network having a component that performs non-linear conversion using a learnable coefficient (parameter), and various other modified examples are considered.
- various examples other than the logarithmic layer can be considered as the constituent elements.
- for example, as components (layers) that perform non-linear conversion, FIGS. 8 and 9 show the input-output relational expressions and graphs of an inverse proportional layer using an inverse proportional function and a power layer using a power function. In FIGS. 8 and 9, the horizontal axis indicates the input x and the vertical axis indicates the output y.
- FIG. 8 shows the relationship between the input x and the output y in the inverse proportional layer.
- when the input x is negative, that is, when x < 0, the inverse proportional layer outputs 0 as the output y.
- FIG. 9 shows the relationship between the input x and the output y in the power layer.
- the power layer outputs 0 as the output y when the input x is negative, that is, when x < 0.
- in both the inverse proportional layer and the power layer, the coefficient (parameter) is denoted as p.
- the coefficient p is updated (learned) by the coefficient updating unit 62.
- the non-linear transformation may be performed using not only logarithmic functions and power functions (including inverse proportional functions), but also one or more of exponential functions, trigonometric functions, hyperbolic functions, and other linear or non-linear functions, as well as functions obtained from these by the four arithmetic operations, composition, and the like.
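Since the relational expressions of FIGS. 8 and 9 are not reproduced in this text, the two sketches below use assumed expressions chosen only to match the described behavior (zero output for negative input, shape controlled by a coefficient p):

```python
import numpy as np

def power_layer(x, p):
    """Power-function layer (assumed form): y = x**p for x >= 0, else 0."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0.0, np.maximum(x, 0.0) ** p, 0.0)

def inverse_layer(x, p):
    """Inverse-proportional layer (assumed form): y = x / (x + p) for
    x >= 0, else 0; the output saturates toward 1 as x grows."""
    x = np.asarray(x, dtype=float)
    xp = np.maximum(x, 0.0)
    return np.where(x >= 0.0, xp / (xp + p), 0.0)
```

Both functions keep the coefficient p as the single learnable parameter, so they slot into the same coefficient-update scheme as the logarithmic layer.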
- this component that is, the component (layer) that performs the nonlinear conversion, can be introduced at an arbitrary position in the neural network in an arbitrary form.
- the present component may be introduced at a plurality of locations in the neural network.
- coefficients (parameters) of this component may be applied commonly to all dimensions of the input x, or may be different for each dimension.
- for example, when the logarithmic layer (logarithmic layer processing unit 24) has 16 input channels, a different parameter may be applied to each channel.
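Applying a distinct coefficient to each input channel reduces to array broadcasting; the sketch below uses an assumed logarithmic form y = log(1 + p·x)/p with a per-channel coefficient vector of shape (channels, 1).

```python
import numpy as np

def log_layer(x, p):
    """Assumed log-layer form; p broadcasts against x, so p may be a
    scalar (shared across all dimensions) or shaped per channel."""
    x = np.asarray(x, dtype=float)
    p = np.asarray(p, dtype=float)
    return np.where(x > 0, np.log1p(p * np.maximum(x, 0.0)) / p, 0.0)

rng = np.random.default_rng(0)
x = np.abs(rng.normal(size=(16, 8)))        # 16 channels, 8 time steps
p_per_channel = np.linspace(0.5, 4.0, 16).reshape(16, 1)
y = log_layer(x, p_per_channel)             # one learned p per channel
```

Each row of `y` is transformed with its own coefficient, while passing a scalar p instead would apply one coefficient to all dimensions, matching the two options the description allows.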
- the parameters (coefficients) of this component may not be included in the learning target, and may be fixed values.
- the fixed value may be determined by a human in some way. For example, a fixed value may be determined based on a certain rule determined by a human based on a statistical value of the distribution of the magnitude of the input signal.
- the initial value for learning the parameters (coefficients) of this component may be determined based on the value thus determined by the human.
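One hypothetical instance of such a human-determined rule is to derive the fixed value, or the learning initial value, from a statistic of the input-magnitude distribution. The reciprocal-of-median rule below is purely an assumed example, not one given in the patent:

```python
import numpy as np

def init_p_from_stats(samples):
    """Derive a coefficient for the nonlinear layer from the
    input-magnitude distribution (hypothetical rule: reciprocal of
    the median absolute magnitude, so that typical inputs land near
    the knee of the transform's curve)."""
    return 1.0 / float(np.median(np.abs(samples)))
```

The returned value could be used directly as a fixed (non-learned) coefficient, or handed to the coefficient updating unit as the starting point for learning.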
- the parameters of this component and the coefficients of the other components of the neural network may be learned at the same time, or one may be fixed while the other is learned.
- the recognition performance of the neural network can be improved. Moreover, according to the present technology, high recognition performance can be obtained even for a small-scale neural network.
- Example of computer configuration: the series of processing described above can be executed by hardware or by software.
- a program constituting the software is installed in a computer.
- the computer includes a computer incorporated in dedicated hardware, a general-purpose personal computer that can execute various functions by installing various programs, and the like.
- FIG. 10 is a block diagram illustrating a configuration example of hardware of a computer that executes the series of processes described above by a program.
- in the computer, a CPU (Central Processing Unit) 501, a ROM (Read Only Memory) 502, and a RAM (Random Access Memory) 503 are mutually connected by a bus 504.
- the input / output interface 505 is further connected to the bus 504.
- An input unit 506, an output unit 507, a recording unit 508, a communication unit 509, and a drive 510 are connected to the input / output interface 505.
- the input unit 506 includes a keyboard, a mouse, a microphone, an image sensor, and the like.
- the output unit 507 includes a display, a speaker, and the like.
- the recording unit 508 includes a hard disk, a nonvolatile memory, and the like.
- the communication unit 509 includes a network interface and the like.
- the drive 510 drives a removable recording medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory.
- in the computer configured as described above, the CPU 501 loads the program recorded in the recording unit 508 into the RAM 503 via the input/output interface 505 and the bus 504 and executes it, whereby the above-described series of processing is performed.
- the program executed by the computer (CPU 501) can be provided by being recorded on a removable recording medium 511 as a package medium or the like, for example. Further, the program can be provided via a wired or wireless transmission medium such as a local area network, the Internet, or digital satellite broadcasting.
- the program can be installed in the recording unit 508 via the input / output interface 505 by attaching the removable recording medium 511 to the drive 510.
- the program can be received by the communication unit 509 via a wired or wireless transmission medium and installed in the recording unit 508.
- the program can be installed in the ROM 502 or the recording unit 508 in advance.
- the program executed by the computer may be a program whose processing is performed chronologically in the order described in this specification, or a program whose processing is performed in parallel or at a necessary timing, such as when a call is made.
- the present technology can take a configuration of cloud computing in which one function is shared by a plurality of devices via a network and processed jointly.
- each step described in the above-described flowchart can be executed by a single device, or can be shared and executed by a plurality of devices.
- When one step includes a plurality of processes, the plurality of processes included in that one step can be executed by one device or shared and executed by a plurality of devices.
- The present technology may have the following configurations.
- (1) A neural network device including a non-linear conversion layer processing unit that performs conversion using a non-linear function having learnable parameters.
- (2) The neural network device according to (1), wherein the conversion by the non-linear function of the non-linear conversion layer processing unit is conversion using a logarithmic function.
- (3) The neural network device according to (1), wherein the conversion by the non-linear function of the non-linear conversion layer processing unit is conversion using a combination of a plurality of non-linear functions.
- (4) The neural network device according to any one of (1) to (3), further including an input unit to which an input signal is input, wherein the input signal having a bias in signal magnitude is input to the input unit.
- (5) The neural network device according to any one of (1) to (4), further including a pooling layer processing unit, wherein the processing of the non-linear conversion layer processing unit is performed after the processing of the pooling layer processing unit.
- (6) The neural network device according to any one of (1) to (5), further including a convolutional layer processing unit, wherein the processing of the non-linear conversion layer processing unit is performed before the processing of the convolutional layer processing unit.
- (7) The neural network device according to any one of (1) to (6), wherein the conversion by the non-linear function of the non-linear conversion layer processing unit is performed as an activation function.
- 11 neural network processing unit, 21 convolutional layer processing unit, 24 logarithmic layer processing unit, 25 convolutional layer processing unit, 28 convolutional layer processing unit, 51 neural network learning device, 61 input data selection unit, 62 coefficient updating unit
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Signal Processing (AREA)
- Acoustics & Sound (AREA)
- Image Analysis (AREA)
Abstract
Description
〈Configuration Example of the Neural Network Processing Device〉
The present technology makes it possible to improve recognition performance by constructing a neural network that includes, as a component, a non-linear conversion with learnable parameters. That is, it makes it possible to obtain high performance even when the scale of the neural network is limited.
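As one concrete illustration of such a non-linear conversion with a learnable parameter, a logarithmic transform can be normalized so that its shape is controlled by a single coefficient. The sketch below uses an assumed parameterization for illustration only; the name `log_layer` and the form log(1 + a·x)/log(1 + a) are not taken from the present application:

```python
import numpy as np

def log_layer(x, a):
    # Learnable logarithmic non-linearity (illustrative parameterization):
    # y = log(1 + a*x) / log(1 + a), normalized so that x = 1 maps to y = 1.
    # The coefficient 'a' is the learnable parameter; larger 'a' lifts
    # small-magnitude inputs more strongly.
    return np.log1p(a * x) / np.log1p(a)

x = np.array([0.01, 0.1, 1.0])
y = log_layer(x, a=100.0)  # small values are expanded toward larger ones
```

Such a transform compresses the dynamic range of an input whose signal magnitudes are strongly biased toward small values, which is the situation configuration (4) below addresses.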
Next, the operation of the neural network processing device 11 shown in FIG. 1 will be described.
A neural network learning device that generates the neural network processing device 11 by learning is configured, for example, as shown in FIG. 3. In FIG. 3, parts corresponding to those in FIG. 1 are denoted by the same reference numerals, and their description is omitted as appropriate.
Next, the learning process performed by the learning system shown in FIG. 3 will be described with reference to the flowchart of FIG. 4.
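Conceptually, the coefficient updating unit 62 adjusts the learnable parameter of the non-linear layer from errors on training data, in the same way as ordinary network weights. A minimal sketch under stated assumptions: a single parameter `a`, a squared-error loss, and a numerical gradient, none of which are choices specified by the application:

```python
import numpy as np

def log_layer(x, a):
    # illustrative learnable logarithmic non-linearity
    return np.log1p(a * x) / np.log1p(a)

def loss(a, x, target):
    # squared error between the layer output and a target response
    return float(np.mean((log_layer(x, a) - target) ** 2))

# Toy data: inputs biased toward small magnitudes, targets less so.
x = np.array([0.01, 0.05, 0.2, 1.0])
target = np.array([0.3, 0.5, 0.8, 1.0])

a, lr, eps = 10.0, 50.0, 1e-4
for _ in range(200):
    # numerical gradient of the loss with respect to the parameter 'a'
    g = (loss(a + eps, x, target) - loss(a - eps, x, target)) / (2 * eps)
    a -= lr * g
```

In an actual network the parameter would be updated by backpropagation together with the convolution coefficients; the finite-difference step here only illustrates that the non-linearity itself is trainable.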
Here, the improvement in recognition performance obtained by introducing a logarithmic layer into the neural network will be described.
The series of processes described above can be executed by hardware or by software. When the series of processes is executed by software, a program constituting the software is installed in a computer. Here, the computer includes a computer incorporated in dedicated hardware and, for example, a general-purpose personal computer capable of executing various functions by installing various programs.
(1)
A neural network device including a non-linear conversion layer processing unit that performs conversion using a non-linear function having learnable parameters.
(2)
The neural network device according to (1), wherein the conversion by the non-linear function of the non-linear conversion layer processing unit is conversion using a logarithmic function.
(3)
The neural network device according to (1), wherein the conversion by the non-linear function of the non-linear conversion layer processing unit is conversion using a combination of a plurality of non-linear functions.
(4)
The neural network device according to any one of (1) to (3), further including an input unit to which an input signal is input, wherein the input signal having a bias in signal magnitude is input to the input unit.
(5)
The neural network device according to any one of (1) to (4), further including a pooling layer processing unit, wherein the processing of the non-linear conversion layer processing unit is performed after the processing of the pooling layer processing unit.
(6)
The neural network device according to any one of (1) to (5), further including a convolutional layer processing unit, wherein the processing of the non-linear conversion layer processing unit is performed before the processing of the convolutional layer processing unit.
(7)
The neural network device according to any one of (1) to (6), wherein the conversion by the non-linear function of the non-linear conversion layer processing unit is performed as an activation function.
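Configurations (5) and (6) place the non-linear conversion layer after pooling and before convolution. A minimal one-dimensional sketch of that layer ordering, with `avg_pool`, `log_activation`, and `conv1d` as illustrative stand-ins for the pooling layer processing unit, the non-linear conversion layer processing unit, and the convolutional layer processing unit (none of these helper names appear in the application):

```python
import numpy as np

def avg_pool(x, k=2):
    # 1-D average pooling with window and stride k
    n = len(x) // k
    return x[: n * k].reshape(n, k).mean(axis=1)

def log_activation(x, a=100.0):
    # illustrative learnable logarithmic non-linearity with parameter a
    return np.log1p(a * x) / np.log1p(a)

def conv1d(x, w):
    # 'valid' 1-D convolution with kernel w
    return np.convolve(x, w, mode="valid")

# Ordering per configurations (5) and (6):
# pooling -> non-linear conversion -> convolution
x = np.linspace(0.0, 1.0, 8)
y = conv1d(log_activation(avg_pool(x)), np.array([0.5, 0.5]))
```

Applying the non-linearity between pooling and convolution means the convolution operates on range-compressed values, which is one way configuration (7)'s use of the conversion as an activation function can be realized.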
Claims (7)
- A neural network device comprising a non-linear conversion layer processing unit that performs conversion using a non-linear function having learnable parameters.
- The neural network device according to claim 1, wherein the conversion by the non-linear function of the non-linear conversion layer processing unit is conversion using a logarithmic function.
- The neural network device according to claim 1, wherein the conversion by the non-linear function of the non-linear conversion layer processing unit is conversion using a combination of a plurality of non-linear functions.
- The neural network device according to claim 1, further comprising an input unit to which an input signal is input, wherein the input signal having a bias in signal magnitude is input to the input unit.
- The neural network device according to claim 1, further comprising a pooling layer processing unit, wherein the processing of the non-linear conversion layer processing unit is performed after the processing of the pooling layer processing unit.
- The neural network device according to claim 1, further comprising a convolutional layer processing unit, wherein the processing of the non-linear conversion layer processing unit is performed before the processing of the convolutional layer processing unit.
- The neural network device according to claim 1, wherein the conversion by the non-linear function of the non-linear conversion layer processing unit is performed as an activation function.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/250,777 US20210312231A1 (en) | 2018-09-11 | 2019-08-28 | Neural network device |
BR112021004116-8A BR112021004116A2 (pt) | 2018-09-11 | 2019-08-28 | dispositivo de rede neural |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2018-169718 | 2018-09-11 | ||
JP2018169718A JP2022001968A (ja) | 2018-09-11 | 2018-09-11 | ニューラルネットワーク装置 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020054410A1 true WO2020054410A1 (ja) | 2020-03-19 |
Family
ID=69777577
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2019/033625 WO2020054410A1 (ja) | 2018-09-11 | 2019-08-28 | ニューラルネットワーク装置 |
Country Status (4)
Country | Link |
---|---|
US (1) | US20210312231A1 (ja) |
JP (1) | JP2022001968A (ja) |
BR (1) | BR112021004116A2 (ja) |
WO (1) | WO2020054410A1 (ja) |
- 2018
- 2018-09-11 JP JP2018169718A patent/JP2022001968A/ja active Pending
- 2019
- 2019-08-28 WO PCT/JP2019/033625 patent/WO2020054410A1/ja active Application Filing
- 2019-08-28 US US17/250,777 patent/US20210312231A1/en active Pending
- 2019-08-28 BR BR112021004116-8A patent/BR112021004116A2/pt not_active Application Discontinuation
Non-Patent Citations (3)
Title |
---|
GOODFELLOW, IAN J. ET AL.: "Maxout Networks", ARXIV, vol. 4, 20 September 2013 (2013-09-20), pages 1 - 9, XP055252282, Retrieved from the Internet <URL:https://arxiv.org/pdf/1302.4389.pdf> [retrieved on 20191113] * |
HE, KAIMING ET AL.: "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification", ARXIV, 6 February 2015 (2015-02-06), pages 1 - 11, XP055694220, Retrieved from the Internet <URL:https://arxiv.org/pdf/1502.01852.pdf> [retrieved on 20191113] * |
MIYASHITA, DAISUKE ET AL.: "Convolutional Neural Networks using Logarithmic Data Representation", ARXIV, 17 March 2016 (2016-03-17), pages 1 - 10, XP080686928, Retrieved from the Internet <URL:https://arxiv.org/pdf/1603.01025.pdf> [retrieved on 20191113] * |
Also Published As
Publication number | Publication date |
---|---|
JP2022001968A (ja) | 2022-01-06 |
US20210312231A1 (en) | 2021-10-07 |
BR112021004116A2 (pt) | 2021-05-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20240038218A1 (en) | Speech model personalization via ambient context harvesting | |
CN111292764B (zh) | 辨识系统及辨识方法 | |
CN110600017A (zh) | 语音处理模型的训练方法、语音识别方法、系统及装置 | |
US11514925B2 (en) | Using a predictive model to automatically enhance audio having various audio quality issues | |
JP6987075B2 (ja) | オーディオ源分離 | |
CN103258533B (zh) | 远距离语音识别中的模型域补偿新方法 | |
WO2022141868A1 (zh) | 一种提取语音特征的方法、装置、终端及存储介质 | |
WO2020140374A1 (zh) | 语音数据处理方法、装置、设备及存储介质 | |
KR102401959B1 (ko) | 다채널 음향 신호를 이용한 심화 신경망 기반의 잔향 제거, 빔포밍 및 음향 인지 모델의 결합 학습 방법 및 장치 | |
CN111128222B (zh) | 语音分离方法、语音分离模型训练方法和计算机可读介质 | |
US10262680B2 (en) | Variable sound decomposition masks | |
WO2016119388A1 (zh) | 一种基于语音信号构造聚焦协方差矩阵的方法及装置 | |
KR20200029351A (ko) | 샘플 처리 방법, 장치, 기기 및 저장 매체 | |
EP4371112A1 (en) | Speech enhancement | |
CN108847251B (zh) | 一种语音去重方法、装置、服务器及存储介质 | |
CN112488306A (zh) | 一种神经网络压缩方法、装置、电子设备和存储介质 | |
WO2020054410A1 (ja) | ニューラルネットワーク装置 | |
US11322169B2 (en) | Target sound enhancement device, noise estimation parameter learning device, target sound enhancement method, noise estimation parameter learning method, and program | |
JP7024615B2 (ja) | 音響信号分離装置、学習装置、それらの方法、およびプログラム | |
CN116982111A (zh) | 音频特征补偿方法、音频识别方法及相关产品 | |
US20230343312A1 (en) | Music Enhancement Systems | |
CN111354372A (zh) | 一种基于前后端联合训练的音频场景分类方法及系统 | |
WO2024018390A1 (en) | Method and apparatus for speech enhancement | |
CN113744754B (zh) | 语音信号的增强处理方法和装置 | |
CN113470686B (zh) | 语音增强方法、装置、设备及存储介质 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19859078 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
REG | Reference to national code |
Ref country code: BR Ref legal event code: B01A Ref document number: 112021004116 Country of ref document: BR |
|
ENP | Entry into the national phase |
Ref document number: 112021004116 Country of ref document: BR Kind code of ref document: A2 Effective date: 20210304 |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 19859078 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: JP |