WO2022134391A1 - Fused neuron model, neural network structure, training and inference methods, storage medium and device - Google Patents

Fused neuron model, neural network structure, training and inference methods, storage medium and device

Info

Publication number
WO2022134391A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
neuron
weight
output
inference
Prior art date
Application number
PCT/CN2021/087524
Other languages
English (en)
French (fr)
Inventor
赵卫
臧大伟
程东
杜炳政
谢小平
张佩珩
谭光明
姚宏鹏
Original Assignee
中国科学院西安光学精密机械研究所
中国科学院计算技术研究所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院西安光学精密机械研究所 and 中国科学院计算技术研究所
Publication of WO2022134391A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models

Definitions

  • The present invention relates to artificial neurons and neural networks, and in particular to a fused neuron model, a neural network structure with its inference and training methods, a computer-readable storage medium, and computer equipment.
  • A so-called neural network is a collection of methods that model high-complexity data through multilayer nonlinear transformations. The artificial neuron, its basic building unit, contains three basic elements: (1) weights w_i, corresponding to a group of connections of biological neuron synapses, with the connection strength represented by the weight w_i on each connection, where a positive w_i indicates excitation and a negative w_i indicates inhibition; (2) a summation unit, which computes the weighted sum of multiple input signals; (3) a nonlinear activation function, which introduces nonlinearity into the neuron so that the neural network can approximate arbitrary nonlinear functions, and limits the amplitude of the neuron output to a certain range.
  • The multiplication of the synaptic connection weights with the input data and the addition in the summation unit form a linear model; therefore, a nonlinear activation function is required after the summation to map the value computed by the linear model into a nonlinear space and enhance the descriptive power of the neural network. Without a nonlinear activation function, a neural network can only perform linear transformations.
  • This artificial neuron and network model based on a linear model plus nonlinear activation can be computed easily and quickly on general-purpose digital electronic computers, but it is difficult to realize in analog computing devices such as optical computing or DNA computing. Because digital electronic computers are general purpose, both the linear multiply-accumulate operations and the nonlinear activation operations can be converted into binary Boolean logic and executed by the logic units inside the CPU; in some energy-efficient analog computing structures, however, realizing this neuron model is very difficult.
  • For example, photonic computing devices that use light as the carrier require two processes to realize neural network computation based on the traditional neuron model: (1) build a linear vector-matrix multiply-accumulate structure from the nonlinear properties of light, specifically by decomposing the weight matrix of the neural network via SVD into two unitary matrices and a diagonal matrix, constructing the unitary matrix structures with the self-similarity of light propagation, and realizing the diagonal matrix with intensity modulators; (2) implement the activation function on an electronic computer.
  • The present invention addresses the technical problems that such implementations require a large combination of analog devices, are prone to drift under environmental interference, and leave the activation functions difficult to realize with analog devices so that an electronic computer is needed, which reduces the computing speed and energy efficiency of the analog computing components; it provides a fused neuron model, a neural network structure with its inference and training methods, a computer-readable storage medium, and computer equipment.
  • To this end, the present invention provides the following technical solutions:
  • A fused neuron model, for realizing artificial neuron and network computation on analog computing devices, characterized in that:
  • the weight of each synaptic connection is a continuously differentiable nonlinear function φ(w_i, x_i);
  • i is the neuron level index, an integer greater than 1;
  • x_i is the value input by a neuron of the previous level i-1 to the synaptic connection of the current level-i neuron;
  • w_i is a training parameter, obtained through the following steps:
  • step S5: using gradient descent and the partial derivatives of the weights at each level, minimize the loss value obtained in step S3 and update the initialized training parameter w_i' according to the minimization result;
  • step S2 is specifically:
  • the one-dimensional vector is input into the transfer matrix between the input layer and the hidden layer of the neural network, the transfer matrix using continuously differentiable nonlinear functions as its element values, to obtain a result matrix;
  • step S2.4: assign the output vector obtained in step S2.3 to the one-dimensional vector formed in step S2.1;
  • The present invention also provides a neural network structure, where the neural network is a feedforward network or a feedback network, characterized in that it comprises an input layer, a hidden layer, and an output layer;
  • each of the input layer, the hidden layer, and the output layer has at least one neuron, and the neurons adopt the above fused neuron model.
  • The present invention also provides an inference method based on the above neural network structure, characterized by the following steps:
  • each element of the one-dimensional vector is input, according to its correspondence, into the transfer matrix between the input layer and the hidden layer of the neural network, the transfer matrix using continuously differentiable nonlinear functions as its element values, to obtain a matrix;
  • step S3: add the row vectors of the matrix obtained in step S2 in turn to obtain an output vector;
  • step S4: assign the output vector obtained in step S3 to the one-dimensional vector formed in step S1;
  • S5: repeat steps S2 to S4 until the output layer of the neural network is reached; inference ends and the inference output result is obtained.
  • The present invention also provides a training method based on the above neural network structure, characterized by the following steps:
  • the initialized output of each synaptic weight is obtained through inference; S2.1: a batch is selected and its format transformed to form a one-dimensional vector;
  • the one-dimensional vector is input into the transfer matrix between the input layer and the hidden layer of the neural network, the transfer matrix using continuously differentiable nonlinear functions as its element values, to obtain a matrix;
  • step S2.4: assign the output vector obtained in step S2.3 to the one-dimensional vector formed in step S2.1;
  • step S5: using gradient descent and the partial derivatives of the weights at each level, minimize the loss value obtained in step S3 and update the initialized output of each synaptic weight according to the minimization result;
  • The present invention also provides a computer-readable storage medium on which a computer program is stored, characterized in that when the program is executed by a processor, the steps of the above inference method or the steps of the above training method can be implemented.
  • The present invention also provides computer equipment comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that when the processor executes the program, the steps of the above inference method or the steps of the above training method can be implemented.
  • The fused neuron model of the present invention merges the connection weights of the traditional artificial neuron with the activation function, so an activation function is no longer needed. It can be realized not only in digital electronic computers but is even better suited to analog computing devices with nonlinear characteristics, allowing devices to be cascaded directly and avoiding the speed and energy bottlenecks introduced by converting analog signals into digital signals to perform activation operations; because activation functions are no longer required, the step of processing them on an electronic computer is avoided, which effectively improves the computing speed and energy efficiency of the analog computing components.
  • The neural network structure of the present invention uses the above fused neuron model as its basic building unit to form a hierarchical structure. It can serve application fields of traditional artificial neural networks such as image recognition, speech processing, and autonomous driving, while also improving computational efficiency.
  • In the inference method of the neural network structure of the present invention, the input data of a connection is first substituted into the connection's nonlinear weight function to compute the weighted result of that connection; all weighted results of the neuron are then summed and passed directly to the next layer of neurons, propagating forward layer by layer until the recognition result is obtained. It is no longer a vector-matrix multiply-accumulate operation.
  • The inference is more efficient and accurate, and the method can also be embedded into existing training methods.
  • The training method of the neural network structure of the present invention optimizes the parameters of the neuron model through the backpropagation and gradient descent algorithms, where the gradients of the parameters are obtained by taking partial derivatives of the weight function; the aforementioned inference process is incorporated in the training.
  • The computer-readable storage medium and computer equipment of the present invention can carry out the inference method and training method of the present invention, execute the specific steps as a program, and realize the corresponding methods in applications, which facilitates popularization and application.
  • Fig. 1 is a schematic diagram of a traditional artificial neuron model;
  • Fig. 2 is a schematic diagram of the fused neuron model of the present invention;
  • Fig. 3 is a schematic diagram of an embodiment of a neural network structure of the present invention;
  • Fig. 4 is a schematic flowchart of the neural network structure inference method of the present invention;
  • Fig. 5 is a schematic flowchart of the neural network structure training method of the present invention.
  • This application proposes a novel artificial neuron model and network structure in which the linear model and the activation function are fused.
  • The model expresses the synaptic connection weights between neurons with nonlinear functions, thereby forming a weight matrix in a nonlinear space,
  • and the nonlinear operations required by the artificial neuron are implemented directly in the weight matrix, so that the nonlinear characteristics of analog computing devices such as optical devices can be used directly to realize the functions of a neural network.
  • The present invention proposes a nonlinear-weight neuron without an activation function and a corresponding network computing model, so that the nonlinear effects of analog signals such as light and electricity can be used to realize the neural network structure directly, bringing advantages in speed and energy efficiency.
  • A traditional neural network uses a neuron structure of a linear model plus nonlinear activation.
  • A linear model means that the output of the model is a linear weighted sum of its inputs: if the output y of a model and its inputs x_i satisfy the relationship y = Σ_i w_i·x_i + b, the model is a linear model, where both w_i and b belong to the real number domain. It is called a linear model because when the model has only one input, x_i and y form a straight line in a two-dimensional coordinate system; similarly, when the model has n inputs, the vector x_i and the vector y form a plane in an (n+1)-dimensional space.
  • Nonlinear activation means mapping the computation results of the linear model into a nonlinear space, thereby enhancing the ability of the neural network to fit nonlinear functions. If the output of every neuron is passed through a nonlinear function, the whole neural network model is no longer linear.
  • The nonlinear function can be a commonly used activation function such as Sigmoid, ReLU, or Tanh, or any continuously differentiable function whose graph in the two-dimensional plane is a curve or a polyline.
  • The present invention proposes an artificial neuron in which the synaptic connection weight and the activation function are fused, together with a corresponding network structure, comprising four main points: first, the synaptic connection weight is a nonlinear function; second, the neuron and network structure have no activation function; third, the inference operation based on this model is no longer a vector-matrix multiply-accumulate operation; fourth, the inference model can be embedded into existing training methods.
  • The weight of a synaptic connection is a nonlinear function; specifically, the weight of the synaptic connection between neurons is a nonlinear function φ(w_i, x_i), and
  • when the input x_i of the synapse acts on the synaptic connection weight, the output is nonlinear, that is, the input x_i and the corresponding output y_i form a curve in a two-dimensional coordinate system, where i is a variable, the neuron level index, corresponding to each level of the neuron model.
  • It is a neuron computing model that fuses the synaptic weights and the activation function.
  • The neurons of this model have no activation function f. After the inputs of a neuron are weighted by the nonlinear weight functions and summed, the result is output directly to the connected next neuron.
  • The present invention also proposes a neural network structure based on the fused model, which takes the aforementioned fused neuron model as its basic unit and comprises an input layer, a hidden layer, and an output layer, where each of the input layer, the hidden layer, and the output layer has at least one neuron; the neurons adopt the aforementioned fused neuron model, and the network is a single-layer or multilayer neural network structure formed according to certain rules.
  • The number of neurons in each layer and the connection relationships between layers are configured according to the needs of the task.
  • FIG. 1 is a schematic diagram of the traditional artificial neuron model.
  • The weight w_i of each synaptic connection in the traditional artificial neuron model is a real number, which is multiplied with the input data x_i and then accumulated, and the accumulated result is fed into the activation function f to obtain the corresponding output.
  • Fig. 2 shows the fused neuron model of this embodiment: the weight of each synaptic connection is a continuously differentiable nonlinear function φ(w_i, x_i) with two parameters, w_i and x_i, where the parameter w_i is adjusted only during training and remains fixed and unchanged during inference.
  • The parameter x_i is the value input by the previous-level neuron to the synaptic connection; after the synaptic connections are summed, the model outputs directly to the next-level neuron, with no activation function in between.
  • Figure 3 is an embodiment of the neural network structure based on the fused neuron model of the present invention.
  • The neural network structure is composed of three layers, an input layer, a hidden layer, and an output layer, each with three neurons; a fully connected structure is adopted between the layers.
  • A neural network structure with any number of layers and any connection relationships can be constructed from the fused neuron model of the present invention, and it can be a feedforward network or a feedback network.
  • When the above neural network structure performs inference, a group of data (such as images, speech, or text) enters the neurons of the input layer; the data is first fed into the nonlinear weight matrix, and the row vectors of the result matrix are then added in turn to generate the computation result of this layer, which is used as the input of the next layer and passed forward layer by layer until the recognition result is obtained.
  • During training, the values of the parameters in the neural network are adjusted through the backpropagation algorithm and the gradient descent algorithm.
  • The gradient descent algorithm is mainly used to optimize the value of an individual parameter, while the backpropagation algorithm provides an efficient way to apply gradient descent to all parameters, so that the loss function of the neural network model on the training data becomes as small as possible; the gradients of the parameters are obtained by taking partial derivatives of the nonlinear weight functions.
  • The transfer matrix T between the input layer and the hidden layer is a matrix whose elements are the nonlinear weight functions of the synaptic connections.
  • The connection weight between the first neuron of the input layer and the first neuron of the hidden layer is φ(w_11, x_1), where w_11 denotes the weight of the connection between the first element of the previous layer and the first element of the next layer.
  • The inference operation is performed on the neural network with the following specific steps:
  • Input information such as pictures and speech can be recognized on the basis of the above inference method.
  • The neural network is trained with the following specific steps:
  • The weight of each synaptic connection is a continuously differentiable nonlinear function φ(w_i, x_i), and w_i is a training parameter obtained through the training steps of the neural network structure of the present invention:
  • step S5: using gradient descent and the partial derivatives of the weights at each level, minimize the loss value obtained in step S3 and update the initialized training parameter w_i' according to the minimization result;
  • The inference in step S2 is performed with the inference method of the neural network structure of the present invention.
  • The present invention also provides a computer-readable storage medium and computer equipment, where the computer-readable storage medium stores a computer program, and when the program is executed by a processor, the steps of the above inference method or the steps of the training method can be implemented.
  • The computer equipment comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the above inference method or the steps of the training method when executing the program. It should be noted that the inference method and training method of the present invention can be implemented not only by an electronic computer such as the computer equipment, but also by analog computing devices with nonlinear characteristics.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Neurology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to artificial neurons and neural networks, and in particular to a fused neuron model, a neural network structure with its inference and training methods, a computer-readable storage medium, and computer equipment. In the fused neuron model, the weight of each synaptic connection is an arbitrary continuously differentiable nonlinear function, so the linear-to-nonlinear mapping is realized on the synaptic weights themselves. The neural network structure uses the fused neuron model as its basic building unit to form a hierarchical structure. In the inference method, the input data of a connection is substituted into the connection's nonlinear weight function to compute the weighted result of that connection; all weighted results of the neuron are then summed and passed directly to the next level of neurons, propagating forward level by level until the recognition result is obtained. The training method optimizes the parameters of the neuron model through the backpropagation and gradient descent algorithms. The computer-readable storage medium and computer equipment can carry out the specific steps of the inference method and the training method.

Description

Fused neuron model, neural network structure, training and inference methods, storage medium and device
Technical Field
The present invention relates to artificial neurons and neural networks, and in particular to a fused neuron model, a neural network structure with its inference and training methods, a computer-readable storage medium, and computer equipment.
Background Art
Driven by the new wave of the technological revolution, intelligence has become an inevitable trend in the evolution of future society, and artificial intelligence technology plays an increasingly important role in the information age. Data processing technology centered on artificial neural networks has become the mainstream approach in today's artificial intelligence: it interprets data with a mechanism that imitates the human brain and forms more abstract high-level attributes by combining low-level features. Artificial neural network technology is now widely applied in pattern recognition, image processing, intelligent control, combinatorial optimization, financial forecasting, communications, robotics, expert systems, and other fields, where it plays a fundamental role and has created enormous economic value.
Artificial neural networks were proposed and developed on the basis of modern neuroscience and are abstract mathematical models that reflect the structure and function of the human brain. Since the American psychologist W. McCulloch and the mathematician W. Pitts proposed the MP model, an abstract mathematical model of the formal neuron, in 1943, artificial neural network models have gone through more than 50 years of tortuous development, and the related theories and methods have grown into an interdisciplinary field spanning physics, mathematics, computer science, and neurobiology. A so-called neural network is a collection of methods that model high-complexity data through multilayer nonlinear transformations. As the basic building unit of an artificial neural network, the artificial neuron model is

y = f( Σ_i w_i·x_i )

which contains three basic elements: (1) the weights w_i, corresponding to a group of connections of biological neuron synapses, with the connection strength represented by the weight w_i on each connection, where a positive w_i indicates excitation and a negative w_i indicates inhibition; (2) a summation unit, which computes the weighted sum of multiple input signals; and (3) a nonlinear activation function, which introduces nonlinearity into the neuron so that the neural network can approximate arbitrary nonlinear functions, and limits the amplitude of the neuron output to a certain range. When performing neural network inference or training, the multiplication of the synaptic connection weights with the input data and the addition in the summation unit form a linear model; therefore, a nonlinear activation function is needed after the summation to map the value computed by the linear model into a nonlinear space and enhance the descriptive power of the neural network. Without a nonlinear activation function, a neural network can only perform linear transformations.
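For reference, a minimal sketch of this conventional neuron in Python (the sigmoid used here is only one assumed example of the activation f; the text later names Sigmoid, ReLU, and Tanh as common choices):

```python
import numpy as np

def sigmoid(z):
    # Assumed example activation; any of the common activation functions could be used.
    return 1.0 / (1.0 + np.exp(-z))

def traditional_neuron(x, w, f=sigmoid):
    """Classical artificial neuron: linear weighted sum of the inputs (multiply-accumulate),
    followed by a separate nonlinear activation f applied to the accumulated result."""
    s = np.dot(w, x)        # summation unit: weighted sum of the input signals
    return f(s)             # nonlinear activation maps the linear result into nonlinear space

x = np.array([0.5, -1.2, 0.3])   # inputs x_i
w = np.array([0.8, 0.1, -0.4])   # synaptic weights w_i
print(traditional_neuron(x, w))
```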
This artificial neuron and network model based on a linear model plus nonlinear activation can be computed easily and quickly on general-purpose digital electronic computers, but it is difficult to realize in analog computing devices such as optical computing or DNA computing. Because digital electronic computers are general purpose, both the linear multiply-accumulate operations and the nonlinear activation operations can be converted into binary Boolean logic and executed by the logic units inside the CPU. In some energy-efficient analog computing structures, however, realizing this neuron model is very difficult. For example, photonic computing devices that use light as the carrier need two processes to implement neural network computation based on the traditional neuron model: (1) building a linear vector-matrix multiply-accumulate structure from the nonlinear properties of light, specifically by decomposing the weight matrix of the neural network via SVD into two unitary matrices and a diagonal matrix, constructing the unitary matrix structures with the self-similarity of light propagation, and realizing the diagonal matrix with intensity modulators; and (2) implementing the activation function on an electronic computer.
Two main problems arise when traditional artificial neurons and network computing models are implemented with analog computing devices. First, using the nonlinear characteristics of analog signals to superimpose multiple nonlinear analog signals into a linear signal, and thereby realize linear vector-matrix multiply-accumulate with analog devices, not only requires a large combination of devices but is also prone to drift under environmental interference. Second, activation functions such as Sigmoid and ReLU that are commonly used in the model are difficult to realize with analog devices, so an electronic computer is needed to handle such operations, which reduces the computing speed and energy efficiency of the analog computing components.
Summary of the Invention
To solve the technical problems that arise when traditional artificial neurons and network computing models are implemented with analog computing devices, namely that a large combination of analog devices is required, that the system is prone to drift under environmental interference, and that some activation functions are difficult to realize with analog devices and must be handled by an electronic computer, which reduces the computing speed and energy efficiency of the analog computing components, the present invention provides a fused neuron model, a neural network structure with its inference and training methods, a computer-readable storage medium, and computer equipment.
To achieve the above objective, the present invention provides the following technical solutions:
A fused neuron model, for realizing artificial neuron and network computation on analog computing devices, characterized in that:
the weight of each synaptic connection is a continuously differentiable nonlinear function φ(w_i, x_i);
where i is the neuron level index, an integer greater than 1, and x_i is the value input by a neuron of the previous level i-1 to the synaptic connection of the current level-i neuron;
w_i is a training parameter, obtained through the following steps:
S1: randomly assign a value to the weight of each synapse as the initialized training parameter w_i';
S2: select a batch, substitute it into the neuron model for inference, and obtain an inference result;
S3: based on the inference result, compute the corresponding loss value with the loss function;
S4: using the backpropagation algorithm, compute the partial derivative of the weight at each level from the nonlinear weight function;
S5: using gradient descent and the partial derivatives of the weights at each level, minimize the loss value obtained in step S3 and update the initialized training parameter w_i' according to the minimization result;
S6: repeat steps S2 to S5 until all batches have been substituted into the neuron model for inference; the updated initialized training parameter obtained after the last batch has gone through steps S2 to S5 is w_i.
Further, step S2 is specifically:
S2.1: select a batch and transform its format to form a one-dimensional vector;
S2.2: input the one-dimensional vector into the transfer matrix between the input layer and the hidden layer of the neural network, the transfer matrix using continuously differentiable nonlinear functions as its element values, to obtain a result matrix;
S2.3: add the row vectors of the result matrix obtained in step S2.2 in turn to obtain an output vector;
S2.4: assign the output vector obtained in step S2.3 to the one-dimensional vector formed in step S2.1;
S2.5: repeat steps S2.2 to S2.4 until the output layer of the neural network is reached; inference ends and the inference result is obtained.
The present invention also provides a neural network structure, where the neural network is a feedforward network or a feedback network, characterized in that it comprises an input layer, a hidden layer, and an output layer;
each of the input layer, the hidden layer, and the output layer has at least one neuron, and the neurons adopt the fused neuron model described above.
In addition, the present invention provides an inference method based on the above neural network structure, characterized by the following steps:
S1: transform the format of the input batch to form a one-dimensional vector and input it to the analog computing device;
S2: through the analog computing device, input each element of the one-dimensional vector, according to its correspondence, into the transfer matrix between the input layer and the hidden layer of the neural network, the transfer matrix using continuously differentiable nonlinear functions as its element values, to obtain a matrix;
S3: add the row vectors of the matrix obtained in step S2 in turn to obtain an output vector;
S4: assign the output vector obtained in step S3 to the one-dimensional vector formed in step S1;
S5: repeat steps S2 to S4 until the output layer of the neural network is reached; inference ends and the inference output result is obtained.
Furthermore, the present invention provides a training method based on the above neural network structure, characterized by the following steps:
S1: randomly assign a value to the weight of each synaptic connection of the neurons in the neural network structure;
S2: obtain the initialized output of each synaptic weight through inference;
S2.1: select a batch and transform its format to form a one-dimensional vector;
S2.2: input the one-dimensional vector into the transfer matrix between the input layer and the hidden layer of the neural network, the transfer matrix using continuously differentiable nonlinear functions as its element values, to obtain a matrix;
S2.3: add the row vectors of the matrix obtained in step S2.2 in turn to obtain an output vector;
S2.4: assign the output vector obtained in step S2.3 to the one-dimensional vector formed in step S2.1;
S2.5: repeat steps S2.2 to S2.4 until the output layer of the neural network is reached; inference ends and the initialized output of each synaptic weight is obtained;
S3: based on the initialized output of each synaptic weight, compute the corresponding loss value with the loss function;
S4: using the backpropagation algorithm, compute the partial derivative of the weight at each level from the nonlinear weight function;
S5: using gradient descent and the partial derivatives of the weights at each level, minimize the loss value obtained in step S3 and update the initialized output of each synaptic weight according to the minimization result;
S6: repeat steps S2 to S5 until all batches have been substituted into the neurons for inference; the updated initialized output of each synaptic weight obtained after the last batch has gone through steps S2 to S5 is the final output of each synaptic weight, completing the training;
S7: substitute the final output of each synaptic weight into the analog computing device and perform inference based on the neural network.
Meanwhile, the present invention provides a computer-readable storage medium on which a computer program is stored, characterized in that when the program is executed by a processor, the steps of the above inference method or the steps of the above training method can be implemented.
Correspondingly, the present invention provides computer equipment comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that when the processor executes the program, the steps of the above inference method or the steps of the above training method can be implemented.
Compared with the prior art, the beneficial effects of the present invention are:
1. The fused neuron model of the present invention merges the connection weights of the traditional artificial neuron with the activation function, so an activation function is no longer needed. It can be realized not only in digital electronic computers but is even better suited to analog computing devices with nonlinear characteristics; devices can be cascaded directly, avoiding the speed and energy bottlenecks introduced by converting analog signals into digital signals to perform activation operations. Because the activation function is no longer needed, the step of handling it on an electronic computer is avoided, which effectively improves the computing speed and energy efficiency of the analog computing components.
2. The neural network structure of the present invention uses the above fused neuron model as its basic building unit to form a hierarchical structure. It can serve application fields of traditional artificial neural networks such as image recognition, speech processing, and autonomous driving, while also improving computational efficiency.
3. In the inference method of the neural network structure of the present invention, the input data of a connection is first substituted into the connection's nonlinear weight function to compute the weighted result of that connection; all weighted results of the neuron are then summed and passed directly to the next layer of neurons, propagating forward layer by layer until the recognition result is obtained. This is no longer a vector-matrix multiply-accumulate operation, the inference is more efficient and accurate, and the method can also be embedded into existing training methods.
4. The training method of the neural network structure of the present invention optimizes the parameters of the neuron model through the backpropagation and gradient descent algorithms, where the gradients of the parameters are obtained by taking partial derivatives of the weight function; the training incorporates the inference process described above.
5. The computer-readable storage medium and computer equipment of the present invention can execute the inference method and training method of the present invention, carry out the specific steps as a program, and realize the corresponding methods in applications, which facilitates popularization and application.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of a traditional artificial neuron model;
Fig. 2 is a schematic diagram of the fused neuron model of the present invention;
Fig. 3 is a schematic diagram of an embodiment of the neural network structure of the present invention;
Fig. 4 is a schematic flowchart of the inference method of the neural network structure of the present invention;
Fig. 5 is a schematic flowchart of the training method of the neural network structure of the present invention.
Detailed Description of the Embodiments
The technical solutions of the present invention are described clearly and completely below with reference to the embodiments and the drawings; obviously, the described embodiments do not limit the present invention.
The inventive concept of the present invention is as follows:
How to design a new artificial neuron and network computing model adapted to the characteristics of energy-efficient analog computing devices is the core problem this patent aims to solve.
In joint research on novel analog computing devices and artificial neuron and network computing models, it was found that the mismatch between the linear-model-plus-nonlinear-activation neuron model and the physical characteristics of analog computing devices is the root cause of the technical problems described above: the large combination of analog devices required, the susceptibility to drift under environmental interference, and the difficulty of realizing some activation functions with analog devices, which forces the use of an electronic computer and reduces the computing speed and energy efficiency of the analog computing components. This application therefore proposes a novel artificial neuron model and network structure in which the linear model and the activation function are fused. The model expresses the synaptic connection weights between neurons with nonlinear functions, thereby forming a weight matrix in a nonlinear space; the nonlinear operations required by the artificial neuron are implemented directly in the weight matrix, so that the nonlinear characteristics of analog computing devices such as optical devices can be used directly to realize the functions of a neural network.
To address the scale, stability, power, and speed problems caused by the mismatch between the traditional linear-model-plus-nonlinear-activation neuron and network computing model and the physical characteristics of analog computing devices, the present invention proposes a nonlinear-weight neuron without an activation function and a corresponding network computing model, so that the nonlinear effects of analog signals such as light and electricity can be used to realize the neural network structure directly. This not only reduces the volume of the devices but also brings the advantages of analog signals in speed and energy efficiency into full play.
A traditional neural network uses a neuron structure of a linear model plus nonlinear activation. Here, a linear model means that the output of the model is a linear weighted sum of its inputs: suppose the output y of a model and its inputs x_i satisfy the relationship

y = Σ_i w_i·x_i + b

then this model is a linear model, where both w_i and b belong to the real number domain. It is called a linear model because when the model has only one input, x_i and y form a straight line in a two-dimensional coordinate system; similarly, when the model has n inputs, the vector x_i and the vector y form a plane in an (n+1)-dimensional space. In a linear model, the function that maps inputs to outputs is called a linear transformation, whose key property is that any composition of linear models is still a linear model. Nonlinear activation, in turn, means mapping the computation result of the linear model into a nonlinear space, thereby enhancing the ability of the neural network to fit nonlinear functions. If the output of every neuron is passed through a nonlinear function, the whole neural network model is no longer linear; the nonlinear function can be a commonly used activation function such as Sigmoid, ReLU, or Tanh, or any continuously differentiable function whose graph in the two-dimensional plane is a curve or a polyline.
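A quick numerical check of the property stated above, that any composition of linear models collapses into a single linear model (which is exactly why a nonlinear step is needed somewhere):

```python
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x = rng.normal(size=3)

two_linear_layers = W2 @ (W1 @ x + b1) + b2           # two stacked linear models
one_linear_layer = (W2 @ W1) @ x + (W2 @ b1 + b2)     # equivalent single linear model
print(np.allclose(two_linear_layers, one_linear_layer))  # True: no extra expressive power
```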
The present invention, by contrast, proposes an artificial neuron in which the synaptic connection weight and the activation function are fused, together with a corresponding network structure. It comprises four main points: first, the synaptic connection weight is a nonlinear function; second, the neuron and network structure have no activation function; third, inference based on this model is no longer a vector-matrix multiply-accumulate operation; fourth, the inference model can be embedded into existing training methods.
In the fused neuron model proposed by the present invention, the synaptic connection weight is a nonlinear function; specifically, the weight of the synaptic connection between neurons is a nonlinear function φ(w_i, x_i), and when the input x_i of the synapse acts on the synaptic connection weight, the output is nonlinear, that is, the input x_i and the corresponding output y_i form a curve in a two-dimensional coordinate system. Here i is a variable, the neuron level index, corresponding to each level of the neuron model. This is a neuron computing model in which the synaptic weights and the activation function are fused: the neurons of this model have no activation function f, and after the inputs of a neuron are weighted by the nonlinear weight functions and summed, the result is output directly to the connected next neuron.
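As an illustrative sketch of such a neuron (assuming, purely for the example, φ(w, x) = tanh(w·x) as one possible continuously differentiable nonlinear weight function; the invention does not prescribe this particular choice):

```python
import numpy as np

def phi(w, x):
    # Assumed example of the nonlinear weight function phi(w, x).
    return np.tanh(w * x)

def fused_neuron(x, w):
    """Fused neuron: each input x_i passes through its own nonlinear weight function
    phi(w_i, x_i); the weighted results are summed and output directly,
    with no separate activation function."""
    return np.sum(phi(w, x))

x = np.array([0.5, -1.2, 0.3])
w = np.array([0.8, 0.1, -0.4])
print(fused_neuron(x, w))   # value passed straight to the next-level neuron
```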
Meanwhile, the present invention also proposes a neural network structure based on the fused model, with the aforementioned fused neuron model as its basic building unit, comprising an input layer, a hidden layer, and an output layer. Each of the input layer, the hidden layer, and the output layer has at least one neuron, the neurons adopt the aforementioned fused neuron model, and the network is a single-layer or multilayer neural network structure formed according to certain rules; the number of neurons in each layer and the connection relationships between layers are configured according to the needs of the task.
The fused neuron model and neural network structure of the present invention are described below with reference to an embodiment:
Fig. 1 is a schematic diagram of the traditional artificial neuron model: the weight w_i of each synaptic connection is a real number, which is multiplied with the input data x_i and then accumulated, and the accumulated result is fed into the activation function f to obtain the corresponding output. Fig. 2 shows the fused neuron model of this embodiment: the weight of each synaptic connection is a continuously differentiable nonlinear function φ(w_i, x_i) with two parameters, w_i and x_i, where the parameter w_i is adjusted only during training and remains fixed during inference (how w_i is obtained is described in detail later), and the parameter x_i is the value input by the previous-level neuron to this synaptic connection. After the synaptic connections are summed, the model outputs directly to the next-level neuron, with no activation function in between.
Fig. 3 shows an embodiment of the neural network structure based on the fused neuron model of the present invention: the network consists of three layers, an input layer, a hidden layer, and an output layer, each with three neurons, and adjacent layers are fully connected. In practical applications, a neural network structure with any number of layers and any connection relationships can be built from the fused neuron model of the present invention, and it can be a feedforward network, a feedback network, and so on.
When the above neural network structure performs inference, a group of data (such as images, speech, or text) enters the neurons of the input layer, the data is first fed into the nonlinear weight matrix, and the row vectors of the result matrix are then added in turn to generate the computation result of this layer, which serves as the input of the next layer; the data is passed forward layer by layer in this way until the recognition result is obtained. During training, the values of the parameters in the neural network are adjusted through the backpropagation algorithm and the gradient descent algorithm: the gradient descent algorithm is mainly used to optimize the value of an individual parameter, while the backpropagation algorithm provides an efficient way to apply gradient descent to all parameters, so that the loss function of the neural network model on the training data becomes as small as possible. The gradients of the parameters are obtained by taking partial derivatives of the nonlinear weight functions.
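To make the last point concrete, a small sketch of such a parameter gradient under the same assumed choice φ(w, x) = tanh(w·x), whose partial derivative with respect to w is x·(1 - tanh²(w·x)):

```python
import numpy as np

def phi(w, x):
    # Illustrative nonlinear weight function (an assumption, not prescribed by the patent).
    return np.tanh(w * x)

def dphi_dw(w, x):
    # Analytic partial derivative with respect to the trainable parameter w:
    # d/dw tanh(w*x) = x * (1 - tanh(w*x)**2)
    return x * (1.0 - np.tanh(w * x) ** 2)

# Check the analytic gradient against a finite difference, as backpropagation would use it.
w, x, eps = 0.7, -0.4, 1e-6
numeric = (phi(w + eps, x) - phi(w - eps, x)) / (2 * eps)
print(dphi_dw(w, x), numeric)   # the two values should agree closely
```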
The specific inference and training methods are described below by way of the embodiment:
In the neural network structure embodiment shown in Fig. 3, the transfer matrix T between the input layer and the hidden layer is:

T = | φ(w_11, x_1)  φ(w_12, x_1)  φ(w_13, x_1) |
    | φ(w_21, x_2)  φ(w_22, x_2)  φ(w_23, x_2) |
    | φ(w_31, x_3)  φ(w_32, x_3)  φ(w_33, x_3) |

This transfer matrix has nine elements in total, which represent the weights of the synaptic connections between the input-layer network and the hidden-layer network. For example, the connection weight between the first neuron of the input layer and the first neuron of the hidden layer is φ(w_11, x_1), where w_11 denotes the weight of the connection between the first element of the previous layer and the first element of the next layer. In the transfer matrix, if there is no connection between two neurons, the element value at that position is 0.
As shown in Fig. 4, the inference operation on this neural network comprises the following specific steps:
(1) Transform the format of the input information such as pictures or speech to form a one-dimensional vector V = [x_1 x_2 x_3], input this vector into the analog computing device, and through the analog computing device feed it into the input-layer neurons; for two adjacent connected layers, the vector V is input into the transfer matrix T between the input layer and the hidden layer to obtain the result matrix T';
(2) add the row vectors of the result matrix T' in turn to obtain an output vector V';
(3) assign the value of the output vector V' to the one-dimensional vector V;
(4) determine whether the output layer has been reached; if so, the output vector V' is the computation result of the inference, otherwise input the current output vector V' into the transfer matrix leading to the next layer to obtain a new result matrix, and repeat (2) to (4) until the output layer is reached, at which point the inference is complete, the inference result is output, and the inference ends.
Input information such as pictures and speech can be recognized on the basis of the above inference method.
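A compact sketch of this forward pass for the 3-3-3 fully connected network of Fig. 3 (again assuming the example weight function φ(w, x) = tanh(w·x); the weight values below are random placeholders):

```python
import numpy as np

def phi(w, x):
    # Assumed example of the continuously differentiable nonlinear weight function.
    return np.tanh(w * x)

def layer_forward(V, W):
    """One pair of connected layers: element (i, j) of the result matrix T' is
    phi(W[i, j], V[i]); adding the row vectors (summing over i) gives the output V'."""
    T_prime = phi(W, V[:, None])     # result matrix T', shape (len(V), next layer size)
    return T_prime.sum(axis=0)       # output vector V' passed on to the next layer

def infer(V, weight_matrices):
    # Repeat the layer computation until the output layer is reached (steps (2) to (4)).
    for W in weight_matrices:        # input->hidden, then hidden->output
        V = layer_forward(V, W)
    return V                         # inference result

rng = np.random.default_rng(0)
Ws = [rng.normal(size=(3, 3)), rng.normal(size=(3, 3))]   # 3-3-3 network as in Fig. 3
print(infer(np.array([0.2, -0.7, 1.1]), Ws))
```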
As shown in Fig. 5, the training operation on this neural network comprises the following specific steps:
(1) Initialize the network parameters (weights); a random method can be used so that each network parameter takes a random value near 0, and in practical applications the random values need not be near 0, they may simply be chosen at random;
(2) select a batch from the training data, whose size can be changed dynamically as needed;
(3) invoke the inference operation of the neural network, perform inference, and obtain the output;
(4) compute the loss value according to the loss function;
(5) run the backpropagation algorithm to compute the partial derivatives of all network parameters;
(6) use gradient descent, or another algorithm combined with backpropagation, to minimize the loss value computed by the loss function, and update all relevant network parameters according to the minimization result;
(7) determine whether there are still batches to be fed into the model for training; if so, jump to step (2) and repeat steps (2) to (7) until all batches have been fed into the model;
(8) training ends; output the trained network parameters and substitute them into the neural network structure for subsequent use.
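A minimal end-to-end sketch of this training loop for a single pair of connected layers of the fused model; every concrete choice here is an assumption made for illustration (φ(w, x) = tanh(w·x), a mean squared error loss, plain gradient descent, and hand-derived gradients instead of an autodiff library). For deeper networks the same chain rule extends through the partial derivative of φ with respect to its input as well.

```python
import numpy as np

def result_matrix(x, W):
    # Result matrix T': element (i, j) = phi(W[i, j], x[i]) with phi(w, x) = tanh(w*x).
    return np.tanh(W * x[:, None])

def forward(x, W):
    # Output vector V': the row vectors of T' added in turn (sum over the input index i).
    return result_matrix(x, W).sum(axis=0)

def grad_W(x, W, err):
    # dL/dW[i, j] = err[j] * d phi / d w = err[j] * x[i] * (1 - tanh(W[i, j]*x[i])**2)
    return err[None, :] * x[:, None] * (1.0 - result_matrix(x, W) ** 2)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(3, 3))                      # step (1): random initialization
data = [(rng.normal(size=3), rng.normal(size=3)) for _ in range(8)]  # toy (input, target) batches
lr = 0.1

for _ in range(200):
    for x, target in data:                                  # step (2): take a batch
        out = forward(x, W)                                 # step (3): inference
        err = out - target                                  # step (4): gradient of the MSE loss
        W -= lr * grad_W(x, W, err)                         # steps (5)-(6): backprop + gradient descent

loss = 0.5 * sum(np.sum((forward(x, W) - t) ** 2) for x, t in data)
print("final loss:", loss)                                  # total loss after training
```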
In the fused neuron model of the present invention, the weight of each synaptic connection is a continuously differentiable nonlinear function φ(w_i, x_i), and w_i is a training parameter obtained through the training steps of the neural network structure of the present invention:
S1: randomly assign values to the weights at each level to obtain the initialized training parameters w_i';
S2: select a batch and substitute it into the neuron model for inference;
S3: based on the initialized training parameters, compute the corresponding loss value with the loss function;
S4: compute the partial derivative of the weight at each level by the backpropagation algorithm;
S5: using gradient descent and the partial derivatives of the weights at each level, minimize the loss value obtained in step S3 and update the initialized training parameter w_i' according to the minimization result;
S6: repeat steps S2 to S5 until all batches have been substituted into the neuron model for inference; the updated initialized training parameter w_i' obtained after the last batch has gone through steps S2 to S5 is w_i.
Here, the inference in step S2 is performed with the inference method of the neural network structure of the present invention.
In addition, the present invention also proposes a computer-readable storage medium and computer equipment, where a computer program is stored on the computer-readable storage medium, and when the program is executed by a processor, the steps of the above inference method or training method can be implemented. The computer equipment comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the above inference method or training method when executing the computer program. It should be noted that the inference method and training method of the present invention can be implemented not only by an electronic computer such as the computer equipment, but also by analog computing devices with nonlinear characteristics.
The above is merely an embodiment of the present invention and does not limit the scope of protection of the present invention; any equivalent structural transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, falls within the scope of patent protection of the present invention.

Claims (9)

  1. A fused neuron model, for realizing artificial neuron and network computation on analog computing devices, characterized in that:
    the weight of each synaptic connection is a continuously differentiable nonlinear function φ(w_i, x_i);
    where i is the neuron level index, an integer greater than 1, and x_i is the value input by a neuron of the previous level i-1 to the synaptic connection of the current level-i neuron;
    w_i is a training parameter, obtained through the following steps:
    S1: randomly assign a value to the weight of each synapse as the initialized training parameter w_i';
    S2: select a batch, substitute it into the neuron model for inference, and obtain an inference result;
    S3: based on the inference result, compute the corresponding loss value with the loss function;
    S4: using the backpropagation algorithm, compute the partial derivative of the weight at each level from the nonlinear weight function;
    S5: using gradient descent and the partial derivatives of the weights at each level, minimize the loss value obtained in step S3 and update the initialized training parameter w_i' according to the minimization result;
    S6: repeat steps S2 to S5 until all batches have been substituted into the neuron model for inference; the updated initialized training parameter obtained after the last batch has gone through steps S2 to S5 is w_i.
  2. The fused neuron model according to claim 1, characterized in that step S2 is specifically:
    S2.1: select a batch and transform its format to form a one-dimensional vector;
    S2.2: input the one-dimensional vector into the transfer matrix between the input layer and the hidden layer of the neural network, the transfer matrix using continuously differentiable nonlinear functions as its element values, to obtain a result matrix;
    S2.3: add the row vectors of the result matrix obtained in step S2.2 in turn to obtain an output vector;
    S2.4: assign the output vector obtained in step S2.3 to the one-dimensional vector formed in step S2.1;
    S2.5: repeat steps S2.2 to S2.4 until the output layer of the neural network is reached; inference ends and the inference result is obtained.
  3. A neural network structure, the neural network being a feedforward network or a feedback network, characterized by comprising an input layer, a hidden layer, and an output layer;
    each of the input layer, the hidden layer, and the output layer has at least one neuron, and the neurons adopt the fused neuron model according to claim 1 or 2.
  4. An inference method based on the neural network structure according to claim 3, characterized by comprising the following steps:
    S1: transform the format of the input batch to form a one-dimensional vector and input it to the analog computing device;
    S2: through the analog computing device, input each element of the one-dimensional vector, according to its correspondence, into the transfer matrix between the input layer and the hidden layer of the neural network, the transfer matrix using continuously differentiable nonlinear functions as its element values, to obtain a matrix;
    S3: add the row vectors of the matrix obtained in step S2 in turn to obtain an output vector;
    S4: assign the output vector obtained in step S3 to the one-dimensional vector formed in step S1;
    S5: repeat steps S2 to S4 until the output layer of the neural network is reached; inference ends and the inference output result is obtained.
  5. A computer-readable storage medium on which a computer program is stored, characterized in that when the program is executed by a processor, the steps of the method according to claim 4 are implemented.
  6. Computer equipment comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that when the processor executes the program, the steps of the method according to claim 4 are implemented.
  7. A training method based on the neural network structure according to claim 3, characterized by comprising the following steps:
    S1: randomly assign a value to the weight of each synaptic connection of the neurons in the neural network structure;
    S2: obtain the initialized output of each synaptic weight through inference;
    S2.1: select a batch and transform its format to form a one-dimensional vector;
    S2.2: input the one-dimensional vector into the transfer matrix between the input layer and the hidden layer of the neural network, the transfer matrix using continuously differentiable nonlinear functions as its element values, to obtain a matrix;
    S2.3: add the row vectors of the matrix obtained in step S2.2 in turn to obtain an output vector;
    S2.4: assign the output vector obtained in step S2.3 to the one-dimensional vector formed in step S2.1;
    S2.5: repeat steps S2.2 to S2.4 until the output layer of the neural network is reached; inference ends and the initialized output of each synaptic weight is obtained;
    S3: based on the initialized output of each synaptic weight, compute the corresponding loss value with the loss function;
    S4: using the backpropagation algorithm, compute the partial derivative of the weight at each level from the nonlinear weight function;
    S5: using gradient descent and the partial derivatives of the weights at each level, minimize the loss value obtained in step S3 and update the initialized output of each synaptic weight according to the minimization result;
    S6: repeat steps S2 to S5 until all batches have been substituted into the neurons for inference; the updated initialized output of each synaptic weight obtained after the last batch has gone through steps S2 to S5 is the final output of each synaptic weight;
    S7: substitute the final output of each synaptic weight into the analog computing device and perform inference based on the neural network.
  8. A computer-readable storage medium on which a computer program is stored, characterized in that when the program is executed by a processor, the steps of the method according to claim 7 are implemented.
  9. Computer equipment comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that when the processor executes the program, the steps of the method according to claim 7 are implemented.
PCT/CN2021/087524 2020-12-25 2021-04-15 Fused neuron model, neural network structure, training and inference methods, storage medium and device WO2022134391A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011562331.6 2020-12-25
CN202011562331.6A CN112529166A (zh) 2020-12-25 2020-12-25 Fused neuron model, neural network structure, training and inference methods, storage medium and device

Publications (1)

Publication Number Publication Date
WO2022134391A1 true WO2022134391A1 (zh) 2022-06-30

Family

ID=74976450

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/087524 WO2022134391A1 (zh) 2020-12-25 2021-04-15 融合神经元模型、神经网络结构及训练、推理方法、存储介质和设备

Country Status (2)

Country Link
CN (1) CN112529166A (zh)
WO (1) WO2022134391A1 (zh)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116416253A (zh) * 2023-06-12 2023-07-11 北京科技大学 一种基于亮暗通道先验景深估计的神经元提取方法及装置
CN116523120A (zh) * 2023-04-14 2023-08-01 成都飞机工业(集团)有限责任公司 一种作战系统健康状态预测方法
CN117057407A (zh) * 2023-08-21 2023-11-14 浙江大学 一种面向有串扰的波分复用光学神经网络的训练方法
CN117236137A (zh) * 2023-11-01 2023-12-15 龙建路桥股份有限公司 一种高寒区深长隧道冬季连续施工控制系统
CN117686447A (zh) * 2024-01-31 2024-03-12 北京英视睿达科技股份有限公司 基于多通道模型的水质监测方法、装置、设备及介质
CN117933499A (zh) * 2024-03-22 2024-04-26 中国铁建电气化局集团有限公司 高速铁路接触网的入侵风险预测方法、装置和存储介质

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112529166A (zh) * 2020-12-25 2021-03-19 中国科学院西安光学精密机械研究所 融合神经元模型、神经网络结构及训练、推理方法、存储介质和设备
CN113159290B (zh) * 2021-04-26 2022-08-09 青岛本原微电子有限公司 一种神经网络模型网络推理的优化方法
CN112988082B (zh) * 2021-05-18 2021-08-03 南京优存科技有限公司 基于nvm进行ai计算的芯片系统及其运行方法
CN113298246B (zh) * 2021-05-27 2023-02-28 山东云海国创云计算装备产业创新中心有限公司 数据处理方法、装置及计算机可读存储介质
CN113780552B (zh) * 2021-09-09 2024-03-22 浙江数秦科技有限公司 一种双向隐私保护的安全多方计算方法
CN113743595B (zh) * 2021-10-09 2023-08-15 福州大学 基于物理驱动自编码器神经网络的结构参数识别方法
CN116749487B (zh) * 2023-07-17 2024-01-19 宇盛电气有限公司 一种多层共挤机头加温控制管路、系统及方法

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1464477A (zh) * 2002-06-18 2003-12-31 中国科学院半导体研究所 多权值突触的神经元构造方法
CN109376855A (zh) * 2018-12-14 2019-02-22 中国科学院计算技术研究所 一种光神经元结构和包含该结构的神经网络处理系统
CN110472733A (zh) * 2019-07-22 2019-11-19 天津大学 一种基于神经形态学的在体神经元建模方法
CN112529166A (zh) * 2020-12-25 2021-03-19 中国科学院西安光学精密机械研究所 融合神经元模型、神经网络结构及训练、推理方法、存储介质和设备

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1464477A (zh) * 2002-06-18 2003-12-31 中国科学院半导体研究所 多权值突触的神经元构造方法
CN109376855A (zh) * 2018-12-14 2019-02-22 中国科学院计算技术研究所 一种光神经元结构和包含该结构的神经网络处理系统
CN110472733A (zh) * 2019-07-22 2019-11-19 天津大学 一种基于神经形态学的在体神经元建模方法
CN112529166A (zh) * 2020-12-25 2021-03-19 中国科学院西安光学精密机械研究所 融合神经元模型、神经网络结构及训练、推理方法、存储介质和设备

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116523120A (zh) * 2023-04-14 2023-08-01 成都飞机工业(集团)有限责任公司 一种作战系统健康状态预测方法
CN116416253A (zh) * 2023-06-12 2023-07-11 北京科技大学 一种基于亮暗通道先验景深估计的神经元提取方法及装置
CN116416253B (zh) * 2023-06-12 2023-08-29 北京科技大学 一种基于亮暗通道先验景深估计的神经元提取方法及装置
CN117057407A (zh) * 2023-08-21 2023-11-14 浙江大学 一种面向有串扰的波分复用光学神经网络的训练方法
CN117236137A (zh) * 2023-11-01 2023-12-15 龙建路桥股份有限公司 一种高寒区深长隧道冬季连续施工控制系统
CN117236137B (zh) * 2023-11-01 2024-02-02 龙建路桥股份有限公司 一种高寒区深长隧道冬季连续施工控制系统
CN117686447A (zh) * 2024-01-31 2024-03-12 北京英视睿达科技股份有限公司 基于多通道模型的水质监测方法、装置、设备及介质
CN117686447B (zh) * 2024-01-31 2024-05-03 北京英视睿达科技股份有限公司 基于多通道模型的水质监测方法、装置、设备及介质
CN117933499A (zh) * 2024-03-22 2024-04-26 中国铁建电气化局集团有限公司 高速铁路接触网的入侵风险预测方法、装置和存储介质

Also Published As

Publication number Publication date
CN112529166A (zh) 2021-03-19

Similar Documents

Publication Publication Date Title
WO2022134391A1 (zh) 融合神经元模型、神经网络结构及训练、推理方法、存储介质和设备
Jaafra et al. Reinforcement learning for neural architecture search: A review
CN108805270B (zh) 一种基于存储器的卷积神经网络系统
Castillo et al. Functional networks with applications: a neural-based paradigm
Pearlmutter Gradient calculations for dynamic recurrent neural networks: A survey
Davidson et al. Theory of morphological neural networks
Teow Understanding convolutional neural networks using a minimal model for handwritten digit recognition
He et al. Constructing an associative memory system using spiking neural network
Parhi et al. Brain-inspired computing: Models and architectures
Ranjan et al. A novel and efficient classifier using spiking neural network
CN108009635A (zh) 一种支持增量更新的深度卷积计算模型
CN108320018A (zh) 一种人工神经网络运算的装置及方法
WO2023039681A1 (en) Methods and systems for implicit attention with sub-quadratic complexity in artificial neural networks
Kozlova et al. The use of neural networks for planning the behavior of complex systems
Zhao et al. Deep learning and its development
Dai et al. Fast training and model compression of gated RNNs via singular value decomposition
Tang Image classification based on CNN: models and modules
Harikrishnan et al. Handwritten digit recognition with feed-forward multi-layer perceptron and convolutional neural network architectures
Weitzenfeld et al. A concurrent object-oriented framework for the simulation of neural networks
Nowshin et al. Recent advances in reservoir computing with a focus on electronic reservoirs
Kuang et al. Digital implementation of the spiking neural network and its digit recognition
CN114004353A (zh) 减少光器件数量的光神经网络芯片构建方法及系统
Zhou A method of converting ann to snn for image classification
Mungai et al. A study on merging mechanisms of simple hopfield network models for building associative memory
Zhang et al. A fast evolutionary knowledge transfer search for multiscale deep neural architecture

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21908403

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21908403

Country of ref document: EP

Kind code of ref document: A1