WO2021007919A1 - A memory network method based on automatic addressing and recursive information integration - Google Patents

A memory network method based on automatic addressing and recursive information integration

Info

Publication number
WO2021007919A1
WO2021007919A1 (PCT/CN2019/101806, CN2019101806W)
Authority
WO
WIPO (PCT)
Prior art keywords
memory
information
addressing
automatic addressing
network
Prior art date
Application number
PCT/CN2019/101806
Other languages
English (en)
French (fr)
Inventor
李革
李章恒
钟家兴
黄靖佳
张涛
Original Assignee
北京大学深圳研究生院 (Peking University Shenzhen Graduate School)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University Shenzhen Graduate School (北京大学深圳研究生院)
Priority to US17/423,223 (published as US20220138525A1)
Publication of WO2021007919A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/06Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/10Providing a specific technical effect
    • G06F2212/1041Resource optimization
    • G06F2212/1044Space efficiency improvement

Definitions

  • The invention belongs to the field of deep learning and relates to recurrent neural networks and memory neural networks, and more specifically to a memory network method based on automatic addressing and recursive information integration.
  • A recurrent neural network (RNN) is a typical neural network for processing time-series tasks. Its representative frameworks, such as the long short-term memory network (LSTM) and the gated recurrent unit (GRU), have good temporal modeling capability and are applied to sequential tasks in various practical scenarios, such as speech recognition, text reasoning, and video analysis.
  • LSTM: long short-term memory network
  • GRU: gated recurrent unit
  • An RNN transmits only a hidden state of limited dimension between adjacent time steps, so its capacity to remember historical information is limited.
  • During training, the gradient can be propagated directly through the memory to any required past time step, avoiding repeated multiplication of gradients, so the vanishing- and exploding-gradient problems are alleviated.
  • Historical information can be stored directly in the memory matrix, which greatly enhances the network's ability to remember it.
  • The addressing methods on which memory reads and writes rely are content-based addressing and location-based addressing; such addressing consumes a large amount of memory, with space complexity proportional to the size of the entire memory matrix, and because the operations are complex, it is also slow.
  • The processing unit that jointly computes over the retrieved memory information and the hidden state from the previous time step simply reuses the LSTM's computation steps, so the memory information cannot be used effectively.
  • The present invention provides a memory network framework based on automatic addressing and recursive information integration.
  • The memory network method based on automatic addressing and recursive information integration of the present invention includes the following steps:
  • The two gates calculated in formula (1) are used to control, element by element, the inflow of information from h_{t-1} and r_t, which is the meaning of formulas (2) and (3). The information-processing scheme of the long short-term memory network (LSTM) is then reused for the joint computation, as shown in formulas (4) and (5). Finally, the newly generated hidden state h_t and the memory information r_t are each gated by one of the two output gates calculated in formula (4), concatenated, and output.
  • Specifically, the automatic addressing method concatenates h_{t-1} with x_t and feeds the result into a fully connected layer to obtain an N-dimensional embedding.
  • The embedding is regarded as an unnormalized memory-addressing probability.
  • Gumbel-softmax is used to sample this probability into a one-hot vector, and the D-dimensional memory entry r_t at the position where the vector's element equals 1 is read out.
  • The write position used in step 3) is the position from which r_t was read using the one-hot vector in step 1).
  • The automatic addressing method uses only h_{t-1} and x_t for memory addressing, and applies the Gumbel-softmax function to normalize the unnormalized probability vector and sample a one-hot vector from it (an illustrative code sketch is given at the end of this section).
  • Compared with the four gates of the LSTM, the computation unit for recursive information integration adds three new gates, which are used to control the inflow of h_{t-1} and r_t information and the direct output of r_t information.
  • The method of the present invention is a memory neural network framework based on automatic addressing and recursive information integration; it is an efficient and lightweight memory network method.
  • The memory is read and written through automatic addressing operations of low time and space complexity.
  • The whole framework is efficient, fast, and highly versatile; it is suitable for various time-series processing tasks and outperforms the traditional LSTM and previous memory networks.
  • Fig. 1 is a flowchart of the memory network method based on automatic addressing and recursive information integration of the present invention.
  • Fig. 2 is the validation-set cross-entropy loss curve of the present invention on the array copy task.
  • Fig. 3 is the validation-set cross-entropy loss curve of the present invention on the repeat copy task.
  • Fig. 4 is the validation-set cross-entropy loss curve of the present invention on the associative recall task.
  • Fig. 5 is the validation-set cross-entropy loss curve of the present invention on the priority sort task.
  • The present invention provides a memory network method based on automatic addressing and recursive information integration.
  • The method is a memory neural network framework based on automatic addressing and recursive information integration.
  • The memory is read and written through automatic addressing operations of low time and space complexity, and the memory information is used effectively through a novel computation unit.
  • The entire framework is efficient, fast, and versatile; it is suitable for various time-series processing tasks and outperforms the traditional LSTM and previous memory networks.
  • For time-series tasks, the method of the present invention proposes a new memory network method based on automatic addressing and recursive information integration, i.e., a memory recurrent neural network framework.
  • Figure 1 is a flowchart of the memory network method of the present invention; the specific implementation is as follows.
  • The memory matrix of the memory recurrent neural network framework is an N×D matrix, where N is the number of memory entries and D equals the dimension of the RNN hidden state.
  • The automatic addressing method of this memory recurrent neural network framework directly uses the hidden state h_{t-1} passed by the RNN between time steps to encode historical memory-addressing information, and addresses the memory in combination with the current input x_t.
  • Specifically, h_{t-1} is concatenated with x_t and fed into a fully connected layer to obtain an N-dimensional embedding vector; the embedding vector is regarded as an unnormalized memory-addressing probability.
  • A previously proposed Gumbel-softmax function is used to sample this probability into a one-hot vector, and the D-dimensional memory information r_t at the position where the vector's element equals 1 is read out.
  • The two gates calculated in formula (1) are used to control, element by element, the information inflow from h_{t-1} and r_t, which is the meaning of formulas (2) and (3).
  • The LSTM's information-processing scheme is then reused for the joint computation, as shown in formulas (4) and (5); finally, the newly generated hidden state h_t and the memory information r_t are each gated by one of the two output gates calculated in formula (4), concatenated, and output.
  • Compared with directly reusing the LSTM, the computation unit first filters out the information in the inputs h_{t-1} and r_t that is unnecessary for the current time step, and finally uses an extra output gate to control which information in r_t is used for output. In this way, the fault tolerance and flexibility of memory reading are greatly increased.
  • Compared with the four gates of the LSTM, the computation unit of recursive information integration adds three new gates, which are used to control the inflow of h_{t-1} and r_t information and the direct output of r_t information.
  • This step is the computation that takes place inside the block labeled "ARMIN (Auto-addressing and Recurrent Memory Integration Network) cell" in Figure 1.
  • The new hidden state h_t generated at this time step is written into the memory as the information to be remembered.
  • The write position is the position from which r_t was read using the one-hot vector in operation 1). This step is shown in the part of the flow framed by the word "write" in Figure 1.
  • This set of algorithmic tasks is divided into: a) array copy: during the first 50 time steps, 50 randomly generated 6-bit binary vectors are fed into the recurrent network, which is then required to output the previously input target array in the same order during the following 50 time steps.
  • Cross-entropy loss is used to measure the deviation between the actual output array and the target array; the smaller the deviation, the lower the cross-entropy loss, indicating a stronger ability of the recurrent network to use historical memory information to complete the task.
  • This memory recurrent neural network framework is compared with previous memory networks, namely the TARDIS (Temporal Automatic Relation Discovery in Sequences), AwTA (ARMIN with TARDIS Addressing), SAM (Sparse Access Memory), DNC (Differentiable Neural Computer), and NTM (Neural Turing Machine) frameworks shown in Figures 2 to 5.
  • On all four tasks, the framework exhibits rapidly decreasing loss, a low final converged loss, and fewer required iterations, indicating higher utilization of training samples.
  • This demonstrates the advantages mentioned above: the framework's memory-addressing mechanism learns quickly, and its memory information is used efficiently.
  • The actual running speed of the framework is 3 to 4 times that of the NTM, which is among the better-performing comparison frameworks.
  • The present invention is applicable to the field of deep learning, and in particular to recurrent neural networks and memory neural networks.
  • The present invention is based on a memory neural network framework of automatic addressing and recursive information integration; the memory is read and written through automatic addressing operations of low time and space complexity, and memory information is used effectively through a novel computation unit.
  • The entire framework is efficient, fast, and versatile; it is suitable for various time-series processing tasks and outperforms the traditional LSTM and previous memory networks.
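
To make the automatic addressing described above concrete, the following is a minimal PyTorch-style sketch (illustration only, not the patent's implementation): the module name AutoAddressing, the tensor shapes, and the use of torch.nn.functional.gumbel_softmax with hard=True are assumptions chosen to match the description of concatenating h_{t-1} with x_t, projecting to N logits, sampling a one-hot vector, and reading one D-dimensional memory entry.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AutoAddressing(nn.Module):
        """Illustrative sketch: address an N x D memory from h_{t-1} and x_t."""
        def __init__(self, input_dim, hidden_dim, num_slots):
            super().__init__()
            # Fully connected layer mapping [x_t; h_{t-1}] to N unnormalized logits.
            self.fc = nn.Linear(input_dim + hidden_dim, num_slots)

        def forward(self, x_t, h_prev, memory, tau=1.0):
            # x_t: (B, input_dim), h_prev: (B, hidden_dim), memory: (num_slots, D)
            logits = self.fc(torch.cat([x_t, h_prev], dim=-1))      # (B, N) unnormalized probabilities
            one_hot = F.gumbel_softmax(logits, tau=tau, hard=True)  # (B, N) one-hot sample (straight-through)
            r_t = one_hot @ memory                                   # (B, D) entry at the sampled slot
            return r_t, one_hot

Because only a vector of size d_h + d_x is projected to N logits, the per-step addressing cost does not depend on the memory contents themselves, which is in line with the low space complexity claimed in the description.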

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A memory network method based on automatic addressing and recursive information integration. The method is a memory neural network framework based on automatic addressing and recursive information integration, and is an efficient and lightweight memory network method. The memory is read and written through automatic addressing operations of low time and space complexity, and memory information is used effectively through a novel computation unit. The whole framework is efficient, fast, and highly versatile; it is suitable for various time-series processing tasks and outperforms the traditional LSTM and previous memory networks.

Description

A memory network method based on automatic addressing and recursive information integration
Technical Field
The present invention belongs to the field of deep learning and relates to recurrent neural networks and memory neural networks, and more specifically to a memory network method based on automatic addressing and recursive information integration.
Background Art
In deep learning, the recurrent neural network (RNN) is a typical neural network for processing time-series tasks. Its representative frameworks, such as the long short-term memory network (LSTM) and the gated recurrent unit (GRU), have good temporal modeling capability and are applied to sequential tasks in various practical scenarios, such as speech recognition, text reasoning, and video analysis.
However, current typical recurrent neural networks face the following two problems:
1. Vanishing and exploding gradients during training. When training involves many time steps, repeated multiplication of gradients easily makes them too small (close to 0) or too large (approaching infinity), so that training fails to converge.
2. An RNN passes only a hidden state of limited dimension between adjacent time steps, so its capacity to remember historical information is limited.
To address these two problems, related research has drawn on the idea of the von Neumann architecture and proposed memory-augmented recurrent neural networks: the information produced by the RNN at each time step is explicitly stored in a memory matrix, and the memory is read and written at every time step through learned, trainable read/write operations. This memory mechanism clearly resolves the two problems faced by RNNs:
1. During training, the gradient can be propagated directly through the memory to any required past time step, avoiding repeated multiplication of gradients and thereby alleviating the vanishing- and exploding-gradient problems.
2. Historical information can be stored directly in the memory matrix, which greatly enhances the network's ability to remember it.
However, previous memory neural networks have two shortcomings:
1. The addressing on which memory reads and writes rely is content-based and location-based addressing. Such addressing consumes a large amount of memory, with space complexity proportional to the size of the entire memory matrix, and because the operations are complex, it is also slow.
2. The processing unit that jointly computes over the retrieved memory information and the hidden state passed from the previous time step simply reuses the LSTM's computation steps, so the memory information cannot be used effectively.
Current memory neural networks therefore have problems with speed, memory consumption, and the efficiency with which memory information is used.
Disclosure of the Invention
To overcome the above shortcomings of memory neural networks used to enhance RNN capability, and to further improve the performance of memory neural networks while taking computational complexity into account, the present invention provides a memory network framework based on automatic addressing and recursive information integration.
The memory network method based on automatic addressing and recursive information integration of the present invention includes the following steps:
1) Read the memory matrix using automatic addressing: directly use the hidden state h_{t-1} passed by the recurrent neural network (RNN) between time steps to encode historical memory-addressing information, and address the memory in combination with the current input x_t;
2) Use the recursive-information-integration computation unit to jointly compute over the hidden state h_{t-1}, the memory information r_t, and the input x_t:
[Formulas (1)-(6) are reproduced as images in the original publication.]
The two gates calculated in formula (1) are used to control, element by element, the inflow of information from h_{t-1} and r_t, which is the meaning of formulas (2) and (3). The information-processing scheme of the long short-term memory network (LSTM) is then reused for the joint computation, as shown in formulas (4) and (5). Finally, the newly generated hidden state h_t and the memory information r_t are each gated by one of the two output gates calculated in formula (4), concatenated, and output (a hedged reconstruction of these formulas is sketched after step 4) below);
3) Write operation to the memory:
Write the new hidden state h_t generated at this time step into the memory as the information to be remembered;
4) Proceed to the next time step:
Pass h_t to the next time step, receive the input x_{t+1}, and return to step 1) to repeat the above steps.
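Since the patent reproduces formulas (1)-(6) only as images, the following LaTeX sketch is a hedged reconstruction based solely on the textual description (two input-control gates in formula (1), element-wise gating of h_{t-1} and r_t in formulas (2)-(3), an LSTM-style update in formulas (4)-(5) whose gate block also yields two output gates, and a gated, concatenated output in formula (6)); the exact weight layout is an assumption:

    g_t^h,\; g_t^r = \sigma\big(W_g\,[x_t;\ h_{t-1};\ r_t] + b_g\big) \quad (1)
    \hat{h}_{t-1} = g_t^h \odot h_{t-1} \quad (2)
    \hat{r}_t = g_t^r \odot r_t \quad (3)
    i_t,\; f_t,\; o_t,\; o_t^r = \sigma\big(W\,[x_t;\ \hat{h}_{t-1};\ \hat{r}_t] + b\big), \qquad \tilde{c}_t = \tanh\big(W_c\,[x_t;\ \hat{h}_{t-1};\ \hat{r}_t] + b_c\big) \quad (4)
    c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \qquad h_t = o_t \odot \tanh(c_t) \quad (5)
    \mathrm{out}_t = [\,h_t;\ o_t^r \odot \tanh(\hat{r}_t)\,] \quad (6)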
Preferably, the automatic addressing method concatenates h_{t-1} with x_t and feeds the result into a fully connected layer to obtain an N-dimensional embedding; the embedding is regarded as an unnormalized memory-addressing probability, Gumbel-softmax is used to sample this probability into a one-hot vector, and the D-dimensional entry r_t at the position where the vector's element equals 1 is read out from the memory.
Preferably, the write position in step 3) is the position from which r_t was read using the one-hot vector in step 1).
Preferably, the automatic addressing method uses only h_{t-1} and x_t for memory addressing, and uses the Gumbel-softmax function to normalize the unnormalized probability vector and sample a one-hot vector from it.
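For reference, the Gumbel-softmax mentioned here is the standard relaxation from the literature; the patent does not spell out its formula, so the following commonly used form is given only as background, with \pi the unnormalized addressing probabilities and \tau a temperature:

    y_i = \frac{\exp\big((\log \pi_i + g_i)/\tau\big)}{\sum_{j=1}^{N} \exp\big((\log \pi_j + g_j)/\tau\big)}, \qquad g_i = -\log(-\log u_i), \quad u_i \sim \mathrm{Uniform}(0,1)

Taking the arg-max of y (with a straight-through gradient) yields the one-hot vector used to read a single memory entry.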
Preferably, compared with the four gates of the LSTM, the recursive-information-integration computation unit adds three new gates, which are respectively used to control the inflow of h_{t-1} and r_t information and the direct output of r_t information.
The method of the present invention has the following advantages:
The method of the present invention is a memory neural network framework based on automatic addressing and recursive information integration, and is an efficient and lightweight memory network method. The memory is read and written through automatic addressing operations of low time and space complexity, and memory information is used effectively through a novel computation unit. The whole framework is efficient, fast, and highly versatile; it is suitable for various time-series processing tasks and outperforms the traditional LSTM and previous memory networks.
Brief Description of the Drawings
Fig. 1 is a flowchart of the memory network method based on automatic addressing and recursive information integration of the present invention.
Fig. 2 is the validation-set cross-entropy loss curve of the present invention on the array copy task.
Fig. 3 is the validation-set cross-entropy loss curve of the present invention on the repeat copy task.
Fig. 4 is the validation-set cross-entropy loss curve of the present invention on the associative recall task.
Fig. 5 is the validation-set cross-entropy loss curve of the present invention on the priority sort task.
Best Mode for Carrying Out the Invention
The present invention is further described below through embodiments with reference to the drawings, without limiting the scope of the invention in any way.
The present invention provides a memory network method based on automatic addressing and recursive information integration. The method is a memory neural network framework based on automatic addressing and recursive information integration: the memory is read and written through automatic addressing operations of low time and space complexity, and memory information is used effectively through a novel computation unit. The whole framework is efficient, fast, and highly versatile; it is suitable for various time-series processing tasks and outperforms the traditional LSTM and previous memory networks.
For time-series tasks, the method of the present invention proposes a new memory network method based on automatic addressing and recursive information integration, i.e., a memory recurrent neural network framework. Fig. 1 is a flowchart of the memory network method of the present invention; the specific implementation is as follows.
The memory matrix of the memory recurrent neural network framework is an N×D matrix, where N is the number of memory entries and D equals the dimension of the RNN hidden state. Processing a standard RNN input sequence involves the following computation steps:
1) Read the memory matrix using automatic addressing:
The automatic addressing method of this memory recurrent neural network framework directly uses the hidden state h_{t-1} passed by the RNN between time steps to encode historical memory-addressing information, and addresses the memory in combination with the current input x_t. Specifically, h_{t-1} is concatenated with x_t and fed into a fully connected (FC) layer to obtain an N-dimensional embedding vector. The embedding vector is regarded as an unnormalized memory-addressing probability; a previously proposed Gumbel-softmax function is used to sample this probability into a one-hot vector, and the D-dimensional memory information r_t at the position where the vector's element equals 1 is read out. Because the operations are simple, this addressing runs fast, and its space complexity is only O(d_h + d_x), a large reduction compared with previous memory networks. This step corresponds to the part of the flow framed by the word "read" in Fig. 1.
2) Use the recursive-information-integration computation unit to jointly compute over the hidden state h_{t-1}, the memory information r_t, and the input x_t:
[Formulas (1)-(6) are reproduced as images in the original publication; see the hedged reconstruction given after the summary steps above.]
The two gates calculated in formula (1) are used to control, element by element, the inflow of information from h_{t-1} and r_t, which is the meaning of formulas (2) and (3). The LSTM's information-processing scheme is then reused for the joint computation, as shown in formulas (4) and (5). Finally, the newly generated hidden state h_t and the memory information r_t are each gated by one of the two output gates calculated in formula (4), concatenated, and output. Compared with directly reusing the LSTM, this computation unit first filters out the information in the inputs h_{t-1} and r_t that is unnecessary for the current time step, and at the end uses an extra output gate to control which information in r_t is used for output; this greatly increases the fault tolerance and flexibility of memory reading.
Compared with the four gates of the LSTM, the recursive-information-integration computation unit adds three new gates, which are respectively used to control the inflow of h_{t-1} and r_t information and the direct output of r_t information. This step is the computation that takes place inside the block labeled "ARMIN (Auto-addressing and Recurrent Memory Integration Network) cell" in Fig. 1.
3) Write operation to the memory:
Write the new hidden state h_t generated at this time step into the memory as the information to be remembered. The write position is the position from which r_t was read using the one-hot vector in operation 1). This step corresponds to the part of the flow framed by the word "write" in Fig. 1.
4) Proceed to the next time step:
Pass h_t to the next time step, receive the input x_{t+1}, and return to step 1) to repeat the above steps. The recurrent processing of the network is indicated in Fig. 1 by "previous time step", "next time step", and the arrows. An illustrative end-to-end sketch of one such time step is given below.
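Putting steps 1) to 4) together, the following is a minimal PyTorch-style sketch of one time step of such a memory recurrent network. It is an illustration only: the class name ARMINCellSketch, the exact gate parameterization, and the unbatched tensor shapes are assumptions and are not taken from the patent or from any published implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ARMINCellSketch(nn.Module):
        """Illustrative read / integrate / write step; assumes memory slot size D == hidden_dim."""
        def __init__(self, input_dim, hidden_dim, num_slots):
            super().__init__()
            self.addr_fc = nn.Linear(input_dim + hidden_dim, num_slots)                  # step 1: addressing logits
            self.in_gates = nn.Linear(input_dim + 2 * hidden_dim, 2 * hidden_dim)        # gates for h_{t-1} and r_t
            self.lstm_like = nn.Linear(input_dim + 2 * hidden_dim, 4 * hidden_dim)       # i, f, o, candidate c
            self.out_gate_r = nn.Linear(input_dim + 2 * hidden_dim, hidden_dim)          # extra output gate for r_t

        def forward(self, x_t, h_prev, c_prev, memory, tau=1.0):
            # Unbatched for clarity: x_t (input_dim,), h_prev/c_prev (hidden_dim,), memory (num_slots, hidden_dim).
            # 1) Read: address the memory with [x_t; h_{t-1}] and a Gumbel-softmax one-hot sample.
            logits = self.addr_fc(torch.cat([x_t, h_prev], dim=-1))
            one_hot = F.gumbel_softmax(logits, tau=tau, hard=True)
            r_t = one_hot @ memory

            # 2) Integrate: gate h_{t-1} and r_t, then apply an LSTM-style update.
            z = torch.cat([x_t, h_prev, r_t], dim=-1)
            g_h, g_r = torch.sigmoid(self.in_gates(z)).chunk(2, dim=-1)
            h_hat, r_hat = g_h * h_prev, g_r * r_t
            z_hat = torch.cat([x_t, h_hat, r_hat], dim=-1)
            i, f, o, c_tilde = self.lstm_like(z_hat).chunk(4, dim=-1)
            c_t = torch.sigmoid(f) * c_prev + torch.sigmoid(i) * torch.tanh(c_tilde)
            h_t = torch.sigmoid(o) * torch.tanh(c_t)
            o_r = torch.sigmoid(self.out_gate_r(z_hat))
            out_t = torch.cat([h_t, o_r * torch.tanh(r_hat)], dim=-1)                    # concatenated output

            # 3) Write: store h_t back into the slot that was just read.
            memory = memory * (1.0 - one_hot).unsqueeze(-1) + one_hot.unsqueeze(-1) * h_t.unsqueeze(0)

            # 4) h_t, c_t, and the updated memory are passed to the next time step.
            return out_t, h_t, c_t, memory

    # Minimal usage: one time step on random data.
    cell = ARMINCellSketch(input_dim=8, hidden_dim=16, num_slots=10)
    mem = torch.zeros(10, 16)
    h, c = torch.zeros(16), torch.zeros(16)
    out, h, c, mem = cell(torch.randn(8), h, c, mem)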
The effect of the framework provided by the method of the present invention is illustrated below with a set of algorithmic tasks. Specifically, the tasks are: a) array copy: during the first 50 time steps, 50 randomly generated 6-bit binary vectors are fed into the recurrent network, which is then required to output the previously input target array in the same order during the following 50 time steps. In all of the experiments below, cross-entropy loss is used to measure the deviation between the actual output array and the target array; the smaller the deviation, the lower the cross-entropy loss, indicating a stronger ability of the recurrent network to use historical memory information to complete the task. The corresponding loss curve is shown in Fig. 2. b) Repeat copy: an array of length 1 to 10 is fed into the recurrent network, which must copy and output the array 1 to 10 times; the loss curve is shown in Fig. 3. c) Associative recall: 2 to 6 (key, value) pairs are fed into the recurrent network, followed by one of the keys, and the network must output the value corresponding to that key; the loss curve is shown in Fig. 4. d) Priority sort: 40 (key, value) pairs are fed into the recurrent network in random order, and the values of the 30 highest-priority keys must be output in descending order of key priority; the loss curve is shown in Fig. 5. The cross-entropy between the output binary sequence and the ground truth is used as the task loss to evaluate model performance; the lower the loss, the better the network performs. This memory recurrent neural network framework is compared with previous memory networks, namely the TARDIS (Temporal Automatic Relation Discovery in Sequences), AwTA (ARMIN with TARDIS Addressing), SAM (Sparse Access Memory), DNC (Differentiable Neural Computer), and NTM (Neural Turing Machine) frameworks shown in Figs. 2 to 5. A small illustrative data-generation sketch for task a) follows.
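As a concrete illustration of task a) (array copy), the following sketch shows one way such training data might be generated; the exact input encoding used in the patent's experiments (for example, whether delimiter channels are added) is not specified, so this layout is an assumption.

    import torch

    def make_copy_batch(batch_size=32, seq_len=50, bits=6):
        """Assumed layout for the array copy task: feed `seq_len` random `bits`-bit vectors,
        then expect the network to reproduce them in order over the next `seq_len` steps
        (inputs are zero during the output phase)."""
        pattern = torch.randint(0, 2, (batch_size, seq_len, bits)).float()
        inputs = torch.cat([pattern, torch.zeros_like(pattern)], dim=1)    # 2 * seq_len time steps
        targets = torch.cat([torch.zeros_like(pattern), pattern], dim=1)   # copy expected in the second half
        return inputs, targets

    # The task loss described above can then be computed, e.g., as a per-bit cross-entropy:
    # loss = torch.nn.functional.binary_cross_entropy_with_logits(outputs, targets)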
As can be seen from Figs. 2 to 5, on all four tasks the framework exhibits rapidly decreasing loss, a low final converged loss, and fewer required iterations, indicating higher utilization of training samples. This demonstrates the advantages mentioned above: the framework's memory-addressing mechanism learns quickly, and its memory information is used efficiently. In addition, the actual running speed of the framework is 3 to 4 times that of the NTM, which is among the better-performing comparison frameworks.
It should be noted that the purpose of disclosing the embodiments is to aid further understanding of the present invention; those skilled in the art will understand that various substitutions and modifications are possible without departing from the spirit and scope of the present invention and the appended claims. Therefore, the present invention should not be limited to what is disclosed in the embodiments, and the scope of protection claimed is defined by the claims.
Industrial Applicability
The present invention is applicable to the field of deep learning, and in particular to recurrent neural networks and memory neural networks. The present invention is based on a memory neural network framework of automatic addressing and recursive information integration: the memory is read and written through automatic addressing operations of low time and space complexity, and memory information is used effectively through a novel computation unit. The whole framework is efficient, fast, and highly versatile; it is suitable for various time-series processing tasks and outperforms the traditional LSTM and previous memory networks.

Claims (5)

  1. A memory network method based on automatic addressing and recursive information integration, comprising the following steps:
    1) reading the memory matrix using automatic addressing: directly using the hidden state h_{t-1} passed by a recurrent neural network (RNN) between time steps to encode historical memory-addressing information, and addressing the memory in combination with the current input x_t;
    2) using a recursive-information-integration computation unit to jointly compute over the hidden state h_{t-1}, the memory information r_t, and the input x_t:
    [Formulas (1)-(6) are reproduced as images in the original publication.]
    wherein the two gates calculated in formula (1) are used to control, element by element, the inflow of information from h_{t-1} and r_t, which is the meaning of formulas (2) and (3); the information-processing scheme of the long short-term memory network (LSTM) is then reused for the joint computation, as shown in formulas (4) and (5); and finally the newly generated hidden state h_t and the memory information r_t are each gated by one of the two output gates calculated in formula (4), concatenated, and output;
    3) performing a write operation to the memory:
    writing the new hidden state h_t generated at this time step into the memory as the information to be remembered;
    4) proceeding to the next time step:
    passing h_t to the next time step, receiving the input x_{t+1}, and returning to step 1) to repeat the above steps.
  2. The memory network method according to claim 1, wherein the automatic addressing method concatenates h_{t-1} with x_t and feeds the result into a fully connected layer to obtain an N-dimensional embedding vector; the embedding vector is regarded as an unnormalized memory-addressing probability, a Gumbel-softmax function is used to sample this probability into a one-hot vector, and the D-dimensional memory information r_t at the position where the vector's element equals 1 is read out from the memory.
  3. The memory network method according to claim 2, wherein the write position in step 3) is the position from which r_t was read using the one-hot vector in step 1).
  4. The memory network method according to claim 1, wherein the automatic addressing method uses only h_{t-1} and x_t for memory addressing, and uses the Gumbel-softmax function to normalize the unnormalized probability vector and sample a one-hot vector from it.
  5. The memory network method according to claim 1, wherein, compared with the four gates of the long short-term memory network (LSTM), the recursive-information-integration computation unit adds three new gates, which are respectively used to control the inflow of h_{t-1} and r_t information and the direct output of r_t information.
PCT/CN2019/101806 2019-07-15 2019-08-21 A memory network method based on automatic addressing and recursive information integration WO2021007919A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/423,223 US20220138525A1 (en) 2019-07-15 2019-08-21 Memory network method based on automatic addressing and recursive information integration

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910635623.9A CN110348567B (zh) 2019-07-15 A memory network method based on automatic addressing and recursive information integration
CN201910635623.9 2019-07-15

Publications (1)

Publication Number Publication Date
WO2021007919A1 true WO2021007919A1 (zh) 2021-01-21

Family

ID=68175226

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/101806 WO2021007919A1 (zh) 2019-07-15 2019-08-21 A memory network method based on automatic addressing and recursive information integration

Country Status (3)

Country Link
US (1) US20220138525A1 (zh)
CN (1) CN110348567B (zh)
WO (1) WO2021007919A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210150345A1 (en) * 2019-11-14 2021-05-20 Qualcomm Incorporated Conditional Computation For Continual Learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9483361B2 (en) * 2013-05-08 2016-11-01 Commvault Systems, Inc. Information management cell with failover management capability
CN106650922A * 2016-09-29 2017-05-10 Tsinghua University Hardware neural network conversion method, computing device, compilation method, and neural network hardware/software collaboration system
CN107704916A * 2016-08-12 2018-02-16 北京深鉴科技有限公司 (Beijing DeePhi Tech Co., Ltd.) Hardware accelerator and method for implementing an RNN neural network based on an FPGA
CN108734272A * 2017-04-17 2018-11-02 Intel Corporation Convolutional neural network optimization mechanism

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9807473B2 (en) * 2015-11-20 2017-10-31 Microsoft Technology Licensing, Llc Jointly modeling embedding and translation to bridge video and language
US10049106B2 (en) * 2017-01-18 2018-08-14 Xerox Corporation Natural language generation through character-based recurrent neural networks with finite-state prior knowledge
EP3566182A1 (en) * 2017-02-06 2019-11-13 Deepmind Technologies Limited Memory augmented generative temporal models
US20180349765A1 (en) * 2017-05-30 2018-12-06 Xerox Corporation Log-linear recurrent neural network
US10258304B1 (en) * 2017-11-29 2019-04-16 Siemens Healthcare Gmbh Method and system for accurate boundary delineation of tubular structures in medical images using infinitely recurrent neural networks
CN109613178A (zh) * 2018-11-05 2019-04-12 广东奥博信息产业股份有限公司 (Guangdong Aobo Information Industry Co., Ltd.) Method and system for predicting air pollution based on a recurrent neural network
CN109753897B (zh) * 2018-12-21 2022-05-27 Northwestern Polytechnical University Behavior recognition method based on memory-cell reinforcement and temporal dynamic learning

Also Published As

Publication number Publication date
US20220138525A1 (en) 2022-05-05
CN110348567B (zh) 2022-10-25
CN110348567A (zh) 2019-10-18

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19937460

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19937460

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 140223)