JP2022554371A

JP2022554371A - Memristor-based neural network parallel acceleration method, processor, and apparatus

Info

Publication number: JP2022554371A
Application number: JP2022526246A
Authority: JP
Inventors: ▲華▼▲強▼ ▲呉▼; ▲鵬▼ 姚; ▲浜▼ 高; ▲鶴▼ ▲銭▼
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2019-11-07
Filing date: 2020-01-10
Publication date: 2022-12-28
Anticipated expiration: 2040-01-10
Also published as: KR20220088943A; US20220335278A1; JP7399517B2; CN110807519B; WO2021088248A1; CN110807519A

Abstract

The present disclosure provides a parallel acceleration method, a processor, and an apparatus for a memristor-based neural network. The neural network includes a plurality of functional layers arranged in sequence. The plurality of functional layers include a first functional layer, which includes a plurality of first memristor arrays arranged in parallel, and a second functional layer positioned after the first functional layer. The plurality of first memristor arrays are used to perform the operation of the first functional layer and output the operation result to the second functional layer. The parallel acceleration method includes performing the operation of the first functional layer in parallel using the plurality of first memristor arrays and outputting the operation result to the second functional layer.

Description

This application claims priority to Chinese Patent Application No. 201911082236.3, filed on November 7, 2019 and entitled "Parallel Acceleration Method, Processor, and Apparatus for Memristor-Based Neural Network", the entire disclosure of which is incorporated herein by reference as part of the present application.

Embodiments of the present disclosure relate to a parallel acceleration method, a processor, and an apparatus for a memristor-based neural network.

Deep neural network algorithms are driving an information technology revolution toward intelligent systems. Based on various deep neural network algorithms, tasks such as image recognition and segmentation, object detection, and the translation and generation of speech and text can be realized. Processing different workloads with deep neural network algorithms is data-centric computing, so the hardware platform that implements these algorithms needs high-performance, low-power processing capability. However, the conventional hardware platforms that implement these algorithms are based on the von Neumann architecture, in which storage and computation are separated, so data must be moved back and forth between the storage device and the computing device during computation. As a result, this architecture is inefficient when computing deep neural networks that contain a large number of parameters. Developing new computing hardware to run deep neural network algorithms has therefore become a problem that urgently needs to be solved.

At least one embodiment of the present disclosure provides a parallel acceleration method for a memristor-based neural network. The neural network includes a plurality of functional layers arranged in sequence. The plurality of functional layers include a first functional layer, which includes a plurality of first memristor arrays arranged in parallel, and a second functional layer positioned after the first functional layer. The plurality of first memristor arrays are used to perform the operation of the first functional layer in parallel and output the operation result to the second functional layer. The parallel acceleration method includes performing the operation of the first functional layer in parallel using the plurality of first memristor arrays and outputting the operation result to the second functional layer.

For example, in the parallel acceleration method according to some embodiments of the present disclosure, performing the operation of the first functional layer in parallel using the plurality of first memristor arrays and outputting the operation result to the second functional layer includes: dividing the input data received by the first functional layer into a plurality of sub-input data in one-to-one correspondence with the plurality of first memristor arrays; and performing the operation of the first functional layer on the plurality of sub-input data in parallel using the plurality of first memristor arrays, so as to correspondingly generate a plurality of sub-operation results.

For example, the parallel acceleration method according to some embodiments of the present disclosure further includes concatenating the plurality of sub-operation results and performing the operation of the second functional layer on the concatenated result using the second functional layer.

For example, in the parallel acceleration method according to some embodiments of the present disclosure, the sizes of the plurality of sub-input data are substantially the same.
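The split-and-concatenate scheme described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the implementation of the disclosure: `array_mvm` stands in for one memristor array (modeled as an ordinary matrix product), `split_evenly` and `parallel_first_layer` are hypothetical helper names, and a thread pool merely emulates arrays that operate at the same time.

```python
from concurrent.futures import ThreadPoolExecutor

def array_mvm(weights, patches):
    """Emulate one memristor array: multiply each input row vector by the weight matrix."""
    return [[sum(p[k] * weights[k][j] for k in range(len(weights)))
             for j in range(len(weights[0]))]
            for p in patches]

def split_evenly(items, parts):
    """Split a list into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for idx in range(parts):
        size = base + (1 if idx < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks

def parallel_first_layer(weights, patches, num_arrays):
    """Run the same layer on several array copies, then concatenate the sub-results in order."""
    sub_inputs = split_evenly(patches, num_arrays)
    with ThreadPoolExecutor(max_workers=num_arrays) as pool:
        sub_results = list(pool.map(lambda s: array_mvm(weights, s), sub_inputs))
    return [row for sub in sub_results for row in sub]

# Hypothetical data: the same 3x2 weight matrix is assumed to be written into every array.
weights = [[1, 0], [0, 1], [1, 1]]
patches = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 0, 1], [0, 1, 0]]
joined = parallel_first_layer(weights, patches, num_arrays=3)
```

Because every array copy holds the same weights, the concatenated result matches what a single array would produce for the whole input; only the time needed for the first layer changes.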

For example, in the parallel acceleration method according to some embodiments of the present disclosure, performing the operation of the first functional layer in parallel using the plurality of first memristor arrays and outputting the operation result to the second functional layer includes: providing a plurality of input data received by the first functional layer to the plurality of first memristor arrays, respectively; and performing the operation of the first functional layer on the received plurality of input data in parallel using at least some of the plurality of first memristor arrays, so as to correspondingly generate a plurality of sub-operation results.

For example, the parallel acceleration method according to some embodiments of the present disclosure further includes performing the operation of the second functional layer on each of the plurality of sub-operation results using the second functional layer.

For example, in the parallel acceleration method according to some embodiments of the present disclosure, the plurality of input data are different from each other.
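The second scheme, in which different inputs are handed to different array copies and the second functional layer processes each sub-operation result separately, might look like the sketch below. The function names are hypothetical, the second layer is arbitrarily chosen to be a ReLU for illustration, and the thread pool again only emulates simultaneous operation of the arrays.

```python
from concurrent.futures import ThreadPoolExecutor

def first_layer(weights, x):
    """Emulate one memristor array applying the first functional layer to one input vector."""
    return [sum(x[k] * weights[k][j] for k in range(len(weights)))
            for j in range(len(weights[0]))]

def second_layer(y):
    """Placeholder for the second functional layer (assumed here to be a ReLU)."""
    return [max(0, v) for v in y]

def process_inputs(weights, inputs, num_arrays):
    """Give each input to an array copy; at most `num_arrays` inputs are handled at once."""
    with ThreadPoolExecutor(max_workers=num_arrays) as pool:
        sub_results = list(pool.map(lambda x: first_layer(weights, x), inputs))
    # Each sub-operation result goes through the second layer on its own, without joining.
    return [second_layer(y) for y in sub_results]

weights = [[1, -1], [2, 0]]
inputs = [[1, 1], [0, 3], [2, -1]]   # three mutually different inputs
outputs = process_inputs(weights, inputs, num_arrays=3)
```

Unlike the split-and-concatenate scheme, nothing is joined here: the benefit is throughput, since the slower first layer no longer limits how quickly successive inputs can enter the network.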

For example, in the parallel acceleration method according to some embodiments of the present disclosure, the neural network is a convolutional neural network.

For example, in the parallel acceleration method according to some embodiments of the present disclosure, the first functional layer is the initial convolutional layer of the neural network.

For example, in the parallel acceleration method according to some embodiments of the present disclosure, the plurality of functional layers further include a third functional layer, and the output of the third functional layer is provided to the first functional layer.

For example, in the parallel acceleration method according to some embodiments of the present disclosure, the weight parameters of the neural network are obtained by off-chip training. The weight parameters of the neural network include the weight parameters of the first functional layer, and the weight parameters of the first functional layer are written into the plurality of first memristor arrays to determine the conductances of the plurality of first memristor arrays.

For example, in the parallel acceleration method according to some embodiments of the present disclosure, the weight parameters of the neural network further include the weight parameters of functional layers other than the first functional layer, and the weight parameters of the other functional layers are written into the memristor arrays corresponding to the other functional layers to determine the conductances of those memristor arrays.

At least one embodiment of the present disclosure further provides a parallel acceleration processor for a memristor-based neural network. The neural network includes a plurality of functional layers arranged in sequence, and the plurality of functional layers include a first functional layer. The parallel acceleration processor includes a plurality of memristor array computing units, and the plurality of memristor array computing units include a plurality of first memristor array computing units. The weight parameters of the first functional layer are written into the plurality of first memristor array computing units, and the plurality of first memristor array computing units are configured to perform, in parallel, the computation corresponding to the operation of the first functional layer.

At least one embodiment of the present disclosure further provides a parallel acceleration apparatus for a memristor-based neural network, including the parallel acceleration processor according to any one of the embodiments of the present disclosure, and an input interface and an output interface connected to the parallel acceleration processor. The input interface is configured to receive instructions to control the operation of the parallel acceleration processor, and the output interface is configured to output the operation result of the parallel acceleration processor.

In order to describe the technical solutions of the embodiments of the present disclosure more clearly, the drawings of the embodiments are briefly introduced below. Obviously, the drawings described below relate only to some embodiments of the present disclosure and do not limit the present disclosure.
FIG. 1 is a schematic diagram of a memristor unit circuit.
FIG. 2 is a schematic diagram of a memristor array.
FIG. 3 is a schematic diagram of a convolutional neural network.
FIG. 4 is a schematic diagram of the operating process of a convolutional neural network.
FIG. 5A is a schematic diagram of the convolution computation of a convolutional neural network based on a memristor array.
FIG. 5B is a schematic diagram of the fully connected computation of a convolutional neural network based on a memristor array.
FIG. 6 is a schematic structural block diagram of a neural network according to some embodiments of the present disclosure.
FIG. 7A shows a parallel processing scheme of the first functional layer in the parallel acceleration method of the neural network shown in FIG. 6.
FIG. 7B shows another parallel processing scheme of the first functional layer in the parallel acceleration method of the neural network shown in FIG. 6.
FIG. 8 is a flowchart of an off-chip training method of a neural network according to some embodiments of the present disclosure.
FIG. 9 is a schematic diagram of a parallel acceleration processor for a memristor-based neural network according to some embodiments of the present disclosure.
FIG. 10 is a schematic structural diagram of a memristor array computing unit in the parallel acceleration processor shown in FIG. 9.
FIG. 11 is a schematic block diagram of a parallel acceleration apparatus for a memristor-based neural network according to some embodiments of the present disclosure.

In order to describe the objectives, technical solutions, and advantages of the present disclosure more clearly, the technical solutions of the embodiments of the present disclosure are described clearly and completely below with reference to the drawings of the embodiments. Obviously, the described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments obtained by a person skilled in the art on the basis of the described embodiments without creative effort fall within the protection scope of the present disclosure.

Unless otherwise defined, the technical or scientific terms used in the present disclosure have the ordinary meanings understood by those skilled in the art. The terms "first", "second", and similar words used in the present disclosure do not denote any order, quantity, or importance, but merely distinguish different components. Likewise, words such as "a", "an", or "the" do not denote a limitation of quantity, but indicate the presence of at least one. Words such as "include" or "comprise" mean that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. Words such as "connected" or "interconnected" are not limited to physical or mechanical connections, and may include electrical connections, whether direct or indirect. "Up", "down", "left", "right", and the like merely indicate relative positional relationships; when the absolute position of the described object changes, the relative positional relationship may change accordingly.

A memristor (resistive random-access memory, phase-change memory, conductive-bridge memory, and the like) is a non-volatile device whose conductance state can be adjusted by applying an external excitation. Based on Kirchhoff's current law and Ohm's law, an array composed of such devices can complete multiply-accumulate operations in parallel, and both storage and computation take place in the individual devices of the array. This computing architecture enables in-memory computing, which does not require moving large amounts of data. At the same time, the multiply-accumulate operation is the core computing task required to run a neural network. Therefore, by using the conductances of the memristor-type devices in the array to represent weight values, efficient neural network computation can be realized on the basis of such in-memory computing.

FIG. 1 is a schematic diagram of a memristor unit circuit. As shown in FIG. 1, the memristor unit circuit adopts a 1T1R structure; that is, the memristor unit circuit includes one transistor M1 and one memristor R1.

It should be noted that the transistors used in the embodiments of the present disclosure may all be thin-film transistors, field-effect transistors (for example, MOS field-effect transistors), or other switching devices with the same characteristics. The source and drain of a transistor used here may be structurally symmetrical, so the source and drain may be structurally indistinguishable. In the embodiments of the present disclosure, in order to distinguish the two electrodes of the transistor other than the gate, one of them is directly described as the first electrode and the other as the second electrode.

Embodiments of the present disclosure do not limit the type of transistor used. For example, when the transistor M1 is an N-type transistor, its gate is connected to the word line terminal WL, and the transistor M1 turns on when, for example, a high level is input at the word line terminal WL. The first electrode of the transistor M1 may be the source and is connected to the source line terminal SL; for example, the transistor M1 can receive a reset voltage through the source line terminal SL. The second electrode of the transistor M1 may be the drain and is connected to the second electrode (for example, the negative electrode) of the memristor R1. The first electrode (for example, the positive electrode) of the memristor R1 is connected to the bit line terminal BL; for example, the memristor R1 can receive a set voltage through the bit line terminal BL. When the transistor M1 is a P-type transistor, its gate is connected to the word line terminal WL, and the transistor M1 turns on when, for example, a low level is input at the word line terminal WL. The first electrode of the transistor M1 may be the drain and is connected to the source line terminal SL; for example, the transistor M1 can receive a reset voltage through the source line terminal SL. The second electrode of the transistor M1 may be the source and is connected to the second electrode (for example, the negative electrode) of the memristor R1, and the first electrode (for example, the positive electrode) of the memristor R1 is connected to the bit line terminal BL; for example, the memristor R1 can receive a set voltage through the bit line terminal BL. It should be noted that the resistive memory structure may also be implemented in other ways, such as a structure in which the second electrode of the memristor R1 is connected to the source line terminal SL, and the embodiments of the present disclosure are not limited in this respect. The following embodiments are all described by taking the case where the transistor M1 is an N-type transistor as an example.

The function of the word line terminal WL is to apply a corresponding voltage to the gate of the transistor M1, thereby turning the transistor M1 on or off. Any operation on the memristor R1, such as a set operation or a reset operation, first requires the transistor M1 to be turned on; that is, an on-voltage must be applied to the gate of the transistor M1 through the word line terminal WL. After the transistor M1 is turned on, the resistance state of the memristor R1 can be changed by, for example, applying a voltage to the memristor R1 through the source line terminal SL and the bit line terminal BL. For example, a set voltage can be applied through the bit line terminal BL to put the memristor R1 into a low-resistance state. As another example, a reset voltage can be applied through the source line terminal SL to put the memristor R1 into a high-resistance state.

It should be noted that, in the embodiments of the present disclosure, applying voltages through the word line terminal WL and the bit line terminal BL at the same time makes the resistance of the memristor R1 progressively smaller; that is, the memristor R1 changes from the high-resistance state to the low-resistance state. The operation that changes the memristor R1 from the high-resistance state to the low-resistance state is called a set operation. Applying voltages through the word line terminal WL and the source line terminal SL at the same time makes the resistance of the memristor R1 progressively larger; that is, the memristor R1 changes from the low-resistance state to the high-resistance state. The operation that changes the memristor R1 from the low-resistance state to the high-resistance state is called a reset operation. For example, the memristor R1 has a threshold voltage, and when the amplitude of the input voltage is smaller than the threshold voltage of the memristor R1, the resistance (or conductance) of the memristor R1 does not change. In this case, computation can be performed using the resistance (or conductance) of the memristor R1 by inputting a voltage smaller than the threshold voltage, and the resistance (or conductance) of the memristor R1 can be changed by inputting a voltage larger than the threshold voltage.
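The threshold behavior described above (sub-threshold voltages read the device, larger voltages reprogram it) can be illustrated with a deliberately simplified toy model. Everything in this sketch is an assumption made for illustration: the class name, the fixed conductance step per pulse, the conductance limits, and the sign convention that a positive voltage acts as a set pulse and a negative voltage as a reset pulse. Real devices are nonlinear and stochastic.

```python
class ToyMemristor:
    """Toy device: conductance changes only when |voltage| exceeds a threshold."""

    def __init__(self, conductance, threshold=1.0, step=0.1, g_min=0.1, g_max=1.0):
        self.g = conductance        # present conductance (arbitrary units)
        self.threshold = threshold  # assumed programming threshold
        self.step = step            # assumed conductance change per pulse
        self.g_min = g_min
        self.g_max = g_max

    def apply(self, voltage):
        """Apply a voltage and return the resulting current (Ohm's law: i = g * v)."""
        current = self.g * voltage
        if voltage > self.threshold:
            # Set pulse: resistance falls, so conductance rises.
            self.g = min(self.g_max, self.g + self.step)
        elif voltage < -self.threshold:
            # Reset pulse: resistance rises, so conductance falls.
            self.g = max(self.g_min, self.g - self.step)
        return current

device = ToyMemristor(conductance=0.5)
read_current = device.apply(0.2)   # below threshold: a read, state is unchanged
g_after_read = device.g
device.apply(1.5)                  # above threshold: a set pulse
g_after_set = device.g
```

The point of the model is the separation of the two voltage ranges: the same terminals serve for computing with the stored conductance and for writing a new one.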

FIG. 2 shows a memristor array composed of a plurality of the memristor unit circuits shown in FIG. 1. For example, the plurality of memristor unit circuits form an array of m rows and n columns, where m is an integer greater than 1 and n is an integer greater than or equal to 1. In FIG. 2, BL<1>, BL<2>, ..., BL<m> denote the bit lines of the first row, the second row, ..., and the m-th row, respectively, and the memristors in the memristor unit circuits of each row are connected to the bit line corresponding to that row. WL<1>, WL<2>, ..., WL<n> denote the word lines of the first column, the second column, ..., and the n-th column, respectively, and the gates of the transistors in the memristor unit circuits of each column are connected to the word line corresponding to that column. SL<1>, SL<2>, ..., SL<n> denote the source lines of the first column, the second column, ..., and the n-th column, respectively, and the sources of the transistors in the memristor unit circuits of each column are connected to the source line corresponding to that column.

The memristor array of m rows and n columns shown in FIG. 2 can represent a neural network weight matrix of size m by n. For example, the first neuron layer has m neuron nodes, which are connected correspondingly to the m rows of bit lines of the memristor array shown in FIG. 2. The second neuron layer has n neuron nodes, which are connected correspondingly to the n columns of source lines of the memristor array shown in FIG. 2. By inputting voltage excitations in parallel to the first neuron layer, the output currents obtained by multiplying the voltage excitation vector by the conductance matrix of the memristor array (conductance being the reciprocal of resistance) can be obtained at the second neuron layer.

Specifically, according to Kirchhoff's law, the output current of the memristor array can be obtained from Equation 1:

$$ i_j = \sum_{k=1}^{m} v_k \, g_{k,j} \qquad (1) $$

where j = 1, ..., n and k = 1, ..., m.

In the above equation, v_k denotes the voltage excitation input at neuron node k of the first neuron layer, i_j denotes the output current of neuron node j of the second neuron layer, and g_{k,j} denotes the conductance matrix of the memristor array.

As Kirchhoff's law shows, a memristor array can complete multiply-accumulate operations in parallel.
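The multiply-accumulate relation between v_k, g_{k,j}, and i_j can be checked numerically with a few lines of Python. This is a software illustration with made-up values, not a device model; the function name `array_output_currents` is hypothetical.

```python
def array_output_currents(voltages, conductances):
    """Return i_j = sum over k of v_k * g[k][j] for an m-row, n-column conductance matrix."""
    m, n = len(conductances), len(conductances[0])
    if len(voltages) != m:
        raise ValueError("one input voltage is needed per row (bit line)")
    return [sum(voltages[k] * conductances[k][j] for k in range(m))
            for j in range(n)]

# Hypothetical values: m = 3 rows (inputs), n = 2 columns (outputs).
v = [0.1, 0.2, 0.3]
g = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]
i = array_output_currents(v, g)
```

In the physical array, each product v_k * g_{k,j} is a device current given by Ohm's law and the sum along a column is formed by Kirchhoff's current law, so all n outputs appear at once rather than being computed term by term as in this loop.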

It should be noted that, in some examples, each weight of the neural network weight matrix may be implemented using two memristors. That is, two columns of memristors in the memristor array produce the output current of one output column. In this case, representing a neural network weight matrix of size m by n requires a memristor array of m rows and 2n columns.
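One common way to use two memristors per weight is a differential pair, where a signed weight is proportional to the difference of two non-negative conductances. The source does not spell out the mapping, so the sketch below is an assumption: the linear mapping, the conductance range `g_min` to `g_max`, the placement of each pair in adjacent columns, and the function names are all illustrative.

```python
def to_conductance_pair(weight, g_min, g_max, w_max):
    """Map a signed weight to (g_plus, g_minus) with g_plus - g_minus proportional to it."""
    scale = (g_max - g_min) / w_max
    g_plus = g_min + scale * max(weight, 0.0)
    g_minus = g_min + scale * max(-weight, 0.0)
    return g_plus, g_minus

def differential_mvm(voltages, weights, g_min=1.0, g_max=11.0, w_max=1.0):
    """Compute voltages x weights with an m x 2n array of non-negative conductances."""
    m, n = len(weights), len(weights[0])
    scale = (g_max - g_min) / w_max
    array = [[0.0] * (2 * n) for _ in range(m)]
    for k in range(m):
        for j in range(n):
            pair = to_conductance_pair(weights[k][j], g_min, g_max, w_max)
            array[k][2 * j], array[k][2 * j + 1] = pair
    # Column currents of the physical array (one per memristor column).
    currents = [sum(voltages[k] * array[k][c] for k in range(m))
                for c in range(2 * n)]
    # Subtract each column pair and rescale back to weight units.
    return [(currents[2 * j] - currents[2 * j + 1]) / scale for j in range(n)]

weights = [[0.5, -1.0], [-0.25, 0.75]]
result = differential_mvm([1.0, 2.0], weights)
```

The subtraction cancels the common `g_min` offset, which is why negative weights can be represented even though a conductance can never be negative.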

It should be noted that the current output by the memristor array is an analog current. In some examples, an analog-to-digital converter (ADC) converts the analog current into a digital voltage, which is transmitted to the second neuron layer; the second neuron layer can then convert the digital voltage into an analog voltage with a digital-to-analog converter (DAC) and connect to a further neuron layer through another memristor array. In other examples, a sample-and-hold circuit converts the analog current into an analog voltage, which is transmitted to the second neuron layer.
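The effect of passing an array output through an ADC and then a DAC can be approximated by uniform quantization. The sketch below is a generic textbook quantizer rather than the converter circuit of the disclosure; the bit width, the full-scale value, and the function names `adc` and `dac` are assumptions.

```python
def adc(value, full_scale, bits):
    """Quantize an analog value in [0, full_scale] to an integer code (clipping outside)."""
    levels = (1 << bits) - 1
    clipped = min(max(value, 0.0), full_scale)
    return round(clipped / full_scale * levels)

def dac(code, full_scale, bits):
    """Convert an integer code back to an analog value in [0, full_scale]."""
    levels = (1 << bits) - 1
    return code / levels * full_scale

# Hypothetical 8-bit interface between two memristor arrays.
analog_out = 0.4
code = adc(analog_out, full_scale=1.0, bits=8)
analog_in_next = dac(code, full_scale=1.0, bits=8)
```

The round trip introduces an error of at most half a quantization step for in-range values, which is one reason the converter resolution matters for the accuracy of the next layer.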

A convolutional neural network (CNN) is mainly used to recognize two-dimensional shapes, and is highly invariant to translation, scaling, tilting, and other forms of deformation of an image. A CNN simplifies the neural network model and reduces the number of weights mainly through local receptive fields and weight sharing. With the development of deep learning, the application of CNNs is no longer limited to image recognition; they can also be applied to fields such as face recognition, character recognition, animal classification, and image processing.

FIG. 3 shows a schematic diagram of a convolutional neural network. For example, the convolutional neural network can be used for image processing; it uses images as input and output and uses convolution kernels in place of scalar weights. FIG. 3 shows only a convolutional neural network with three neuron layers, and the embodiments of the present disclosure are not limited to this. As shown in FIG. 3, the convolutional neural network includes three neuron layers, namely an input layer 101, a hidden layer 102, and an output layer 103. The input layer 101 has four inputs, the hidden layer 102 has three outputs, and the output layer 103 has two outputs.

For example, the four inputs of the input layer 101 may be four images, or four feature maps of one image. The three outputs of the hidden layer 102 may be feature maps of the image input through the input layer 101.

For example, as shown in FIG. 3, a convolutional layer has weights W^k_{ij} and biases b^k_i. The weights W^k_{ij} represent convolution kernels, and the biases b^k_i are scalars added to the output of the convolutional layer, where k is a label denoting the input layer 101, and i and j are labels of the units of the input layer 101 and the units of the hidden layer 102, respectively. For example, the first convolutional layer 201 includes a first set of convolution kernels (W^1_{ij} in FIG. 3) and a first set of biases (b^1_i in FIG. 3). The second convolutional layer 202 includes a second set of convolution kernels (W^2_{ij} in FIG. 3) and a second set of biases (b^2_i in FIG. 3). In general, each convolutional layer includes tens or hundreds of convolution kernels, and if the convolutional neural network is a deep convolutional neural network, it may include at least five convolutional layers.

For example, as shown in FIG. 3, the convolutional neural network further includes a first activation layer 203 and a second activation layer 204. The first activation layer 203 is located after the first convolutional layer 201, and the second activation layer 204 is located after the second convolutional layer 202. The activation layers (e.g., the first activation layer 203 and the second activation layer 204) include activation functions, which introduce nonlinearity into the convolutional neural network so that it can better solve complex problems. The activation function may include a rectified linear unit (ReLU) function, a sigmoid function, a hyperbolic tangent (tanh) function, or the like. The ReLU function is a non-saturating nonlinear function, whereas the sigmoid and tanh functions are saturating nonlinear functions. For example, an activation layer may form a separate layer of the convolutional neural network, or it may be included in a convolutional layer (e.g., the first convolutional layer 201 may include the first activation layer 203, and the second convolutional layer 202 may include the second activation layer 204).
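The three activation functions named above can be compared with a short sketch. This is only an illustration using Python's standard `math` module; the function names are chosen here for convenience and do not come from the disclosure.

```python
import math


def relu(x):
    """Rectified linear unit: non-saturating for positive inputs."""
    return max(0.0, x)


def sigmoid(x):
    """Logistic sigmoid: saturates toward 0 and 1 for large |x|."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Hyperbolic tangent: saturates toward -1 and 1 for large |x|."""
    return math.tanh(x)


# Print each function at a few sample points to see the saturation behavior.
for x in (-5.0, 0.0, 5.0):
    print(x, relu(x), round(sigmoid(x), 4), round(tanh(x), 4))
```

For large positive inputs the ReLU output keeps growing, while the sigmoid and tanh outputs flatten out, which is the saturating versus non-saturating distinction drawn in the text.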

For example, in the first convolutional layer 201, several convolution kernels $W_{ij}^{1}$ of the first set of convolution kernels and several biases $b_{i}^{1}$ of the first set of biases are first applied to each input to obtain the output of the first convolutional layer 201; the output of the first convolutional layer 201 is then processed by the first activation layer 203 to obtain the output of the first activation layer 203. In the second convolutional layer 202, several convolution kernels $W_{ij}^{2}$ of the second set of convolution kernels and several biases $b_{i}^{2}$ of the second set of biases are first applied to the output of the first activation layer 203 to obtain the output of the second convolutional layer 202; the output of the second convolutional layer 202 is then processed by the second activation layer 204 to obtain the output of the second activation layer 204. For example, the output of the first convolutional layer 201 may be the result of applying the convolution kernels $W_{ij}^{1}$ to its input and then adding the biases $b_{i}^{1}$, and the output of the second convolutional layer 202 may be the result of applying the convolution kernels $W_{ij}^{2}$ to the output of the first activation layer 203 and then adding the biases $b_{i}^{2}$.
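The convolve, add bias, then activate sequence described above can be sketched in plain Python. The indexing convention (`kernels[i][j]` maps input `j` to output `i`), the stride of 1, the absence of padding, and the use of cross-correlation without kernel flipping are all assumptions made for this illustration; the disclosure does not fix them.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding, no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + a][c + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(x):
    return x if x > 0 else 0


def conv_layer(inputs, kernels, biases, activation=relu):
    """Apply kernels, add a scalar bias per output map, then activate.

    inputs:  list of 2-D input maps
    kernels: kernels[i][j] is the 2-D kernel from input j to output i (assumed layout)
    biases:  biases[i] is the scalar added to output map i
    """
    outputs = []
    for i, bias in enumerate(biases):
        # One partial result per input map, summed element-wise below.
        partial = [conv2d_valid(x, kernels[i][j]) for j, x in enumerate(inputs)]
        h, w = len(partial[0]), len(partial[0][0])
        summed = [
            [sum(p[r][c] for p in partial) + bias for c in range(w)]
            for r in range(h)
        ]
        outputs.append([[activation(v) for v in row] for row in summed])
    return outputs


# One 2x2 input, one 2x2 kernel, bias 1: (1*1 + 4*1) + 1 = 6.
result = conv_layer([[[1, 2], [3, 4]]], [[[[1, 0], [0, 1]]]], [1])
```

Stacking two such calls, with the second taking the first call's output as its input, mirrors the path through layers 201, 203, 202, and 204.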

Before a convolutional neural network can be used for image processing, it must be trained. After training, the convolution kernels and biases of the convolutional neural network remain unchanged during image processing. During training, each convolution kernel and bias is adjusted using multiple sets of input/output example images and an optimization algorithm, to obtain an optimized convolutional neural network model.

FIG. 4 shows a schematic diagram of the operation process of a convolutional neural network. For example, as shown in FIG. 4, after an input image is fed to the convolutional neural network, it passes through several processing steps in sequence (convolution computation, down-sampling, vectorization, fully connected computation, and so on in FIG. 4) to produce the corresponding output. The main components of a convolutional neural network can include multiple convolutional layers, multiple down-sampling layers, a flattening layer, and a fully connected layer. In the present disclosure, it should be understood that each of these layers (the convolutional layers, down-sampling layers, flattening layer, fully connected layer, and the like) refers to the corresponding processing/operation, namely the convolution processing/operation (shown as convolution computation in FIG. 4), the down-sampling processing/operation (shown as down-sampling in FIG. 4), the flattening processing/operation (shown as vectorization in FIG. 4), and the fully connected processing/operation (shown as fully connected computation in FIG. 4); this will not be repeated below. In the present disclosure, layers that refer to such processing/operations are collectively called functional layers, to distinguish them from neuron layers. The functional layers may further include an up-sampling layer, a normalization layer, and the like, and embodiments of the present disclosure are not limited thereto.

The convolutional layer is the core layer of a convolutional neural network. In a convolutional layer, a neuron is connected only to some of the neurons in the adjacent layer. A convolutional layer can apply several convolution kernels (also called filters) to an input image to extract multiple types of features from it. Each convolution kernel can extract one type of feature. Convolution kernels are generally initialized as matrices of random decimals, and during training of the convolutional neural network the kernels learn reasonable weight values. The result obtained by applying one convolution kernel to the input image is called a feature map, and the number of feature maps equals the number of convolution kernels. Each feature map consists of neurons arranged in a rectangle; the neurons of the same feature map share weights, and the shared weights are the convolution kernel. The feature map output by one convolutional layer can be input to the adjacent next convolutional layer and processed again to obtain a new feature map. For example, as shown in FIG. 4, the first convolutional layer can output a first feature map, which is input to the second convolutional layer and processed again to obtain a second feature map.

For example, as shown in FIG. 4, a convolutional layer can use different convolution kernels to convolve the data of a local receptive field of the input image; the convolution result can be input to an activation layer, which computes the corresponding activation function to obtain feature information of the input image.

For example, as shown in FIG. 4, a down-sampling layer is placed between adjacent convolutional layers and implements one form of down-sampling. On the one hand, the down-sampling layer can be used to reduce the scale of the input image, simplify the computation, and reduce overfitting to some extent. On the other hand, the down-sampling layer can compress features and extract the main features of the input image. A down-sampling layer can reduce the size of the feature maps but does not change their number. For example, if a 12×12 input image is sampled with a 6×6 convolution kernel, a 2×2 output image is obtained, which means that 36 pixels of the input image are merged into one pixel of the output image. The output of the last down-sampling layer can be input to a flattening layer to perform a flattening operation (Flatten). The flattening layer can convert a feature map (a two-dimensional image) into a vector (one-dimensional). The flattening operation can be performed as follows:

$$v_k = f_{\lfloor k/j \rfloor,\; k \bmod j}$$

where v is a vector containing k elements and f is a matrix with i rows and j columns.
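The 12×12 to 2×2 down-sampling example and the flattening step can be sketched as follows. Two assumptions are made for illustration: each 6×6 block is merged by averaging (the text only says 36 pixels are merged into one), and the flattening is row-major, so that element k of the vector is taken from row `k // j` and column `k % j` of the matrix.

```python
def downsample(image, size):
    """Merge each non-overlapping size x size block into one value (its mean)."""
    rows, cols = len(image) // size, len(image[0]) // size
    return [
        [
            sum(image[r * size + a][c * size + b]
                for a in range(size) for b in range(size)) / (size * size)
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def flatten(f):
    """Row-major flattening: v[k] = f[k // j][k % j], with j the column count."""
    j = len(f[0])
    return [f[k // j][k % j] for k in range(len(f) * j)]


image = [[r * 12 + c for c in range(12)] for r in range(12)]  # toy 12x12 input
pooled = downsample(image, 6)   # 2x2 result, 36 input pixels per output value
vector = flatten(pooled)        # 4-element vector fed to the next layer
```

The number of feature maps is unchanged by `downsample`; only the size of each map shrinks, as stated in the text.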

Next, the output of the flattening layer (i.e., a one-dimensional vector) can be input to a fully connected layer (FCN). The fully connected layer may have a structure similar to that of the convolutional neural network shown in FIG. 3, except that it uses scalar values in place of convolution kernels. The fully connected layer is used to connect all the extracted features. The output of the fully connected layer may be a one-dimensional vector.

Because computations in a convolutional neural network, such as the convolution computation and the fully connected computation, consist mainly of multiply-accumulate operations, functional layers such as convolutional layers and fully connected layers can be implemented with memristor arrays. For example, the weights of a convolutional layer and of a fully connected layer can both be represented by the conductances of a memristor array, while the inputs of the convolutional layer and of the fully connected layer can be represented by corresponding voltage excitations. The convolution computation and the fully connected computation can thereby each be realized on the basis of Kirchhoff's law, as described above.
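The multiply-accumulate behavior of a memristor array can be modeled numerically as a matrix-vector product: each memristor contributes a current equal to its conductance times the applied voltage (Ohm's law), and the currents on a shared output line add up (Kirchhoff's current law). The sketch below is an idealized model; `crossbar_mac` is a hypothetical name, and device non-idealities and the encoding of negative weights are not modeled.

```python
def crossbar_mac(conductances, voltages):
    """Idealized memristor array: one output current per row.

    conductances[r][c] is the conductance of the device at row r, column c;
    voltages[c] is the voltage applied to column c.
    Output current of row r = sum over c of conductances[r][c] * voltages[c].
    """
    return [
        sum(g * v for g, v in zip(row, voltages))
        for row in conductances
    ]


# Two output rows, two input columns.
currents = crossbar_mac([[1, 2], [3, 4]], [1, 1])
```

All rows are evaluated in one step in the physical array, which is why a fully connected layer mapped this way finishes in a single pass.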

FIG. 5A is a schematic diagram of the convolution computation of a convolutional neural network based on a memristor array, and FIG. 5B is a schematic diagram of the fully connected computation of a convolutional neural network based on a memristor array.

As shown in FIG. 5A, one memristor array can be used to implement the convolution computation of one convolutional layer, for example to convolve an input image (shown as the digit image "2" in FIG. 5A). For example, in some examples, the convolutional layer includes multiple convolution kernels, each row of the memristor array corresponds to one convolution kernel, and the memristors in each row represent the values of the individual elements of that kernel. For example, for a 3×3 convolution kernel, each row of the memristor array uses nine memristors to represent the values of the nine elements of the kernel. This way of representing a convolutional layer with a memristor array is exemplary, and embodiments of the present disclosure include but are not limited to it.

It should be understood that when a convolutional layer convolves its input image, the input image must be divided into multiple image sub-blocks (each the same size as the convolution kernel), and the convolution kernel is then applied to each image sub-block. When a memristor array is used to implement the convolution operation of a convolutional layer, the multiple convolution kernels can process each image sub-block in parallel, but the data of the image sub-blocks must still be fed to the memristor array serially in batches (i.e., one image sub-block at a time) to complete the convolution of the entire input image.
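The serial feeding of image sub-blocks can be sketched as below. Each row of `kernel_rows` stands for one convolution kernel stored along a row of the array (nine values for a 3×3 kernel), and each loop iteration stands for one serial step in which a single flattened sub-block is applied to the array. The stride of 1 and the function names are assumptions for illustration.

```python
def extract_patches(image, k):
    """Return every k x k sub-block (stride 1), flattened row by row."""
    h, w = len(image), len(image[0])
    return [
        [image[r + a][c + b] for a in range(k) for b in range(k)]
        for r in range(h - k + 1)
        for c in range(w - k + 1)
    ]


def conv_on_array(image, kernel_rows, k):
    """Feed sub-blocks one at a time; all kernels respond to each sub-block at once."""
    outputs = []
    for patch in extract_patches(image, k):      # serial: one sub-block per step
        outputs.append([
            sum(g * v for g, v in zip(row, patch))  # parallel across kernel rows
            for row in kernel_rows
        ])
    return outputs


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel_rows = [
    [1] * 9,                          # kernel 0: sums the whole 3x3 sub-block
    [0, 0, 0, 0, 1, 0, 0, 0, 0],      # kernel 1: picks the center pixel
]
steps = conv_on_array(image, kernel_rows, 3)
```

The length of `steps` equals the number of sub-blocks, so the number of serial steps grows with image size, whereas a fully connected layer needs only one step.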

As shown in FIG. 5B, one memristor array can be used to implement the fully connected computation of one fully connected layer. For example, in some examples, as shown in FIG. 5B, each column of the memristor array receives an input of the fully connected layer, each row provides an output of the fully connected layer, and the memristors in each row represent the weights corresponding to the output of that row. This way of representing a fully connected layer with a memristor array is exemplary, and embodiments of the present disclosure include but are not limited to it.

It should be understood that the fully connected computation of a fully connected layer can be completed in a single pass. The convolution computation of a convolutional layer, by contrast, must be completed serially in batches, and is finished only after all batches have been processed. There is therefore always a severe speed mismatch between the convolution computation and the fully connected computation (the convolution computation takes far longer than the fully connected computation). Consequently, when a convolutional neural network is implemented with memristor arrays, its performance is always limited by the least efficient memristor array (called the efficiency bottleneck), for example the memristor array corresponding to a convolutional layer.

At least one embodiment of the present disclosure provides a parallel acceleration method for a memristor-based neural network. The neural network includes a plurality of functional layers arranged in sequence; the plurality of functional layers include a first functional layer, which includes a plurality of first memristor arrays arranged in parallel, and a second functional layer located after the first functional layer; the plurality of first memristor arrays are used to perform the operation of the first functional layer and output the operation result to the second functional layer. The parallel acceleration method includes performing the operation of the first functional layer in parallel using the plurality of first memristor arrays and outputting the operation result to the second functional layer.

At least one embodiment of the present disclosure further provides a processor and an apparatus corresponding to the above parallel acceleration method.

The parallel acceleration method, processor, and apparatus for a memristor-based neural network according to embodiments of the present disclosure perform the operation of the first functional layer in parallel with a plurality of first memristor arrays, thereby accelerating the operation of the memristor-based neural network. The architecture of the memristor-based neural network and the parallel acceleration method are broadly applicable to various deep neural network models and different types of memristors, and help to solve the speed mismatch problem of deep neural network models.

Several embodiments of the present disclosure and examples thereof are described in detail below with reference to the drawings.

At least one embodiment of the present disclosure provides a parallel acceleration method for a memristor-based neural network. FIG. 6 shows a schematic structural block diagram of a neural network according to some embodiments of the present disclosure, FIG. 7A shows one parallel processing scheme of the first functional layer in the parallel acceleration method of the neural network shown in FIG. 6, and FIG. 7B shows another parallel processing scheme of the first functional layer in the parallel acceleration method of the neural network shown in FIG. 6.

As shown in FIG. 6, the neural network includes a plurality of functional layers arranged in sequence. For example, as shown in FIG. 6, the plurality of functional layers include a first functional layer and a second functional layer located after the first functional layer. For example, in some embodiments, the plurality of functional layers may further include functional layers other than the first functional layer and the second functional layer, and the present disclosure is not limited thereto.

For example, in some embodiments, as shown in FIGS. 7A and 7B, the first functional layer includes a plurality of first memristor arrays arranged in parallel, and the plurality of first memristor arrays corresponding to the first functional layer are used to perform the operation of the first functional layer in parallel and output the operation results to the second functional layer, thereby accelerating the operation of the neural network. For example, in some embodiments, if the first functional layer included only one first memristor array, the first functional layer would be the efficiency bottleneck limiting the performance of the neural network; for example, the first functional layer is a convolutional layer.

For example, in some embodiments, the neural network is a convolutional neural network including multiple convolutional layers. In general, the initial convolutional layer (i.e., the first convolutional layer), which convolves the input image of the neural network, has the largest computational load and takes the longest time; that is, the initial convolutional layer is usually the efficiency bottleneck of the neural network, so the first functional layer may generally include the initial convolutional layer. The present disclosure includes but is not limited to this. For example, in other embodiments, as shown in FIG. 6, the plurality of functional layers of the neural network may further include a third functional layer located before the first functional layer, with the output of the third functional layer provided to the first functional layer as its input; the first functional layer may therefore also be a convolutional layer other than the initial convolutional layer of the neural network, such as an intermediate convolutional layer.

It should be understood that the neural network may include multiple first functional layers (e.g., convolutional layers), so that the operation of each first functional layer is performed in parallel by the plurality of first memristor arrays corresponding to that layer; this increases the degree of parallelism of the neural network and further accelerates its operation. For example, the number of first memristor arrays corresponding to each first functional layer may be the same or different, and embodiments of the present disclosure are not limited in this respect.

For example, the second functional layer may include one of a convolutional layer, a down-sampling layer, a flattening layer, a fully connected layer, and the like. For example, the third functional layer may include one of a convolutional layer, a down-sampling layer, and the like. Embodiments of the present disclosure are not limited in this respect.

For example, FIGS. 7A and 7B both illustrate the case where the first functional layer includes three first memristor arrays, but this should not be regarded as a limitation of the present disclosure. That is, the number of first memristor arrays included in the first functional layer can be set according to actual needs, and embodiments of the present disclosure are not limited in this respect.

For example, as shown in FIGS. 7A and 7B, the parallel acceleration method for the memristor-based neural network includes performing the operation of the first functional layer in parallel using the plurality of first memristor arrays and outputting the operation results to the second functional layer (not shown in FIGS. 7A and 7B).

For example, in some embodiments, as shown in FIG. 7A, the input data received by the first functional layer (shown as the digit image "2" in FIG. 7A) can first be divided into a plurality of sub-input data in one-to-one correspondence with the plurality of first memristor arrays (shown as the three parts into which the digit image "2" is divided in FIG. 7A). Next, the plurality of first memristor arrays perform the operation of the first functional layer on the plurality of sub-input data in parallel, generating a corresponding plurality of sub-operation results. The sub-operation results can then be spliced together, and the second functional layer can perform its operation on the spliced result.

For example, in some examples, as shown in FIG. 7A, the first functional layer is a convolutional layer, and each first memristor array included in the first functional layer can implement the convolution operation of the first functional layer in the manner shown in FIG. 5A.

For example, in some examples, among the plurality of sub-input images (i.e., sub-input data) obtained by dividing the input image (i.e., input data), adjacent sub-input images generally may overlap each other, although they need not overlap; embodiments of the present disclosure are not limited in this respect. For example, in some examples, the sub-input data are essentially the same size, so that the time each takes to be convolved by its corresponding first memristor array is essentially the same; overall, this accelerates the processing of the first functional layer and hence of the neural network.

For example, in some examples, the plurality of sub-input data can be provided to the plurality of first memristor arrays in any order; in this case, each first memristor array can process any one of the sub-input data. In some other examples, the plurality of sub-input data are to be provided to the plurality of first memristor arrays in a predetermined order, in one-to-one correspondence; in this case, each first memristor array processes the sub-input data corresponding to it.

For example, when one first memristor array is used to process the input image (see FIG. 5A), denote the time taken by the operation of the first functional layer as t. When, for example, three first memristor arrays process in parallel the three sub-input images obtained by dividing the input image (see FIG. 7A), the time taken by the operation of the first functional layer is reduced to t/3. The parallel acceleration method shown in FIG. 7A thus accelerates the operation in which the neural network processes a single input.
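The split, process, and splice flow of FIG. 7A can be sketched as follows. The sketch assumes a row-wise split into strips that overlap by k − 1 rows (one way to realize the overlapping sub-inputs mentioned above), a stride of 1, and identical kernel copies in each array; the list comprehension stands in for arrays that would run concurrently in hardware. The function names are hypothetical.

```python
def conv_rows(image, kernel):
    """Plain valid convolution (stride 1, no kernel flip) with a k x k kernel."""
    k = len(kernel)
    return [
        [
            sum(image[r + a][c + b] * kernel[a][b]
                for a in range(k) for b in range(k))
            for c in range(len(image[0]) - k + 1)
        ]
        for r in range(len(image) - k + 1)
    ]


def split_with_overlap(image, parts, k):
    """Split into `parts` row strips; neighbors share k - 1 rows."""
    out_rows = len(image) - k + 1
    base, extra = divmod(out_rows, parts)
    strips, start = [], 0
    for p in range(parts):
        rows = base + (1 if p < extra else 0)   # output rows owned by strip p
        strips.append(image[start:start + rows + k - 1])
        start += rows
    return strips


def parallel_conv(image, kernel, parts=3):
    """Each strip goes to its own array copy; results are spliced in order."""
    strips = split_with_overlap(image, parts, len(kernel))
    sub_results = [conv_rows(s, kernel) for s in strips]  # concurrent in hardware
    return [row for sub in sub_results for row in sub]


image = [[(r * 8 + c) % 7 for c in range(8)] for r in range(8)]
kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
spliced = parallel_conv(image, kernel, parts=3)
```

Because each strip produces roughly one third of the output rows, each array handles about one third of the serial sub-block steps, which corresponds to the reduction from t to about t/3.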

For example, in some examples, the second functional layer may be one of a convolutional layer, a down-sampling layer, a flattening layer, and a fully connected layer, and embodiments of the present disclosure are not limited in this respect.

For example, in other embodiments, as shown in FIG. 7B, a plurality of input data received by the first functional layer (shown as the digit images "2", "1", and "4" in FIG. 7B) can first be provided to the plurality of first memristor arrays, respectively. Next, at least some of the plurality of first memristor arrays perform the operation of the first functional layer on the received input data in parallel, generating a corresponding plurality of sub-operation results. The second functional layer can then perform its operation on each of the sub-operation results.

For example, in some examples, as shown in FIG. 7B, the first functional layer is a convolutional layer, and each first memristor array included in the first functional layer can implement the convolution operation of the first functional layer in the manner shown in FIG. 5A. For example, the plurality of input data may be assigned to the plurality of first memristor arrays in any order; in this case, each first memristor array can process any one of the input data. For example, the plurality of input data may differ from one another, or some or all of them may be the same; embodiments of the present disclosure are not limited in this respect.

For example, when one first memristor array is used to process the input images (see FIG. 5A), denote the time taken by the operation of the first functional layer as t1 and the time taken by the operations of the subsequent functional layers as t2. If t1 > t2, the time needed for the neural network to process three input images is at least about 3×t1 + t2 (for example, while the first functional layer processes the data of the current input image, the subsequent functional layers can process the data of the previous input image). By comparison, when, for example, three first memristor arrays process three input images in parallel (see FIG. 7B), the time needed for the neural network to process the three input images is about t1 + 3×t2. The time saved is therefore 2×(t1 − t2). That is, the parallel acceleration method shown in FIG. 7B accelerates the operation in which the neural network processes multiple inputs.
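The timing comparison above can be checked with a small model. The text gives the case of three images and three arrays; generalizing to n images processed by n arrays is an assumption made here for illustration, as are the function names.

```python
def serial_time(n, t1, t2):
    """One first-layer array, pipelined with later layers (requires t1 > t2).

    The first layer is busy for n * t1; the later layers finish the last
    image t2 after that.
    """
    return n * t1 + t2


def parallel_time(n, t1, t2):
    """n first-layer arrays process n images at once; later layers then
    handle the n results one after another."""
    return t1 + n * t2


def time_saved(n, t1, t2):
    return serial_time(n, t1, t2) - parallel_time(n, t1, t2)


# Example values (arbitrary units): t1 = 5, t2 = 2, three images.
saved = time_saved(3, 5, 2)
```

With n = 3 the saving reduces to 2 × (t1 − t2), matching the figure stated in the text; in this simplified model it grows as (n − 1) × (t1 − t2).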

It should be understood that the parallel acceleration method shown in FIG. 7A and the parallel acceleration method shown in FIG. 7B can be applied together to the same neural network (for example, to different first functional layers of the same neural network); the embodiments of the present disclosure are not limited in this respect.

The neural network according to the embodiments of the present disclosure can be operated using the above parallel acceleration method. During operation, the operation of the first functional layer is executed in parallel by the plurality of first memristor arrays, which accelerates the operation process of the neural network. This neural network architecture and its parallel acceleration method are widely applicable to various deep neural network models and to different types of memristors, and help to solve the speed mismatch problem of deep neural network models.

At least one embodiment of the present disclosure further provides an off-chip training method for a memristor-based neural network. For example, the training method is used to obtain the parameters of the neural network according to the foregoing embodiments. For example, as shown in FIGS. 6, 7A and 7B, the neural network includes a plurality of functional layers arranged in sequence; the plurality of functional layers includes a first functional layer and a second functional layer located after the first functional layer; the first functional layer includes a plurality of first memristor arrays arranged in parallel; and the plurality of first memristor arrays is used to execute the operation of the first functional layer and output the operation result to the second functional layer.

It should be understood that a neural network training method generally includes: processing training input data with the neural network to obtain training output data; calculating a loss value of the neural network from the training output data using a loss function; correcting the parameters of the neural network based on the loss value; and determining whether the training of the neural network satisfies a predetermined condition, repeating the above training process if the condition is not satisfied, and stopping the training process to obtain the trained neural network if the condition is satisfied. Of course, before a neural network is trained, its parameters generally need to be initialized. For example, the parameters can generally be initialized to random numbers, such as random numbers following a Gaussian distribution; the embodiments of the present disclosure are not limited in this respect. It should be understood that the neural network training method according to the embodiments of the present disclosure can also follow the above general training steps and process.
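The general training steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the training method of the disclosure: the "network" is a single linear unit y = w·x + b, the loss function is the mean squared error, the correction is plain gradient descent, and the predetermined condition is a loss threshold or an epoch limit. The function name `train` and all hyperparameters are hypothetical.

```python
import random


def train(xs, ys, lr=0.05, tol=1e-6, max_epochs=5000, seed=0):
    """Fit y = w*x + b by gradient descent; returns (w, b, final_loss)."""
    rng = random.Random(seed)
    # Initialize the parameters with Gaussian random numbers.
    w, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    n = len(xs)
    loss = float("inf")
    for _ in range(max_epochs):
        # Process the training input data to obtain training output data.
        preds = [w * x + b for x in xs]
        # Compute the loss value with a loss function (mean squared error).
        loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n
        # Stop when the predetermined condition is satisfied.
        if loss < tol:
            break
        # Correct the parameters based on the loss value.
        grad_w = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        grad_b = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, loss


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
w, b, loss = train(xs, ys)
print(round(w, 2), round(b, 2))
```

A real off-chip training run would replace the linear unit with the mathematical model of the network, but the loop structure (forward pass, loss, correction, stopping test) stays the same.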

After each weight parameter is obtained by off-chip training, the conductance of each device in the memristor array is programmed by set and reset operations to realize the corresponding weight. The specific programming method and the way the memristor weights are organized are not limited.

FIG. 8 is a flowchart of an off-chip training method for a neural network according to some embodiments of the present disclosure. For example, as shown in FIG. 8, the off-chip training method can include the following steps S10 to S30.

Step S10: construct a mathematical model of the neural network.

For example, in some examples, software (for example, program code) can be used to construct the mathematical model according to the embodiments of the present disclosure.

Step S20: train the mathematical model to obtain a trained mathematical model.

For example, in some examples, the mathematical model can be run and trained on the basis of a processor, memristors, and the like. For example, the training steps and process of the mathematical model can follow the general training steps and process, which are not repeated here.

Step S30: write the weight parameters of the trained mathematical model into the memristor arrays corresponding to the neural network.

For example, in some examples, the first functional layer in the mathematical model includes a first weight parameter. During training of the mathematical model, in forward propagation the training input data of the first functional layer is processed using the first weight parameter; in backward propagation the first weight parameter is corrected to obtain the trained first weight parameter of the first functional layer. In this case, writing the weight parameters of the trained mathematical model into the memristor arrays corresponding to the neural network, that is, step S30, includes writing the first weight parameter of the first functional layer of the trained mathematical model into each of the plurality of first memristor arrays. Each first memristor array corresponding to the first functional layer then contains the same conductance weight matrix.
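For the shared-weight case just described, step S30 amounts to producing one target conductance matrix and writing a copy of it into every first memristor array. The sketch below illustrates only that bookkeeping. The linear weight-to-conductance map, the conductance range, and the names `weight_to_conductance` and `program_arrays` are assumptions made for illustration; the disclosure leaves the programming method and the weight organization open.

```python
from copy import deepcopy


def weight_to_conductance(w, w_max, g_min, g_max):
    """Hypothetical linear map from a weight in [-w_max, w_max] to [g_min, g_max]."""
    return g_min + (w + w_max) / (2 * w_max) * (g_max - g_min)


def program_arrays(weights, n_arrays, w_max=1.0, g_min=2e-6, g_max=2e-5):
    """Return n_arrays independent copies of the target conductance matrix."""
    target = [
        [weight_to_conductance(w, w_max, g_min, g_max) for w in row]
        for row in weights
    ]
    # Each physical array gets its own copy of the same matrix.
    return [deepcopy(target) for _ in range(n_arrays)]


trained = [[-1.0, 0.0], [1.0, 0.5]]  # hypothetical trained first weight parameter
arrays = program_arrays(trained, n_arrays=3)
print(len(arrays), arrays[0] == arrays[1] == arrays[2])
```

In hardware, each target value would be reached through set and reset operations, typically with verification; that step is not modeled here.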

For example, in some other examples, the first functional layer in the mathematical model includes a plurality of first weight parameters. During training of the mathematical model, in forward propagation the training input data received by the first functional layer is divided into a plurality of training sub-input data in one-to-one correspondence with the plurality of first weight parameters, and the operation of the first functional layer is executed in parallel on the training sub-input data using the first weight parameters to generate a plurality of training sub-operation results. The parameter value of each first weight parameter is then updated based on the training sub-operation result corresponding to that first weight parameter and the training intermediate data corresponding to that training sub-operation result. Depending on the specific off-chip training scheme, the same weight parameters or different weight parameters may be written into the arrays.

In this case, writing the weight parameters of the trained mathematical model into the memristor arrays corresponding to the neural network, that is, step S30, includes writing the plurality of first weight parameters of the first functional layer of the trained mathematical model into the plurality of first memristor arrays in one-to-one correspondence. The resulting neural network can be used to execute the parallel acceleration method shown in FIG. 7A.
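The split-and-process scheme described above (input divided into sub-inputs, each handled by its own weight parameter, results joined afterwards) can be sketched as follows. This is an illustration under simplifying assumptions: the image is split into bands of rows, each "array" applies its own 1-D horizontal kernel so that the bands need no overlapping border rows, and Python threads stand in for the parallel memristor arrays. The names `conv_rows`, `split_rows` and `first_layer_parallel` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def conv_rows(band, kernel):
    """Valid 1-D horizontal cross-correlation applied to every row of a band."""
    k = len(kernel)
    return [
        [sum(row[c + i] * kernel[i] for i in range(k)) for c in range(len(row) - k + 1)]
        for row in band
    ]


def split_rows(image, n_parts):
    """Split the rows of an image into n_parts bands of nearly equal size."""
    base, extra = divmod(len(image), n_parts)
    bands, start = [], 0
    for p in range(n_parts):
        size = base + (1 if p < extra else 0)
        bands.append(image[start:start + size])
        start += size
    return bands


def first_layer_parallel(image, kernels):
    """Run one kernel per band in parallel, then join the sub-results in order."""
    bands = split_rows(image, len(kernels))
    with ThreadPoolExecutor(max_workers=len(kernels)) as pool:
        sub_results = list(pool.map(conv_rows, bands, kernels))
    return [row for band in sub_results for row in band]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
kernels = [[1, 0], [0, 1]]  # one hypothetical weight parameter per array
print(first_layer_parallel(image, kernels))
# [[1, 2], [4, 5], [8, 9], [11, 12]]
```

A 2-D convolution kernel would additionally require each band to include the neighboring border rows, which this sketch deliberately avoids.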

For example, in still other examples, the first functional layer in the mathematical model includes a plurality of first weight parameters. During training of the mathematical model, in forward propagation a plurality of training input data received by the first functional layer is provided to the plurality of first weight parameters respectively, and the operation of the first functional layer is executed in parallel on at least some of the training input data using the first weight parameters to generate a plurality of training sub-operation results. The parameter value of each first weight parameter is then updated based on the training sub-operation result corresponding to that first weight parameter and the training intermediate data corresponding to that training sub-operation result.

In this case, writing the weight parameters of the trained mathematical model into the memristor arrays corresponding to the neural network, that is, step S30, includes writing the plurality of first weight parameters of the first functional layer of the trained mathematical model into the plurality of first memristor arrays in one-to-one correspondence. The resulting neural network can be used to execute the parallel acceleration method shown in FIG. 7B, and can also be used to execute the parallel acceleration method shown in FIG. 7A.

Accordingly, in the memristor-based neural network parallel acceleration method according to some other embodiments of the present disclosure, the weight parameters of the neural network are obtained by the above off-chip training method. The weight parameters of the neural network include the weight parameters of the first functional layer, which are written into the plurality of first memristor arrays and thereby determine the conductances of the plurality of first memristor arrays. It should also be understood that the weight parameters obtained by the above off-chip training method further include the weight parameters of functional layers other than the first functional layer, which are written into the memristor arrays corresponding to those other functional layers and thereby determine the conductances of those arrays.

At least one embodiment of the present disclosure further provides a parallel acceleration processor for a memristor-based neural network, the parallel acceleration processor being used to execute the foregoing parallel acceleration method. FIG. 9 is a schematic diagram of a parallel acceleration processor for a memristor-based neural network according to some embodiments of the present disclosure.

For example, as shown in FIG. 6, the neural network includes a plurality of functional layers arranged in sequence, and the plurality of functional layers includes a first functional layer. For example, as shown in FIG. 9, the parallel acceleration processor includes a plurality of computing cores, and the computing cores can communicate with one another. In addition, each computing core contains a plurality of memristor array computing units.

For example, in some embodiments, the plurality of memristor array computing units includes a plurality of first memristor array computing units; the weight parameters of the first functional layer are written into the plurality of first memristor array computing units; and the plurality of first memristor array computing units is configured to execute, in parallel, the computation corresponding to the operation of the first functional layer. That is, by programming the weights of a given functional layer of the neural network into different computing cores or memristor array computing units, the operation of that functional layer can be computed with parallel acceleration across multiple memristor arrays. For example, the plurality of first memristor arrays can use the parallel acceleration method of any of the foregoing embodiments to achieve parallel accelerated computation of the operation of the first functional layer.

FIG. 10 is a schematic structural diagram of a memristor array computing unit in the parallel acceleration processor shown in FIG. 9. The operating principle of the memristor array computing unit is described in detail below with reference to the structure of the memristor array computing unit shown in FIG. 10.

For example, as shown in FIG. 10, the memristor array computing unit includes a memristor array and peripheral circuits.

For example, in some examples, as shown in FIG. 10, the memristor array includes 128×128 memristors; the embodiments of the present disclosure include but are not limited to this case. For example, in some examples, as shown in FIG. 10, the peripheral circuits include a switch array, multiplexers, sample-and-hold (S&H) modules, analog-to-digital converter (ADC) modules, a shift-and-accumulator (Sh&A), and the like.

For example, in some examples, as shown in FIG. 10, the input of the memristor array computing unit includes a plurality of 8-bit input data. For example, each bit of each input data corresponds to one control pulse, and each control pulse is encoded according to the value of the corresponding bit. The specific encoding is as follows:

$$V_k = V_R \sum_{s=0}^{B-1} a_{k,s} \cdot 2^{s}$$

Here, $s = 0, \ldots, B-1$; $B$ is the number of bits of the input data (for example, $B = 8$ as shown in FIG. 10); $V_k$ is the voltage excitation corresponding to the input data of the $k$-th row; $V_R$ is a constant reference voltage (for example, the read voltage shown in FIG. 10); and $a_{k,s}$ is the level of the $s$-th control pulse. For example, in some examples, $a_{k,s}$ corresponds to one bit of the binary code $(a_{k,7}, a_{k,6}, \ldots, a_{k,0})$ of the 8-bit input data $a_k$. When $a_{k,s} = 1$, the $s$-th control pulse is at high level, which turns on the corresponding switch in the switch array and supplies the read voltage $V_R$ to the $k$-th row of the memristor array. When $a_{k,s} = 0$, the $s$-th control pulse is at low level, which turns off the corresponding switch in the switch array and at the same time turns on another switch in the switch array to supply the ground level to the $k$-th row of the memristor array; that is, no signal is supplied to the $k$-th row of the memristor array at this time.

It should be understood that, as shown in FIG. 10, on the one hand the plurality of input data is input to the memristor array in parallel. On the other hand, each input data is represented by a plurality of (for example, eight) control pulses, and these control pulses are input to the memristor array serially. Naturally, the control pulses at the same position in the sequence for different input data are input to the memristor array in parallel.
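The bit-serial, row-parallel input scheme described above can be illustrated with a small sketch. It is a simplified model under stated assumptions: inputs are unsigned integers, the pulse for bit s is emitted in time slot s starting from the least significant bit, and a row receives either the read voltage or ground. The names `control_pulses` and `row_voltages` and the 0.2 V read voltage are hypothetical.

```python
def control_pulses(value: int, bits: int = 8):
    """Return the pulse levels [a_0, ..., a_(B-1)] of one input, LSB first."""
    if not 0 <= value < (1 << bits):
        raise ValueError("value does not fit in the given number of bits")
    return [(value >> s) & 1 for s in range(bits)]


def row_voltages(inputs, s: int, v_read: float, bits: int = 8):
    """Voltages applied to all rows in slot s: v_read if bit s is 1, else ground."""
    return [v_read if control_pulses(a, bits)[s] else 0.0 for a in inputs]


print(control_pulses(5))                  # [1, 0, 1, 0, 0, 0, 0, 0]
print(row_voltages([5, 2, 255], 1, 0.2))  # [0.0, 0.2, 0.2]
```

Each call to `row_voltages` corresponds to one time slot in which all rows are driven at once, while successive values of `s` correspond to the serial sequence of pulses for each input.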

According to Kirchhoff's law, the output current of the memristor array is given by

$$i_j = \sum_{k=1}^{m} V_k\, g_{k,j} = \sum_{s=0}^{B-1} 2^{s} \sum_{k=1}^{m} V_R\, a_{k,s}\, g_{k,j} = \sum_{s=0}^{B-1} 2^{s}\, i_{j,s}$$

where $k = 1, \ldots, m$; $j = 1, \ldots, n$; $m$ is the number of rows of the memristor array; $n$ is the number of columns of the memristor array; $i_j$ is the output current of the $j$-th column of the memristor array corresponding to all the input data; $i_{j,s}$ is the pulse output current of the $j$-th column of the memristor array corresponding to all the $s$-th control pulses; and $g_{k,j}$ is the conductance matrix of the memristor array.

As can be seen from this equation, when all the $s$-th control pulses corresponding to all the input data are applied to the switch array, the read voltage $V_R$ is applied to the memristor array in parallel under the control of the high-level control pulses, so that the memristor array outputs the corresponding plurality of pulse output currents $i_{j,s}$, where $i_{j,s}$ is given by equation (5):

$$i_{j,s} = \sum_{k=1}^{m} V_R\, a_{k,s}\, g_{k,j} \tag{5}$$

Note that, in the embodiment shown in FIG. 10, the output current $i_j$ of the $j$-th column is not obtained by directly forming the weighted sum of the pulse output currents of the control pulses according to the above equation (the weight of the pulse output current $i_{j,s}$ being $2^s$). Instead, as shown in FIG. 10, each pulse output current is converted by a sample-and-hold (S&H) module into a voltage signal that can be held, then quantized into digital information (for example, binary digital information) by an ADC module, and finally the shift-and-accumulator shifts and accumulates the binary digital information corresponding to the pulse output currents. For example, the binary digital information for $i_{j,1}$ is shifted one bit toward the most significant bit relative to that for $i_{j,0}$ (that is, the least significant bit of the former aligns with the second least significant bit of the latter), the binary digital information for $i_{j,2}$ is shifted one bit further relative to that for $i_{j,1}$, and so on.
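The shift-and-accumulate procedure described above can be illustrated with an idealized integer model. The assumptions are significant and are made only for clarity: conductances are integers in units of one conductance step, the read voltage is normalized to 1, and the ADC is ideal, so that the digital code for each pulse equals the column current exactly. The function name `bit_serial_mvm` is hypothetical.

```python
def bit_serial_mvm(inputs, g, bits: int = 8):
    """Multiply an input vector by conductance matrix g one input bit at a time."""
    n_rows, n_cols = len(g), len(g[0])
    acc = [0] * n_cols
    for s in range(bits):
        # Levels of the s-th control pulse on every row (1 applies the read voltage).
        pulse = [(a >> s) & 1 for a in inputs]
        # Idealized ADC code of each column's pulse output current for slot s.
        codes = [sum(pulse[k] * g[k][j] for k in range(n_rows)) for j in range(n_cols)]
        # Shift the code by s bit positions (weight 2**s) and accumulate.
        for j in range(n_cols):
            acc[j] += codes[j] << s
    return acc


inputs = [3, 5, 255]           # hypothetical 8-bit input data, one per row
g = [[1, 2], [0, 3], [4, 1]]   # hypothetical integer conductance levels
print(bit_serial_mvm(inputs, g))  # [1023, 276]
```

The result equals the ordinary matrix-vector product (3·1 + 5·0 + 255·4 = 1023 and 3·2 + 5·3 + 255·1 = 276). A real ADC has finite resolution, so the digital result would only approximate this value.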

For example, in some embodiments, as shown in FIG. 10, the output of each column of the memristor array is converted alternately by two sets of sample-and-hold modules, which increases parallelism in hardware operation. At the same time, to save power and area on the processor chip, the ADC modules can operate in a time-division multiplexed manner; for example, four column outputs share one ADC module. When the memristor array computing unit operates, at the current time the $s$-th bit (that is, the $s$-th control pulse) is the input signal of the computing unit, and the switch array is controlled by a switching signal to gate the first set of sample-and-hold modules, which simultaneously convert the pulse output currents on the columns into corresponding voltages and output them. Meanwhile, the ADC modules, assisted by the multiplexers, rapidly quantize the pulse output currents of the previous time (that is, the time corresponding to the $(s-1)$-th control pulse). Then, at the next time, the $(s+1)$-th bit (that is, the $(s+1)$-th control pulse) is the input signal of the computing unit, the switch array is controlled by the switching signal to gate the second set of sample-and-hold modules, and at the same time the ADC modules quantize the voltage values held by the first set of sample-and-hold modules. For example, during operation of the memristor array computing unit, all switching operations can be implemented by controlling the multiplexers.
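The alternating use of the two sample-and-hold sets can be pictured as a simple schedule. The sketch below is only a bookkeeping illustration under assumptions: time is divided into equal slots, set 0 and set 1 stand for the first and second sets of sample-and-hold modules, and the sharing of one ADC among four columns inside a slot is not modeled. The function name `pingpong_schedule` is hypothetical.

```python
def pingpong_schedule(bits: int = 8):
    """Return a list of (slot, sampling, converting) tuples.

    sampling   = (sample_and_hold_set, pulse_index) being captured, or None
    converting = (sample_and_hold_set, pulse_index) being quantized, or None
    """
    steps = []
    for t in range(bits + 1):
        # In slot t, pulse t is captured by set t % 2 (if any pulse remains).
        sampling = (t % 2, t) if t < bits else None
        # Meanwhile the ADC quantizes what the other set captured in slot t - 1.
        converting = ((t - 1) % 2, t - 1) if t >= 1 else None
        steps.append((t, sampling, converting))
    return steps


for step in pingpong_schedule(3):
    print(step)
# (0, (0, 0), None)
# (1, (1, 1), (0, 0))
# (2, (0, 2), (1, 1))
# (3, None, (0, 2))
```

In this model, B pulses need B + 1 slots instead of 2B, because sampling of one pulse overlaps with quantization of the previous one.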

Note that the parallel acceleration processor shown in FIG. 9 and the memristor array computing unit shown in FIG. 10 are both illustrative; the embodiments of the present disclosure do not limit their specific implementation or details.

For the technical effects of the parallel acceleration processor according to the embodiments of the present disclosure, reference can be made to the corresponding description of the parallel acceleration method in the foregoing embodiments; they are not repeated here.

At least one embodiment of the present disclosure further provides a parallel acceleration apparatus for a memristor-based neural network. FIG. 11 is a schematic block diagram of a parallel acceleration apparatus for a memristor-based neural network according to some embodiments of the present disclosure. For example, as shown in FIG. 11, the parallel acceleration apparatus includes the parallel acceleration processor according to the above embodiments, and an input interface and an output interface connected to the parallel acceleration processor. For example, the parallel acceleration apparatus can execute the foregoing parallel acceleration method by means of the parallel acceleration processor it contains.

For example, in some examples, as shown in FIG. 11, the parallel acceleration apparatus further includes a system bus, through which the parallel acceleration processor, the input interface and the output interface can communicate with one another. For example, the input interface is configured to receive instructions from an external computer device, a user, or the like, to control the operation of the parallel acceleration processor. For example, the output interface is configured to output the operation results of the parallel acceleration processor. For example, an external device that communicates with the parallel acceleration apparatus through the input interface and the output interface may be included in an environment that provides any type of user interface with which a user can interact. Types of user interface include, for example, graphical user interfaces and natural user interfaces. For example, a graphical user interface can accept input from a user via input devices such as a keyboard, a mouse, or a remote control, and provide output on an output device such as a display. A natural user interface, by contrast, lets the user interact with the parallel acceleration apparatus without the constraints imposed by input devices such as keyboards, mice, and remote controls; instead, it can rely on speech recognition, touch and stylus recognition, gesture recognition on and near the screen, air gestures, head and eye tracking, voice and speech, vision, touch, gestures, machine intelligence, and the like.
Also, although the parallel acceleration apparatus is shown as a single system in FIG. 11, it may also be a distributed system, and may further be deployed in a cloud facility (including a public cloud or a private cloud). Thus, for example, several devices can communicate over a network connection and jointly perform the tasks described as being performed by the parallel acceleration apparatus.

For example, for the execution process of the parallel acceleration method, reference can be made to the related description in the above embodiments of the parallel acceleration method; it is not repeated here.

Note that the parallel acceleration apparatus according to the embodiments of the present disclosure is illustrative and not limiting. Depending on actual application needs, the parallel acceleration apparatus can further include other conventional components or structures; for example, to realize the necessary functions of the parallel acceleration apparatus, those skilled in the art can provide other conventional components or structures according to the specific application scenario. The embodiments of the present disclosure are not limited in this respect.

For the technical effects of the parallel acceleration apparatus according to the embodiments of the present disclosure, reference can be made to the corresponding description of the parallel acceleration method and the parallel acceleration processor in the foregoing embodiments; they are not repeated here.

The following points about the present disclosure require further explanation.
(1) The drawings of the embodiments of the present disclosure relate only to the structures involved in the embodiments of the present disclosure; for other structures, reference can be made to conventional designs.

(2) Where no conflict arises, features of the same embodiment and of different embodiments of the present disclosure can be combined with one another.

The above are only specific embodiments of the present disclosure, and the scope of protection of the present disclosure is not limited to them. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed herein shall fall within the scope of protection of the present disclosure. The scope of protection of the present disclosure shall therefore be determined by the scope of protection of the claims.

Claims

1. A parallel acceleration method for a memristor-based neural network, wherein
the neural network comprises a plurality of functional layers arranged in sequence, the plurality of functional layers comprising a first functional layer and a second functional layer located after the first functional layer, the first functional layer comprising a plurality of first memristor arrays arranged in parallel, and the plurality of first memristor arrays being used to execute an operation of the first functional layer in parallel and output an operation result to the second functional layer, and
the parallel acceleration method comprises:
executing the operation of the first functional layer in parallel using the plurality of first memristor arrays and outputting the operation result to the second functional layer.

2. The parallel acceleration method according to claim 1, wherein executing the operation of the first functional layer in parallel using the plurality of first memristor arrays and outputting the operation result to the second functional layer comprises:
dividing input data received by the first functional layer into a plurality of sub-input data corresponding to the plurality of first memristor arrays; and
executing the operation of the first functional layer in parallel on the plurality of sub-input data using the plurality of first memristor arrays, to correspondingly generate a plurality of sub-operation results.

3. The parallel acceleration method of claim 2, further comprising joining the plurality of sub-operation results and using the second functional layer to perform the operation of the second functional layer on the joined result.

4. The parallel acceleration method according to claim 2 or 3, wherein the sizes of said plurality of sub-input data are substantially the same.
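The split, parallel-operate, and join flow of claims 2 to 4 can be illustrated with a minimal sketch. It rests on assumptions that the claims do not state: every first memristor array holds the same conductance matrix, the input has already been unrolled into a list of patch vectors, an ideal array computes a plain vector-matrix product, and the second functional layer is a simple ReLU. The names `array_mvm`, `split_evenly`, `first_layer_split_parallel`, and `second_layer` are hypothetical, and the thread pool only stands in for arrays that would operate concurrently in hardware.

```python
from concurrent.futures import ThreadPoolExecutor

def array_mvm(conductances, vector):
    """Ideal array read (assumption): output j = sum_i vector[i] * G[i][j]."""
    return [sum(v * row[j] for v, row in zip(vector, conductances))
            for j in range(len(conductances[0]))]

def split_evenly(items, n_parts):
    """Split items into n_parts contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_parts)
    chunks, start = [], 0
    for k in range(n_parts):
        size = base + (1 if k < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks

def first_layer_split_parallel(patches, arrays):
    """Divide the input into sub-inputs, run one array per sub-input, join the results."""
    chunks = split_evenly(patches, len(arrays))

    def run(pair):
        conductances, chunk = pair
        return [array_mvm(conductances, patch) for patch in chunk]

    with ThreadPoolExecutor(max_workers=len(arrays)) as pool:
        sub_results = list(pool.map(run, zip(arrays, chunks)))
    # Join the sub-operation results in their original order.
    return [out for sub in sub_results for out in sub]

def second_layer(joined):
    """Hypothetical second functional layer: element-wise ReLU on the joined result."""
    return [[max(0.0, x) for x in vec] for vec in joined]

# Illustrative values only: 3 inputs per patch, 2 outputs, three identical arrays.
G = [[1.0, 0.5], [0.0, 2.0], [1.0, 1.0]]
arrays = [G, G, G]
patches = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1], [2, 0, 1]]

joined = first_layer_split_parallel(patches, arrays)
reference = [array_mvm(G, p) for p in patches]  # single-array result for comparison
second_out = second_layer(joined)
```

In this sketch the joined result equals what a single array would produce when processing all patches one after another, while each array handles only about one third of the patches.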

5. The parallel acceleration method according to claim 1, wherein the step of performing the operations of the first functional layer in parallel using the plurality of first memristor arrays and outputting the operation results to the second functional layer includes:
respectively providing a plurality of input data received by the first functional layer to the plurality of first memristor arrays; and
performing the operations of the first functional layer in parallel on the plurality of received input data using at least a portion of the plurality of first memristor arrays to correspondingly generate a plurality of sub-operation results.

6. The parallel acceleration method of claim 5, further comprising performing operations of the second functional layer on each of the plurality of sub-operation results using the second functional layer.

7. The parallel acceleration method according to claim 5, wherein said plurality of input data are different from each other.
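The alternative flow of claims 5 to 7, in which several different inputs are processed at once, can be sketched in the same spirit. The assumptions are again not taken from the claims: each array holds the same conductance matrix, each input is a single vector, and the second functional layer is a ReLU applied to each sub-operation result separately. `array_mvm`, `first_layer_batch_parallel`, and `second_layer` are hypothetical names, and threads only model concurrent hardware arrays.

```python
from concurrent.futures import ThreadPoolExecutor

def array_mvm(conductances, vector):
    """Ideal array read (assumption): output j = sum_i vector[i] * G[i][j]."""
    return [sum(v * row[j] for v, row in zip(vector, conductances))
            for j in range(len(conductances[0]))]

def first_layer_batch_parallel(inputs, arrays):
    """Give each input to its own array; only as many arrays as inputs are used."""
    if len(inputs) > len(arrays):
        raise ValueError("more inputs than available arrays")
    active = arrays[:len(inputs)]  # "at least a portion" of the arrays
    with ThreadPoolExecutor(max_workers=len(active)) as pool:
        return list(pool.map(array_mvm, active, inputs))

def second_layer(sub_result):
    """Hypothetical second functional layer: ReLU on one sub-operation result."""
    return [max(0.0, x) for x in sub_result]

# Illustrative values only: three identical arrays, two different inputs.
G = [[1.0, -1.0], [0.5, 2.0]]
arrays = [G, G, G]
inputs = [[1.0, 0.0], [0.0, 1.0]]

sub_results = first_layer_batch_parallel(inputs, arrays)
outputs = [second_layer(r) for r in sub_results]  # one second-layer pass per input
```

Unlike the split-and-join flow, the sub-operation results here are never merged: each belongs to a different input, so the second functional layer handles them one by one.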

8. The parallel acceleration method according to any one of claims 1 to 7, wherein said neural network is a convolutional neural network.

9. The parallel acceleration method of claim 8, wherein said first functional layer is an initial convolutional layer of said neural network.

10. The parallel acceleration method according to any one of claims 1 to 9, wherein said plurality of functional layers further comprises a third functional layer, the output of said third functional layer being provided to said first functional layer.

11. The parallel acceleration method according to any one of claims 1 to 10, wherein the weight parameters of the neural network are obtained by off-chip training, the weight parameters of the neural network include the weight parameters of the first functional layer, and the weight parameters of the first functional layer are written into the plurality of first memristor arrays to determine the conductances of the plurality of first memristor arrays.

12. The parallel acceleration method according to claim 11, wherein the weight parameters of the neural network further include weight parameters of functional layers other than the first functional layer, and the weight parameters of the other functional layers are written into memristor arrays corresponding to the other functional layers to determine the conductances of the memristor arrays corresponding to the other functional layers.
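Claims 11 and 12 state only that off-chip trained weights are written into the arrays and thereby determine their conductances. One common way to do this, shown below purely as an assumed illustration, is a linear mapping onto a differential pair of conductances so that signed weights can be represented. The conductance window (1 to 10, in arbitrary units) and the names `weights_to_conductance_pairs` and `read_back_weights` are hypothetical and do not come from the claims.

```python
def weights_to_conductance_pairs(weights, g_min=1.0, g_max=10.0):
    """Map signed weights linearly onto (G_pos, G_neg) pairs within [g_min, g_max].

    Assumed scheme: weight w is encoded as (G_pos - G_neg) / scale.
    """
    w_max = max(abs(w) for row in weights for w in row) or 1.0
    scale = (g_max - g_min) / w_max
    g_pos = [[g_min + scale * max(w, 0.0) for w in row] for row in weights]
    g_neg = [[g_min + scale * max(-w, 0.0) for w in row] for row in weights]
    return g_pos, g_neg, scale

def read_back_weights(g_pos, g_neg, scale):
    """Recover the effective weights from the differential conductances."""
    return [[(p - n) / scale for p, n in zip(p_row, n_row)]
            for p_row, n_row in zip(g_pos, g_neg)]

# Illustrative weights standing in for the result of off-chip training.
trained = [[0.5, -2.0], [1.0, 0.0]]
g_pos, g_neg, scale = weights_to_conductance_pairs(trained)
recovered = read_back_weights(g_pos, g_neg, scale)
```

Under the parallel scheme of claim 1, the same mapped conductances would be written into each of the plurality of first memristor arrays; real devices would add write noise and a finite number of conductance levels, which this sketch ignores.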

13. A parallel acceleration processor for a memristor-based neural network, wherein
the neural network includes a plurality of sequentially arranged functional layers, the plurality of functional layers including a first functional layer, and
the parallel acceleration processor includes a plurality of memristor array computing units, the plurality of memristor array computing units including a plurality of first memristor array computing units, the weight parameters of the first functional layer being written into the plurality of first memristor array computing units, and the plurality of first memristor array computing units being configured to perform in parallel the operations corresponding to the first functional layer.

14. A parallel acceleration apparatus for a memristor-based neural network, comprising:
the parallel acceleration processor according to claim 13; and
an input interface and an output interface connected to the parallel acceleration processor,
wherein the input interface is configured to receive instructions to control operation of the parallel acceleration processor, and the output interface is configured to output an operation result of the parallel acceleration processor.