JP2023517921A

JP2023517921A - Processor and implementation method, electronic device, and storage medium

Info

Publication number: JP2023517921A
Application number: JP2022554384A
Authority: JP
Inventors: ヤン、シアオピン
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2020-08-21
Filing date: 2021-08-05
Publication date: 2023-04-27
Anticipated expiration: 2041-08-05
Also published as: KR20220122756A; EP4075759A4; EP4075759A1; JP7379794B2; US20230179546A1; WO2022037422A1; CN112152947A; US11784946B2; CN112152947B

Abstract

The present disclosure relates to the fields of artificial intelligence and deep learning and discloses a processor and an implementation method, an electronic device, and a storage medium. The processor includes: a system controller for sending predetermined data packet information to a data packing and unpacking module; the data packing and unpacking module, for obtaining corresponding data packet data from a storage array module based on the data packet information, packing it with the data packet information, sending the packed first data packet to a computing module for computation, obtaining a second data packet returned by the computing module, unpacking it to obtain computation result data, and storing the result data in the storage array module; the storage array module, for storing data; and the computing module, for performing computation on the obtained first data packet, generating the second data packet based on the computation result data, and returning it to the data packing and unpacking module. Applying the scheme described in the present disclosure can reduce design difficulty and improve overall processing efficiency. [Selected drawing] FIG. 1

Description

[Cross-reference to related applications]
The present disclosure claims priority to a Chinese patent application filed on August 21, 2020, with application number 2020108517577 and titled "Processor and implementation method, electronic device, and storage medium".
The present disclosure relates to computer application technology, and in particular to a processor and an implementation method, an electronic device, and a storage medium in the fields of artificial intelligence and deep learning.

As applications become increasingly intelligent, neural network algorithms grow more diverse and neural network models as a whole grow more complex, bringing correspondingly larger volumes of computation and data storage interaction. As a result, neural-network-based processors such as neural network processor (NPU, Network Processing Unit) chips are receiving increasing attention.

Current NPUs follow two mainstream design approaches: accelerator-centric and instruction-extension-centric. The former is rarely used because of its poor generality and extensibility, so the latter is the one mainly used. However, the latter requires extending a cumbersome instruction set corresponding to neural network operations and developing a dedicated compiler to support it, so the design difficulty is high, and it becomes higher still when the processor is applied to real-time processing of speech data.

The present disclosure provides a processor and an implementation method, an electronic device, and a storage medium.

A processor includes a system controller, a storage array module, a data packing and unpacking module, and a computing module, wherein:
the system controller is used to send predetermined data packet information to the data packing and unpacking module;
the data packing and unpacking module is used to obtain corresponding data packet data from the storage array module based on the data packet information, pack the data packet data with the data packet information, send the packed first data packet to the computing module for computation, obtain a second data packet returned by the computing module, unpack the second data packet to obtain computation result data, and store the computation result data in the storage array module;
the storage array module is used to store data; and
the computing module is used to perform computation on the obtained first data packet, generate the second data packet based on the computation result data, and return the second data packet to the data packing and unpacking module.

A processor implementation method includes:
building a processor composed of a system controller, a storage array module, a data packing and unpacking module, and a computing module; and
performing neural network computation using the processor, wherein the system controller is used to send predetermined data packet information to the data packing and unpacking module; the data packing and unpacking module is used to obtain corresponding data packet data from the storage array module based on the data packet information, pack the data packet data with the data packet information, send the packed first data packet to the computing module for computation, obtain a second data packet returned by the computing module, unpack the second data packet to obtain computation result data, and store the computation result data in the storage array module; the storage array module is used to store data; and the computing module is used to perform computation on the obtained first data packet, generate the second data packet based on the computation result data, and return the second data packet to the data packing and unpacking module.

An electronic device includes at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, cause the at least one processor to perform the above method.

A non-transitory computer-readable storage medium stores computer instructions, the computer instructions causing a computer to perform the above method.

One embodiment of the above disclosure has the following advantages or beneficial effects: it proposes an integrated storage-and-computing implementation in which the processor completes the entire neural network interaction from storage to computation, avoiding complex instruction design and difficult compiler development, thereby reducing design difficulty and improving overall processing efficiency.

It should be understood that the content described herein is not intended to identify key or critical features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will be readily understood from the following description.

The drawings are provided for a better understanding of the present disclosure and do not limit it.
FIG. 1 is a schematic structural diagram of a first embodiment of the processor 10 of the present disclosure.
FIG. 2 is a schematic structural diagram of a second embodiment of the processor 10 of the present disclosure.
FIG. 3 is a schematic structural diagram of a third embodiment of the processor 10 of the present disclosure.
FIG. 4 is a flowchart of an embodiment of the processor implementation method of the present disclosure.
FIG. 5 is a block diagram of an electronic device for the method according to an embodiment of the present disclosure.

Exemplary embodiments of the present disclosure are described below with reference to the drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Accordingly, those skilled in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted from the following description.

In addition, the term "and/or" herein merely describes an association between related objects and indicates that three relationships are possible. For example, "A and/or B" can represent three cases: A alone, both A and B, or B alone. The symbol "/" generally indicates an "or" relationship between the objects before and after it.

FIG. 1 is a schematic structural diagram of the first embodiment of the processor 10 of the present disclosure. As shown in FIG. 1, the processor 10 includes a system controller 101, a storage array module 102, a data packing and unpacking module 103, and a computing module 104.

The system controller 101 is used to send predetermined data packet information to the data packing and unpacking module 103.

The data packing and unpacking module 103 is used to obtain corresponding data packet data from the storage array module 102 based on the data packet information, pack the data packet data with the data packet information, send the packed first data packet to the computing module 104 for computation, obtain the second data packet returned by the computing module 104, unpack the second data packet to obtain computation result data, and store the computation result data in the storage array module 102.

The storage array module 102 is used to store data.

The computing module 104 is used to perform computation on the obtained first data packet, generate the second data packet based on the computation result data, and return it to the data packing and unpacking module 103.
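The interaction among the four modules can be illustrated with a minimal Python sketch. It is a software model only, not the disclosed hardware: the class names, the `PacketInfo` fields (`address`, `length`, `op`, `dest`), and the two sample operations are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PacketInfo:
    address: int   # hypothetical: start of the operand data in storage
    length: int    # hypothetical: number of values to fetch
    op: str        # hypothetical: name of the computation to run
    dest: int      # hypothetical: where the result is written back


@dataclass
class Packet:
    info: PacketInfo
    data: List[float]


class StorageArray:
    """Stands in for the storage array module."""

    def __init__(self, size: int) -> None:
        self.cells = [0.0] * size

    def read(self, address: int, length: int) -> List[float]:
        return self.cells[address:address + length]

    def write(self, address: int, values: List[float]) -> None:
        self.cells[address:address + len(values)] = values


class ComputeModule:
    """Stands in for the computing module; the operations are illustrative."""

    OPS = {
        "double": lambda xs: [2 * x for x in xs],
        "negate": lambda xs: [-x for x in xs],
    }

    def process(self, first: Packet) -> Packet:
        result = self.OPS[first.info.op](first.data)
        return Packet(first.info, result)  # the second data packet


class PackUnpackModule:
    """Stands in for the data packing and unpacking module."""

    def __init__(self, storage: StorageArray, compute: ComputeModule) -> None:
        self.storage = storage
        self.compute = compute

    def handle(self, info: PacketInfo) -> None:
        data = self.storage.read(info.address, info.length)  # fetch
        first = Packet(info, data)                           # pack
        second = self.compute.process(first)                 # compute
        self.storage.write(info.dest, second.data)           # unpack and store


# The "system controller" role: issue packet information.
storage = StorageArray(8)
storage.write(0, [1.0, 2.0, 3.0])
PackUnpackModule(storage, ComputeModule()).handle(PacketInfo(0, 3, "double", 4))
```

After the call, cells 4 to 6 of the model storage hold the doubled values while the operands at cells 0 to 2 are unchanged; no instruction stream or compiler is involved, only packet information.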

As can be seen, the above embodiment proposes an integrated storage-and-computing implementation in which the processor completes the entire neural network interaction from storage to computation, avoiding complex instruction design and difficult compiler development, thereby reducing design difficulty and improving overall processing efficiency.

Building on the structure shown in FIG. 1, the processor 10 may further include one or both of a direct memory access (DMA, Direct Memory Access) module and a routing switch module.

Preferably, both modules are included. Accordingly, FIG. 2 is a schematic structural diagram of the second embodiment of the processor 10 of the present disclosure. As shown in FIG. 2, the processor 10 includes a system controller 101, a storage array module 102, a data packing and unpacking module 103, a computing module 104, a DMA module 105, and a routing switch module 106.

The DMA module 105 is used to implement high-speed exchange between external storage data and the internal storage array data of the storage array module 102 under the control of the system controller 101.

The routing switch module 106 is used to send the first data packet obtained from the data packing and unpacking module 103 to the computing module 104, and to send the second data packet obtained from the computing module 104 to the data packing and unpacking module 103.

As shown in FIG. 2, the computing module 104 may further include a general-purpose computing module 1041 and an activation computing module 1042. As the names suggest, the general-purpose computing module 1041 can be used to perform general-purpose operations, and the activation computing module 1042 can be used to perform activation operations.

The system controller 101 may use simple control logic or a state machine design, or may include a complex processor IP, where IP stands for intellectual property (Intellectual Property). For example, the complex processor IP may include an Advanced RISC Machine (ARM), digital signal processing (DSP, Digital Signal Processing), X86, or microcontroller unit (MCU, Microcontroller Unit) core IP.

The storage array module 102 is composed of multiple static random-access memories (SRAM, Static Random-Access Memory), supports high-speed simultaneous reads or writes on multiple ports, and can use a matrix arrangement to implement high-speed caching or storage of data. The data stored in the storage array module 102 may include neural network model data, external input data, temporary data of intermediate layers, and the like.

The data packing and unpacking module 103 can perform data read and store operations on the storage array module 102, pack the data packet information obtained from the system controller 101 with the data packet data from the storage array module 102, send the packed first data packet to the computing module 104 via the routing switch module 106, unpack the second data packet returned by the computing module 104 via the routing switch module 106, and store the obtained computation result data in the storage array module 102.

Accordingly, the routing switch module 106 can receive data packets from the data packing and unpacking module 103 and the computing module 104 and perform data exchange and the like.

The general-purpose operations performed by the general-purpose computing module 1041 may include general vector operations such as the four basic vector arithmetic operations, logic operations, comparison operations, dot multiplication, accumulation, and addition. The activation operations performed by the activation computing module 1042 may include one or more of the nonlinear functions sigmoid, tanh, relu, and softmax, among others.
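For readers less familiar with these operations, the following Python sketch shows software equivalents of a few of them. It is only an illustration of the mathematics, not of the hardware units: the table names `GENERAL_OPS` and `ACTIVATION_OPS` are hypothetical, and "dot multiplication" is assumed here to mean element-wise multiplication, which the source does not confirm.

```python
import math
from typing import List


def relu(xs: List[float]) -> List[float]:
    return [max(0.0, x) for x in xs]


def sigmoid(xs: List[float]) -> List[float]:
    # Plain formula; very large negative inputs would overflow math.exp.
    return [1.0 / (1.0 + math.exp(-x)) for x in xs]


def tanh(xs: List[float]) -> List[float]:
    return [math.tanh(x) for x in xs]


def softmax(xs: List[float]) -> List[float]:
    # Subtract the maximum first for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical lookup tables keyed by an operation-type string.
GENERAL_OPS = {
    "add": lambda a, b: [x + y for x, y in zip(a, b)],
    "mul": lambda a, b: [x * y for x, y in zip(a, b)],      # assumed element-wise
    "greater": lambda a, b: [x > y for x, y in zip(a, b)],  # comparison
    "accumulate": lambda a: sum(a),
}

ACTIVATION_OPS = {
    "relu": relu,
    "sigmoid": sigmoid,
    "tanh": tanh,
    "softmax": softmax,
}
```

Keying the tables by an operation-type string mirrors the idea that a computing unit selects its operation from a field carried in the packet rather than from a decoded instruction.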

The system controller 101 can manage and control the processor as a whole. For example, it can send the above data packet information to the data packing and unpacking module 103 so that the data packing and unpacking module 103 performs the data packing and unpacking work, and it can be responsible for starting the DMA module 105 to implement high-speed exchange between external storage data and the internal storage array data in the storage array module 102.

As can be seen, in the above embodiment the processor as a whole uses a main structure of storage array module + data packing and unpacking module + routing switch module to complete the entire neural network interaction from storage to computation, avoiding complex instruction design and difficult compiler development, thereby reducing design difficulty and improving overall processing efficiency.

FIG. 3 is a schematic structural diagram of the third embodiment of the processor 10 of the present disclosure. As shown in FIG. 3, the processor 10 includes a system controller 101, a storage array module 102, a data packing and unpacking module 103, a computing module 104, a DMA module 105, and a routing switch module 106. The storage array module 102 may include N1 storage units 1021, each of which may be, for example, one set of SRAMs. The data packing and unpacking module 103 may include N2 data packing and unpacking units 1031, each of which may be connected to the routing switch module 106 via its own data channel. N1 and N2 are both positive integers greater than 1. In addition, the general-purpose computing module 1041 may include M computing units 10411, and the activation computing module 1042 may include P computing units 10421. Each computing unit 10411/10421 may be connected to the routing switch module 106 via its own data channel, and M and P are both positive integers greater than 1. The specific values of N1, N2, M, and P can be determined according to actual needs.

Accordingly, a data packing and unpacking unit 1031 can pack the data packet data obtained from a storage unit 1021 with the data packet information obtained from the system controller 101, use its data channel to send the packed first data packet via the routing switch module 106 to a computing unit 10411/10421 for computation, use its data channel to obtain the second data packet returned by the computing unit 10411/10421 via the routing switch module 106, unpack the second data packet to obtain computation result data, and store the computation result data in a storage unit 1021.

In practical applications, the system controller 101 can simulate the details of each neural network operation, such as which data to fetch, where to fetch it from, and which operation needs to be performed, and can accordingly generate data packet information and send it to the relevant data packing and unpacking units 1031. The data packing and unpacking units 1031 can work in parallel, for example each obtaining its own data packet information from the system controller 101 and performing packing and unpacking operations.

Accordingly, the data packet information may include a source channel, a source address, a destination channel (computing channel), an operation type, a data packet length, and the like. The data packing and unpacking unit 1031 can obtain the data packet data from the source address of the storage unit 1021 corresponding to the source channel; the routing switch module 106 can send the obtained first data packet to the computing unit 10411/10421 corresponding to the destination channel; and the computing unit 10411/10421 can perform the corresponding type of computation based on the operation type.
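How these fields could drive fetching and routing is sketched below in Python. This is a software model under stated assumptions, not the disclosed circuit: the field names, the use of list indices as channels, the `RoutingSwitch` class, and the sample operations are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class PacketInfo:
    source_channel: int   # which storage unit to read from
    source_address: int   # offset inside that storage unit
    dest_channel: int     # which computing unit should receive the packet
    op_type: str          # which operation the computing unit should run
    length: int           # number of data words in the packet


@dataclass
class Packet:
    info: PacketInfo
    data: List[int]


def fetch(storage_units: List[List[int]], info: PacketInfo) -> Packet:
    """Packing step: read from the source channel/address and attach the info."""
    unit = storage_units[info.source_channel]
    start = info.source_address
    return Packet(info, unit[start:start + info.length])


def make_unit(ops: Dict[str, Callable[[List[int]], List[int]]]):
    """Build a computing unit that selects its operation from op_type."""
    def unit(packet: Packet) -> Packet:
        return Packet(packet.info, ops[packet.info.op_type](packet.data))
    return unit


class RoutingSwitch:
    """Forwards a packet to the computing unit on its destination channel."""

    def __init__(self, compute_units: Dict[int, Callable[[Packet], Packet]]):
        self.compute_units = compute_units

    def forward(self, packet: Packet) -> Packet:
        return self.compute_units[packet.info.dest_channel](packet)


storage_units = [[1, 2, 3, 4], [10, 20, 30, 40]]
switch = RoutingSwitch({
    0: make_unit({"negate": lambda xs: [-x for x in xs]}),
    1: make_unit({"relu": lambda xs: [max(0, x) for x in xs]}),
})
info = PacketInfo(source_channel=1, source_address=1,
                  dest_channel=0, op_type="negate", length=2)
second = switch.forward(fetch(storage_units, info))
```

In this run the packet is read from storage unit 1 starting at offset 1, routed to the unit on channel 0, and comes back with the negated payload; everything the units need to know travels inside the packet.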

Preferably, N1 and N2 have the same value, that is, the number of storage units 1021 equals the number of data packing and unpacking units 1031, and each data packing and unpacking unit 1031 corresponds to one storage unit 1021 and can obtain data packet data from that storage unit. This better guarantees that the data packing and unpacking units 1031 can work in parallel. If two data packing and unpacking units 1031 were both to fetch data from the same storage unit 1021, waiting could occur: one data packing and unpacking unit 1031 could fetch its data only after the other had finished fetching, which reduces efficiency.
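The cost of such contention can be made concrete with a toy Python model. It assumes, purely for illustration, that each storage unit serves one fetch per cycle and that every fetch takes exactly one cycle; neither assumption comes from the source, and the function name `cycles_needed` is hypothetical.

```python
from collections import Counter
from typing import List


def cycles_needed(requests: List[int]) -> int:
    """Cycles until every packing/unpacking unit has fetched its data.

    requests[i] is the index of the storage unit that packing/unpacking
    unit i reads from. Under the one-fetch-per-cycle assumption, the
    busiest storage unit determines the total time.
    """
    if not requests:
        return 0
    return max(Counter(requests).values())


one_to_one = cycles_needed([0, 1, 2, 3])   # each unit has its own storage unit
contended = cycles_needed([0, 0, 2, 3])    # two units share storage unit 0
```

With the one-to-one mapping all four fetches finish in a single cycle, whereas the shared case needs two because one unit has to wait.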

In the above processing approach, partitioning the modules into units increases parallel processing capability and further improves data storage interaction capability.

In existing instruction-extension-centric NPUs, data storage interaction uses a unified load/store mode with sequential, synchronous operation, which is inefficient. With the processing approach described in the present disclosure, processing can proceed in parallel, avoiding the waiting delays caused by synchronous operation and making system control and data storage interaction more efficient.

The data packet information may further include a destination address or a storage strategy. If the data packet information includes a destination address, the data packing and unpacking unit 1031 can store the computation result data in the corresponding storage unit 1021 based on the destination address. If the data packet information includes a storage strategy, the data packing and unpacking unit 1031 can store the computation result data in the corresponding storage unit 1021 based on the storage strategy. The storage strategy may be one that achieves data alignment.

After a computing unit 10411/10421 completes its computation, it can replace the data in the data segment of the first data packet with the computation result data. Because the data length usually changes, the data length information in the data packet also needs to be modified. The resulting second data packet is returned to the data packing and unpacking unit 1031 along the transmission path of the first data packet. Once the data packing and unpacking unit 1031 has parsed the computation result data out of the second data packet, the question of how to store that data arises.
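The payload replacement and length update can be sketched in Python as follows. The header layout, the `accumulate_unit` function, and the idea of reusing the source channel as the return path are hypothetical illustrations; the source states only that the second data packet follows the transmission path of the first.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Header:
    source_channel: int  # channel of the originating packing/unpacking unit
    dest_channel: int    # channel of the computing unit
    op_type: str
    length: int          # number of words in the payload


@dataclass(frozen=True)
class Packet:
    header: Header
    payload: Tuple[int, ...]


def accumulate_unit(first: Packet) -> Packet:
    """Replace the payload with the result and fix the length field."""
    result = (sum(first.payload),)            # four words shrink to one
    header = replace(first.header, length=len(result))
    return Packet(header, result)


def return_channel(second: Packet) -> int:
    """Assumed return path: back to the channel the first packet came from."""
    return second.header.source_channel


first = Packet(Header(source_channel=2, dest_channel=5,
                      op_type="accumulate", length=4), (1, 2, 3, 4))
second = accumulate_unit(first)
```

The accumulation shrinks a four-word payload to a single word, so the length field in the header is rewritten before the packet travels back.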

Accordingly, the data packet information may include a source channel, a source address, a destination channel, a destination address, and the like, that is, the source address, the destination address, and the channel addresses on both sides. In this way, the data packing and unpacking unit 1031 can store the obtained computation result data in the corresponding storage unit 1021 based on the destination address. Alternatively, the data packet information may omit the destination address and include a storage strategy instead, in which case the data packing and unpacking unit 1031 can store the computation result data in the corresponding storage unit 1021 based on the storage strategy, achieving automatic data alignment and the like.

Which specific storage strategy is used can be determined according to actual needs. For example, the strategy may cover aligning upward, aligning downward, and how the remaining positions are handled after alignment (for example, by padding).
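The source does not define these strategies in detail, so the following Python sketch shows just one plausible reading: each result is padded with a fill value up to a multiple of a hypothetical row width, so that the next result always starts on an aligned address. The function name, the `width` and `fill` parameters, and the choice to round upward are all assumptions.

```python
from typing import List, Optional


def store_padded(memory: List[Optional[int]], start: int,
                 result: List[int], width: int, fill: int = 0) -> int:
    """Write result at start, padded to a multiple of width.

    Returns the next aligned free address. Assumes start is already
    aligned and that memory is large enough.
    """
    padded_len = -(-len(result) // width) * width   # round length up
    padded = list(result) + [fill] * (padded_len - len(result))
    memory[start:start + padded_len] = padded
    return start + padded_len


memory: List[Optional[int]] = [None] * 8
next_free = store_padded(memory, 0, [7, 8, 9], width=4)
next_free = store_padded(memory, next_free, [1], width=4)
```

Here a three-word result and a one-word result each occupy one full four-word row, so results of varying length stay aligned without a separate transform or transpose pass.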

Neural network operations can shrink or expand data, that is, change the data length as described above, which easily leaves the data misaligned after computation. Existing instruction-extension-centric NPUs usually solve this alignment problem with an additional data transform or transpose. Such additional operations lower overall processing efficiency, and because neural network computation involves a large number of repeated storage-and-computation iterations, the impact on overall efficiency is significant. In the processing approach described in the present disclosure, routing and switching provide free interaction between storage and computation, and storage is completed automatically by means such as the storage strategy, achieving automatic data alignment. The implementation is simple and improves overall processing efficiency.

As shown in FIG. 3, the system controller 101 can interact with a processing unit via an external bus interface, and the DMA module 105 can interact with a double data rate (DDR, Double Data Rate) external storage unit via an external bus storage interface. The specific implementation uses existing technology.

The above describes the apparatus embodiments. The scheme described in the present disclosure is further explained below through a method embodiment.

FIG. 4 is a flowchart of an embodiment of the processor implementation method of the present disclosure. As shown in FIG. 4, the method includes the following specific steps.

At 401, a processor is built that comprises a system controller, a storage array module, a data packing and unpacking module, and an arithmetic module.

At 402, the processor is used to perform neural network operations. The system controller sends predetermined data packet information to the data packing and unpacking module. The data packing and unpacking module obtains the corresponding data packet data from the storage array module based on the data packet information, packs the data packet data with the data packet information, sends the packed first data packet to the arithmetic module for arithmetic processing, obtains the second data packet returned by the arithmetic module, unpacks the second data packet to obtain the operation result data, and stores the operation result data in the storage array module. The storage array module performs data storage. The arithmetic module performs arithmetic processing on the first data packet it obtains, generates the second data packet based on the operation result data, and returns the second data packet to the data packing and unpacking module.
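The division of roles in step 402 can be illustrated with a minimal Python sketch. It is only a software analogy of the hardware flow, not the disclosed implementation: the `Packet` class, the `"src"` and `"dst"` keys, and the doubling operation are all assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    info: dict  # data packet information issued by the system controller
    data: list  # payload: input data (first packet) or result data (second packet)


def arithmetic_module(first: Packet) -> Packet:
    """Process the first packet and return a second packet with the result."""
    result = [2 * x for x in first.data]  # placeholder operation (assumption)
    return Packet(info=first.info, data=result)


def pack_unpack_module(info: dict, storage: dict) -> None:
    """Fetch, pack, dispatch, unpack, and store, in the order step 402 lists."""
    data = storage[info["src"]]           # fetch data packet data from storage
    first = Packet(info=info, data=data)  # pack data with the packet information
    second = arithmetic_module(first)     # send first packet, receive second
    storage[info["dst"]] = second.data    # unpack and store the result


# The system controller's role reduces to issuing the packet information.
storage_array = {"in": [1, 2, 3]}
pack_unpack_module({"src": "in", "dst": "out"}, storage_array)
print(storage_array["out"])
```

In this sketch the arithmetic module never touches storage directly; everything it needs arrives inside the first packet, which mirrors the separation the step describes.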

Building on the above, a DMA module can also be added to the processor. Under the control of the system controller, the DMA module can provide high-speed exchange between external storage data and the internal storage array data in the storage array module.

In addition, a routing exchange module can be added to the processor. The routing exchange module can send the first data packet obtained from the data packing and unpacking module to the arithmetic module and send the second data packet obtained from the arithmetic module to the data packing and unpacking module.

The arithmetic module can include a general-purpose arithmetic module for performing general-purpose operations and an activation arithmetic module for performing activation operations.

The storage array module can include N1 storage units, and the data packing and unpacking module can include N2 data packing and unpacking units, each connected to the routing exchange module via its own data channel, where N1 and N2 are both positive integers greater than 1. The general-purpose arithmetic module can include M arithmetic units and the activation arithmetic module can include P arithmetic units, each arithmetic unit likewise connected to the routing exchange module via its own data channel, where M and P are both positive integers greater than 1.
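As a rough illustration of these constraints, the following Python sketch records the four unit counts and derives how many data channels terminate at the routing exchange module, assuming exactly one channel per packing and unpacking unit and per arithmetic unit as stated above. The `Topology` class and the example counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    n1: int  # storage units
    n2: int  # data packing and unpacking units
    m: int   # general-purpose arithmetic units
    p: int   # activation arithmetic units

    def __post_init__(self) -> None:
        # The text requires each count to be a positive integer greater than 1.
        for name in ("n1", "n2", "m", "p"):
            if getattr(self, name) <= 1:
                raise ValueError(f"{name} must be an integer greater than 1")

    def channel_count(self) -> int:
        # One data channel per packing/unpacking unit and per arithmetic unit.
        return self.n2 + self.m + self.p


topology = Topology(n1=4, n2=4, m=8, p=2)  # example values, not from the source
print(topology.channel_count())
```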

Accordingly, a data packing and unpacking unit can pack the data packet data obtained from a storage unit with the data packet information obtained from the system controller, use its data channel to send the packed first data packet through the routing exchange module to an arithmetic unit for arithmetic processing, use the data channel to obtain, through the routing exchange module, the second data packet returned by the arithmetic unit, unpack the second data packet to obtain the operation result data, and store the operation result data in a storage unit.

The data packet information can include a source channel, a source address, a destination channel, and an operation type. Accordingly, the data packet data can be the data that the data packing and unpacking unit obtains from the source address of the storage unit corresponding to the source channel; the arithmetic unit that receives the first data packet can be the arithmetic unit corresponding to the destination channel, as determined by the routing exchange module; and the arithmetic processing can be processing of the stated operation type performed by that arithmetic unit.
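One way to picture how the four fields drive the flow is the Python sketch below. It is an assumption-laden illustration rather than the disclosed design: the `length` field, the `OPS` table with `"relu"` and `"sum"`, the channel numbers, and the dictionary-based lookup are all hypothetical, since the source names no concrete operation types or encodings.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class PacketInfo:
    source_channel: int       # selects the storage unit to read from
    source_address: int       # start address inside that storage unit
    destination_channel: int  # selects the arithmetic unit
    op_type: str              # operation the arithmetic unit should perform
    length: int               # hypothetical field: number of elements to read


# Hypothetical operation table; the source names no concrete operation types.
OPS: Dict[str, Callable[[List[float]], List[float]]] = {
    "relu": lambda xs: [max(0.0, x) for x in xs],
    "sum": lambda xs: [sum(xs)],
}


def arithmetic_unit(info: PacketInfo, data: List[float]) -> List[float]:
    """Perform the operation named by the packet's operation type."""
    return OPS[info.op_type](data)


def route(info: PacketInfo, data: List[float],
          units: Dict[int, Callable]) -> List[float]:
    """Routing exchange: pick the unit attached to the destination channel."""
    return units[info.destination_channel](info, data)


storage_units = {0: [-1.0, 2.0, -3.0, 4.0]}  # keyed by source channel
units = {7: arithmetic_unit}                 # keyed by destination channel

info = PacketInfo(source_channel=0, source_address=1,
                  destination_channel=7, op_type="relu", length=3)
start = info.source_address
data = storage_units[info.source_channel][start:start + info.length]
print(route(info, data, units))
```

Because the packet carries its own destination channel and operation type, the routing step needs no per-operation instruction; it only looks up the channel.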

Preferably, N1 and N2 have the same value, and each data packing and unpacking unit corresponds to one storage unit and obtains the data packet data from that storage unit.

The data packet information can further include a destination address or a storage strategy. If the data packet information includes a destination address, the data packing and unpacking unit can store the operation result data in the corresponding storage unit based on the destination address. If the data packet information includes a storage strategy, the data packing and unpacking unit can store the operation result data in the corresponding storage unit based on the storage strategy. The storage strategy can be one that achieves data alignment.
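The two storage paths can be contrasted with a short Python sketch. The source does not define any particular storage strategy, so the `"aligned_append"` strategy, the `ALIGN` width, and the zero padding below are assumptions chosen only to show how a strategy could place variable-length results at aligned addresses without a separate conversion step.

```python
from typing import List, Optional

ALIGN = 4  # hypothetical alignment width, in elements


def store_result(unit: List[float], result: List[float],
                 destination_address: Optional[int] = None,
                 strategy: Optional[str] = None) -> int:
    """Write result into a storage unit and return the address used."""
    if destination_address is not None:
        addr = destination_address
    elif strategy == "aligned_append":
        # Round the current end of the unit up to the next multiple of ALIGN.
        addr = -(-len(unit) // ALIGN) * ALIGN
    else:
        raise ValueError("need a destination address or a known strategy")
    if len(unit) < addr + len(result):
        unit.extend([0.0] * (addr + len(result) - len(unit)))  # zero padding
    unit[addr:addr + len(result)] = result
    return addr


unit: List[float] = []
a = store_result(unit, [1.0, 2.0, 3.0], strategy="aligned_append")
b = store_result(unit, [4.0, 5.0], strategy="aligned_append")
c = store_result(unit, [9.0], destination_address=1)
print(a, b, c, unit)
```

With the strategy path, a three-element result followed by a two-element result lands at addresses 0 and 4, so each result starts on an aligned boundary even though the result lengths differ.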

For the specific workflow of the method embodiment shown in FIG. 4, refer to the related description of the apparatus embodiments above; it is not repeated here.

In short, the scheme described in the method embodiments of the present disclosure proposes an integrated storage-and-compute implementation in which the processor completes the entire interaction of a neural network from storage to computation. This avoids complex instruction design and difficult compiler development, which reduces design difficulty and improves overall processing efficiency.

According to embodiments of the present disclosure, the disclosure further provides an electronic device and a readable storage medium.

FIG. 5 is a block diagram of an electronic device for the method according to an embodiment of the present disclosure. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device can also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely examples and are not intended to limit the implementations of the present disclosure described and/or claimed herein.

As shown in FIG. 5, the electronic device includes one or more processors Y01, a memory Y02, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The components are interconnected by different buses and can be mounted on a common motherboard or in other ways as needed. The processor can process instructions executed within the electronic device, including instructions stored in the memory for displaying graphical information of a GUI on an external input/output device (such as a display device coupled to an interface). In other implementations, multiple processors and/or multiple buses can be used together with multiple memories, if needed. Similarly, multiple electronic devices can be connected, with each device providing part of the necessary operations (for example, as a server array, a group of blade servers, or a multiprocessor system). FIG. 5 takes one processor Y01 as an example.

The memory Y02 is the non-transitory computer-readable storage medium provided by the present disclosure. The memory stores instructions executable by at least one processor so that the at least one processor can perform the method provided by the present disclosure. The non-transitory computer-readable storage medium of the present disclosure stores computer instructions for causing a computer to perform the method provided by the present disclosure.

As a non-transitory computer-readable storage medium, the memory Y02 is used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the method in the embodiments of the present disclosure. By running the non-transitory software programs, instructions, and modules stored in the memory Y02, the processor Y01 executes the various functional applications and data processing of the server, that is, implements the method of the method embodiments above.

The memory Y02 can include a program storage area and a data storage area. The program storage area can store an operating system and the application programs required for at least one function, and the data storage area can store data created through use of the electronic device. The memory Y02 can include high-speed random access memory and can further include non-transitory memory, for example at least one disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory Y02 can include memory located remotely from the processor Y01, and such remote memory can be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The electronic device can further include an input device Y03 and an output device Y04. The processor Y01, the memory Y02, the input device Y03, and the output device Y04 can be connected via a bus or in other ways; FIG. 5 takes connection via a bus as an example.

The input device Y03 can receive input numeric or character information and generate key signal input related to user settings and function control of the electronic device that implements the method. Examples include a touch screen, keypad, mouse, trackpad, touchpad, pointing stick, one or more mouse buttons, trackball, and joystick. The output device Y04 can include a display device, auxiliary lighting devices, and haptic feedback devices (for example, a vibration motor). The display device can include, but is not limited to, a liquid crystal display, a light-emitting diode display, and a plasma display. In some implementations, the display device can be a touch screen.

Various implementations of the systems and techniques described herein can be realized in digital electronic circuit systems, integrated circuit systems, application-specific integrated circuits, computer hardware, firmware, software, and/or combinations thereof. These implementations can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor. The programmable processor can be a special-purpose or general-purpose programmable processor, and can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

These computing programs (also called programs, software, software applications, or code) include machine instructions for a programmable processor and can be implemented in high-level procedural and/or object-oriented programming languages and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (for example, magnetic disks, optical disks, memory, programmable logic devices) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide interaction with a user, the systems and techniques described herein can be implemented on a computer that has a display device for displaying information to the user (for example, a cathode ray tube or liquid crystal display monitor) and a keyboard and pointing device (for example, a mouse or trackball) through which the user can provide input to the computer. Other kinds of devices can also be used to provide interaction with the user. For example, the feedback provided to the user can be any form of sensory feedback (for example, visual, auditory, or tactile feedback), and input from the user can be received in any form (including acoustic, speech, or tactile input).

The systems and techniques described herein can be implemented in a computing system that includes a back-end component (for example, a data server), a computing system that includes a middleware component (for example, an application server), a computing system that includes a front-end component (for example, a user computer with a graphical user interface or web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, and front-end components. The components of the system can be interconnected by any form or medium of digital data communication (for example, a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), blockchain networks, and the Internet.

A computer system can include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other. The server can be a cloud server, also called a cloud computing server or cloud host, which is a host product in the cloud computing service system that addresses the high management difficulty and weak business scalability of traditional physical hosts and VPS services.

It should be understood that steps can be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present disclosure can be executed in parallel, sequentially, or in a different order; no limitation is imposed herein as long as the technical solution disclosed in the present disclosure achieves the desired result.

The specific implementations above do not limit the protection scope of the present disclosure. Those skilled in the art can make various modifications, combinations, subcombinations, and substitutions according to design requirements and other factors. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall fall within its protection scope.

Claims

1. A processor, comprising a system controller, a storage array module, a data packing and unpacking module, and an arithmetic module, wherein:
the system controller sends predetermined data packet information to the data packing and unpacking module;
the data packing and unpacking module obtains corresponding data packet data from the storage array module based on the data packet information, packs the data packet data with the data packet information, sends the packed first data packet to the arithmetic module for arithmetic processing, obtains a second data packet returned by the arithmetic module, unpacks the second data packet to obtain operation result data, and stores the operation result data in the storage array module;
the storage array module performs data storage; and
the arithmetic module performs arithmetic processing on the obtained first data packet, generates the second data packet based on the operation result data, and returns the second data packet to the data packing and unpacking module.

2. The processor of claim 1, further comprising a direct memory access module that provides, under the control of the system controller, high-speed exchange between external storage data and internal storage array data in the storage array module.

3. The processor of claim 1, further comprising a routing exchange module that sends the first data packet obtained from the data packing and unpacking module to the arithmetic module and sends the second data packet obtained from the arithmetic module to the data packing and unpacking module.

4. The processor of claim 3, wherein the arithmetic module includes a general-purpose arithmetic module that performs general-purpose operations and an activation arithmetic module that performs activation operations.

5. The processor of claim 4, wherein:
the storage array module includes N1 storage units;
the data packing and unpacking module includes N2 data packing and unpacking units, each connected to the routing exchange module via one data channel, N1 and N2 both being positive integers greater than 1;
the general-purpose arithmetic module includes M arithmetic units and the activation arithmetic module includes P arithmetic units, each arithmetic unit being connected to the routing exchange module via one data channel, M and P both being positive integers greater than 1; and
the data packing and unpacking unit packs the data packet data obtained from a storage unit with the data packet information obtained from the system controller, uses the data channel to send the packed first data packet through the routing exchange module to an arithmetic unit for arithmetic processing, uses the data channel to obtain, through the routing exchange module, the second data packet returned by the arithmetic unit, unpacks the second data packet to obtain the operation result data, and stores the operation result data in a storage unit.

6. The processor of claim 5, wherein:
the data packet information includes a source channel, a source address, a destination channel, and an operation type;
the data packing and unpacking unit obtains the data packet data from the source address of the storage unit corresponding to the source channel; and
the routing exchange module sends the first data packet to the arithmetic unit corresponding to the destination channel for arithmetic processing of the operation type.

7. The processor of claim 6, wherein N1 and N2 have the same value, and each data packing and unpacking unit corresponds to one storage unit and obtains the data packet data from the corresponding storage unit.

8. The processor of claim 7, wherein:
the data packet information further includes a destination address or a storage strategy;
if the data packet information includes the destination address, the data packing and unpacking unit stores the operation result data in the corresponding storage unit based on the destination address; and
if the data packet information includes the storage strategy, the data packing and unpacking unit stores the operation result data in the corresponding storage unit based on the storage strategy.

9. The processor of claim 8, wherein the storage strategy includes a storage strategy that achieves data alignment.

10. A processor implementation method, comprising:
building a processor that comprises a system controller, a storage array module, a data packing and unpacking module, and an arithmetic module; and
performing neural network operations using the processor, wherein:
the system controller sends predetermined data packet information to the data packing and unpacking module;
the data packing and unpacking module obtains corresponding data packet data from the storage array module based on the data packet information, packs the data packet data with the data packet information, sends the packed first data packet to the arithmetic module for arithmetic processing, obtains a second data packet returned by the arithmetic module, unpacks the second data packet to obtain operation result data, and stores the operation result data in the storage array module;
the storage array module performs data storage; and
the arithmetic module performs arithmetic processing on the obtained first data packet, generates the second data packet based on the operation result data, and returns the second data packet to the data packing and unpacking module.

11. The processor implementation method of claim 10, further comprising adding a direct memory access module to the processor, wherein the direct memory access module provides, under the control of the system controller, high-speed exchange between external storage data and internal storage array data in the storage array module.

12. The processor implementation method of claim 10, further comprising adding a routing exchange module to the processor, wherein the routing exchange module sends the first data packet obtained from the data packing and unpacking module to the arithmetic module and sends the second data packet obtained from the arithmetic module to the data packing and unpacking module.

13. The processor implementation method of claim 12, wherein the arithmetic module includes a general-purpose arithmetic module that performs general-purpose operations and an activation arithmetic module that performs activation operations.

14. The processor implementation method of claim 13, wherein:
the storage array module includes N1 storage units;
the data packing and unpacking module includes N2 data packing and unpacking units, each connected to the routing exchange module via one data channel, N1 and N2 both being positive integers greater than 1;
the general-purpose arithmetic module includes M arithmetic units and the activation arithmetic module includes P arithmetic units, each arithmetic unit being connected to the routing exchange module via one data channel, M and P both being positive integers greater than 1; and
the data packing and unpacking unit packs the data packet data obtained from a storage unit with the data packet information obtained from the system controller, uses the data channel to send the packed first data packet through the routing exchange module to an arithmetic unit for arithmetic processing, uses the data channel to obtain, through the routing exchange module, the second data packet returned by the arithmetic unit, unpacks the second data packet to obtain the operation result data, and stores the operation result data in a storage unit.

15. The processor implementation method of claim 14, wherein:
the data packet information includes a source channel, a source address, a destination channel, and an operation type;
the data packet data is obtained by the data packing and unpacking unit from the source address of the storage unit corresponding to the source channel;
the arithmetic unit that obtains the first data packet is the arithmetic unit corresponding to the destination channel, as determined by the routing exchange module; and
the arithmetic processing is arithmetic processing of the operation type performed by that arithmetic unit.

16. The processor implementation method of claim 15, wherein N1 and N2 have the same value, and each data packing and unpacking unit corresponds to one storage unit and obtains the data packet data from the corresponding storage unit.

17. The processor implementation method of claim 16, wherein:
the data packet information further includes a destination address or a storage strategy;
if the data packet information includes the destination address, the data packing and unpacking unit stores the operation result data in the corresponding storage unit based on the destination address; and
if the data packet information includes the storage strategy, the data packing and unpacking unit stores the operation result data in the corresponding storage unit based on the storage strategy.

18. The processor implementation method of claim 17, wherein the storage strategy includes a storage strategy that achieves data alignment.

19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor,
wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, cause the at least one processor to perform the processor implementation method of any one of claims 10 to 18.

20. A program that causes a computer to perform the processor implementation method of any one of claims 10 to 18.