
CN110059050A - AI supercomputer based on high-performance reconfigurable elastic computing

Info

Publication number: CN110059050A
Application number: CN201910350877.6A
Authority: CN
Inventors: 向志宏; 杨延辉; 吴君安
Original assignee: Beijing Super Dimension Computing Technology Co Ltd
Current assignee: Beijing Meilian Dongqing Technology Co ltd
Priority date: 2019-04-28
Filing date: 2019-04-28
Publication date: 2019-07-26
Anticipated expiration: 2039-04-28
Also published as: CN110059050B

Abstract

An AI supercomputer based on high-performance reconfigurable elastic computing. In an embodiment, the AI supercomputer includes: a machine perceptron; a reconfigurable computing unit (RPU) array which, under the control of configuration information, performs the reconfigurable data computation of the AI supercomputer on the basis of reconfigurable data; a machine behavior device; a high-performance elastic computing master control system based on reconfigurable computing, which connects the machine perceptron, the machine behavior device, and the RPU array; a high-performance elastic connection system (HEC_Link) based on reconfigurable computing, which connects the master control system and the RPU array under the control of control information; and an AI supercomputer compiling system based on reconfigurable elastic computing, which compiles an application program into control code for the master controller, control information for the HEC_Link, and configuration information for the RPU array. Embodiments of this specification can achieve deployment of greater compute power, more parallel computation, and simultaneous execution of multiple different tasks.

Description

AI supercomputer based on high-performance reconfigurable elastic computing

Technical field

Embodiments of this specification relate to a supercomputer, and in particular to an AI (artificial intelligence) supercomputer (AISC) based on high-performance reconfigurable elastic computing (HEC).

Background art

Artificial intelligence (AI) is developing by leaps and bounds, but most of its computing platforms are still built on CPUs, GPUs, FPGAs, ASICs, or combinations of them. These platforms cause developers and users a number of difficulties when deploying AI products:

(1) CPUs offer the highest flexibility, but their energy efficiency is very low in scenarios such as AI that require massive parallel computation. GPUs and FPGAs solve part of the parallel computing problem, but power consumption and cost remain significant factors limiting their deployment. ASICs have good energy efficiency, but they can only serve fixed algorithms and cannot cope with algorithm evolution.

(2) Platforms composed of one or more of CPUs, GPUs, FPGAs, and ASICs fall short of expectations in terms of system architecture complexity, compute power scalability, system power consumption, cost, and so on.

Summary of the invention

Embodiments of this specification propose an AI supercomputer based on high-performance reconfigurable elastic computing. The computer can flexibly deploy compute power according to product requirements and the operating environment; can support edge computing, large-scale computing, and very-large-scale computing; can support a variety of neural network computations without instruction driving; supports online training and online algorithm iteration; and offers high versatility, flexibility, and energy efficiency.

The AI supercomputer based on high-performance reconfigurable elastic computing according to embodiments of this specification comprises: a machine perceptron for providing environment perception information or device input information as reconfigurable data; a reconfigurable computing unit (RPU) array; a machine behavior device for outputting the results of computation or inference by the AI supercomputer, or for executing instructions related to the AI supercomputer; a high-performance elastic computing master control system (HEC) based on reconfigurable computing; a high-performance elastic connection system (HEC_Link) based on reconfigurable computing; and an AI supercomputer compiling system based on reconfigurable elastic computing, for compiling an application program into control code for the master controller, control information for the HEC_Link, and configuration information for the RPU array, such that the master control system connects the machine perceptron and the machine behavior device under the control of the control code, the HEC_Link connects the master control system (HEC) and the RPU array under the control of the control information, and the RPU array performs the reconfigurable data computation of the AI supercomputer on the basis of the reconfigurable data under the control of the configuration information.

In a possible implementation, the HEC includes a system-level reconfigurable data path composed of at least one of: a multilayer system bus, a configuration bus, an on-chip DMA controller, on-chip memory, an on-chip memory controller, a peripheral controller, and off-chip memory.

In a possible implementation, the high-performance elastic connection system (HEC_Link), under the control of the control information, implements high-speed data transfer between RPUs and between the RPUs and the master control chip system, as well as communication of configuration status and configuration information between the master control chip and the RPUs.

In a possible implementation, the RPU array includes one or more RPUs; the RPUs are connected to one another through the HEC_Link; and each RPU obtains its configuration information through the HEC_Link.

In a possible implementation, the AI supercomputer includes an AI supercomputer operating system based on reconfigurable elastic computing, which is used to manage the software, hardware, and peripheral resources of the AI supercomputer; to execute the compiled files output by the compiling system; to obtain the information acquired from the machine perceptron; to control the machine behavior device to act on the results of AI computation; and to perform online compilation.

In a possible implementation, the compiling system is adapted according to the characteristics of the neural network.

In a possible implementation, the compiling system compiles control information for using the HEC_Link to extend the RPU array by columns and/or rows, according to the characteristics of the HEC_Link and the RPU array.

In a possible implementation, the compiling system compiles control information for changing the connection relationships of the RPUs through the HEC_Link.

In a possible implementation, the control information compiled by the compiling system causes one or more RPUs to: execute one or more different training tasks simultaneously; execute one or more different inference tasks simultaneously; execute multiple training tasks and multiple inference tasks simultaneously; or receive the same data as input to a training network and an inference network at the same time.

In a possible implementation, when the HEC_Link uses distributed deployment or cloud deployment, the compiling system compiles according to the HEC_Link connection information and deploys the tasks of the master control system, the HEC_Link, and each RPU.

In a possible implementation, the compiling system operates in an offline compilation mode, in which the output compiled file is passed directly to the operating system for execution; or in an online compilation mode that runs on the operating system, in which, after compilation completes, the result is executed directly by the operating system as needed.

In a possible implementation, the compiling system deploys a network structure obtained by online training in real time through online compilation, thereby updating the neural network online in real time.

Embodiments of this specification can achieve deployment of greater compute power, more parallel computation, and simultaneous execution of multiple different tasks.

Brief description of the drawings

To make the technical solutions and advantages of the embodiments of this specification clearer, exemplary embodiments of this specification are described in more detail below with reference to the accompanying drawings. The described embodiments are evidently only some of the embodiments of this specification, not an exhaustive list of all embodiments.

Fig. 1 is a system architecture diagram of the AI supercomputer based on high-performance reconfigurable elastic computing;

Fig. 2(a) and Fig. 2(b) illustrate compute power expansion of the AI supercomputer; Fig. 2(a) shows the compute power before expansion, and Fig. 2(b) shows the compute power after expansion;

Fig. 3 is a schematic diagram of the AI supercomputer flexibly adjusting compute power to execute two tasks in parallel.

Detailed description of embodiments

The solutions provided in this specification are described below with reference to the accompanying drawings.

Fig. 1 is a system architecture diagram of the AI supercomputer based on high-performance reconfigurable elastic computing. As shown in Fig. 1, the AI supercomputer (AISC) based on high-performance reconfigurable elastic computing includes a high-performance elastic computing master control system (HEC) based on reconfigurable computing, a high-performance elastic connection system (HEC_Link) based on reconfigurable computing, a reconfigurable computing unit array (RPU array), at least one machine perceptron 1-P, at least one machine behavior device 1-Q, and an AI supercomputer compiling system based on reconfigurable elastic computing. P and Q are natural numbers.

The high-performance elastic computing master control system (HEC) based on reconfigurable computing is used, under the control of the control code, to connect the machine perceptron and the machine behavior device and to connect the reconfigurable computing unit (RPU) array. The HEC is the central processing unit of the AI supercomputer (AISuperComputer, AISC) and provides the hardware platform on which the online compiling system runs; it is the control platform that connects the RPU array; and it is the peripheral control platform that connects the machine perceptron and the machine behavior device.

In one example, the HEC may include a system-level reconfigurable data path composed of one or more of: a multilayer system bus, a configuration bus, a peripheral controller, an on-chip DMA controller, on-chip memory, an on-chip memory controller, off-chip memory, and an off-chip memory controller. When the system-level reconfigurable data path operates, the DMA controller, once set up by the master controller in the system, can access the off-chip memory through the off-chip memory controller, either reading data (operands) from the off-chip memory and writing it to the on-chip memory, or reading data (results) from the on-chip memory and writing it to the off-chip memory.
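The data movement just described can be pictured with a small software model. This is only an illustrative sketch: the `Memory` and `DmaController` classes and their methods are hypothetical names introduced here, and the specification does not define any register-level or programming interface for the DMA controller.

```python
class Memory:
    """Toy word-addressed memory standing in for on-chip or off-chip storage."""

    def __init__(self, size):
        self.words = [0] * size


class DmaController:
    """Hypothetical DMA model: configured by the master controller, then run."""

    def configure(self, src, src_addr, dst, dst_addr, length):
        # Models "the DMA controller is set up by the master controller".
        if src_addr + length > len(src.words) or dst_addr + length > len(dst.words):
            raise ValueError("transfer exceeds memory bounds")
        self.src, self.dst = src, dst
        self.src_addr, self.dst_addr, self.length = src_addr, dst_addr, length

    def run(self):
        # Copy `length` words from the source memory to the destination memory.
        for i in range(self.length):
            self.dst.words[self.dst_addr + i] = self.src.words[self.src_addr + i]


off_chip = Memory(16)
on_chip = Memory(8)
off_chip.words[4:8] = [10, 20, 30, 40]  # operands waiting in off-chip memory

dma = DmaController()
dma.configure(off_chip, 4, on_chip, 0, 4)  # operands: off-chip -> on-chip
dma.run()

# Stand-in for the computation performed on the on-chip data.
on_chip.words[0:4] = [w * 2 for w in on_chip.words[0:4]]

dma.configure(on_chip, 0, off_chip, 12, 4)  # results: on-chip -> off-chip
dma.run()
```

The two `configure`/`run` pairs correspond to the two directions named in the text: loading operands onto the chip and writing results back off the chip.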

In one example, the HEC may include a system-level reconfiguration controller composed of a master controller and a configuration bus. The system-level reconfiguration controller issues system-level control tasks so as to control, through the multilayer system bus, the peripheral controller, the DMA controller, and the off-chip memory controller in the system, thereby accomplishing system-level control.

In one example, the master controller can also take on part of the functions of a general-purpose CPU.

The high-performance elastic connection system (HEC_Link) based on reconfigurable computing is used to connect the high-performance elastic computing master control system and the RPU array under the control of the control information. The HEC_Link is the auxiliary control unit and the main carrier through which the AI supercomputer realizes elastic configuration of compute power; the corresponding HEC_Link is configured according to the compute power expansion mode and the application scenario. Through the HEC_Link, the master control system can configure the connection relationships of the RPU array, and the control information is what defines those connection relationships.

In one example, the HEC_Link connects the high-performance elastic computing master control system and the reconfigurable computing unit (RPU) array, implementing high-speed data transfer between RPUs and between the RPUs and the master control chip system, as well as communication of configuration status and configuration information between the master control chip and the RPUs.

In one example, for a given RPU array, the HEC_Link can change the connection relationships as needed so that the RPUs are combined in different ways. The combinations include, but are not limited to, grouping the RPUs so that the groups: take different reconfigurable data as input and execute different tasks; take different data as input and execute the same task; take the same data as input and execute different tasks; or take the same data as input and execute the same task.

In one example, the HEC_Link can be used to extend the RPUs in computation depth and computation width according to compute power requirements, gaining the ability to execute larger programs or more programs.

In one example, the HEC_Link can be used to carry out array deployment, distributed deployment, or cloud deployment of the RPU array according to compute power requirements.

In one example, the HEC_Link can take many forms, including a protocol controller (for protocol conversion) and/or a bridge controller or bridge circuit. These controllers or circuits may reside in the master control chip, in a separate chip, or in the RPU chip, or may exist in other forms.

The reconfigurable computing unit array (RPU array) is used, under the control of the configuration information, to perform the reconfigurable data computation of the AI supercomputer on the basis of the reconfigurable data. The RPU array is the core computing unit of the AI supercomputer's elastic computing; all reconfigurable data computation of the AI supercomputer can be completed on the RPU array. RPUs can be added or removed, or the arrangement and network architecture of the RPU array changed, as required, to realize compute power configuration for reconfigurable computing.

In one example, the RPU array includes multiple RPUs arranged in M rows and N columns; the RPUs are connected to one another through the HEC_Link, and each RPU can obtain its corresponding configuration information through the HEC_Link.

In one example, the reconfigurable data input to an RPU comes from other RPUs or, through the HEC_Link, from the master control system; the computation result of an RPU is output to other RPUs or, through the HEC_Link, to the master control system. The input source and output destination are determined by the master control system and the HEC_Link.

The reconfigurable data paths in the RPU array can exchange data through off-chip memory. The reconfigurable data paths in the RPU array and the system-level reconfigurable data path together form the main body of the reconfigurable data path of the AI supercomputer.

A reconfiguration controller can be provided in each RPU of the RPU array. The system-level reconfiguration controller, the HEC_Link controller, and the reconfiguration controllers in the individual RPUs of the RPU array together form the main body of the reconfiguration controller of the AI supercomputer.

At least one machine perceptron 1-P is used to provide environment perception information or device input information as reconfigurable data. The machine perceptron is a peripheral of the AI supercomputer and supplies it with environment perception information, such as vision, hearing, touch, taste, geographic location, and pose change, or with device input information.

In one example, the machine perceptron includes terminal sensors that collect information about the terminal, its surroundings, and its own state. Terminal sensors include, but are not limited to, image sensors (Camera), millimeter-wave radar (Radar), ultrasonic radar (Ultrasonic), lidar (Lidar), inertial measurement units (IMU), microphones (MIC), global navigation satellite systems (GNSS), touch panels (Touch Panel), pressure sensors, and the like.

In one example, the machine perceptron further includes sensor modules with perception analysis and computing capability, which perform secondary analysis and computation on the data collected by the terminal sensors to generate new environment perception information. Sensor modules include, but are not limited to, RGB-D depth cameras, binocular depth cameras, VIO three-dimensional reconstruction cameras, and the like; their perception analysis and computing capability may be based on a CPU, GPU, FPGA, or DSP, or on an RPU.

At least one machine behavior device 1-Q is a peripheral of the AI supercomputer, used to output the results of computation or inference by the AI supercomputer or to execute instructions of the AI supercomputer.

In one example, the machine behavior device based on reconfigurable elastic computing includes, but is not limited to, one or more of a communication unit, a human-machine interface, a servo mechanism, a control unit, and the like.

The AI supercomputer compiling system based on reconfigurable elastic computing is used to compile an application program into control code for the master controller, control information for the HEC_Link, and configuration information for the RPU array.

Specifically, the AI supercomputer compiling system can mark and preprocess the application program and decompose it into master control system execution code and RPU execution code. It then performs, on the RPU execution code and according to the RPU array, code conversion and optimization, temporal partitioning of tasks, partitioning of tasks across RPUs, and generation of task configuration information, and finally compiles these into the control code for the master controller, the control information for the HEC_Link, and the configuration information for the RPU array.
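The compilation stages listed above can be sketched as a toy pipeline. Everything in this sketch is an assumption made for illustration: the `compile_app` function, the `CompileOutput` fields, the `"host"`/`"rpu"` marks, the fixed-size time steps, and the linear chaining of RPUs are hypothetical simplifications, not the compiling system described in the specification.

```python
from dataclasses import dataclass


@dataclass
class CompileOutput:
    control_code: list  # operations kept on the master controller
    link_control: list  # (src_rpu, dst_rpu) connections for the HEC_Link
    rpu_config: dict    # (time_step, rpu_index) -> operation name


def compile_app(ops, num_rpus):
    """ops: list of (name, target) in program order; target is 'host' or 'rpu'.

    The target field plays the role of the marking step.
    """
    # Decompose into master control system code and RPU code.
    host_ops = [name for name, target in ops if target == "host"]
    rpu_ops = [name for name, target in ops if target == "rpu"]

    # Temporal partitioning (assumed policy): at most num_rpus operations per step.
    steps = [rpu_ops[i:i + num_rpus] for i in range(0, len(rpu_ops), num_rpus)]

    # RPU partitioning (assumed policy): the k-th operation of a step runs on RPU k.
    rpu_config = {}
    for t, step in enumerate(steps):
        for k, name in enumerate(step):
            rpu_config[(t, k)] = name

    # HEC_Link control information (assumed): chain the RPUs used within a step.
    links = {(k, k + 1) for step in steps for k in range(len(step) - 1)}
    return CompileOutput(host_ops, sorted(links), rpu_config)


app = [("load", "host"), ("conv", "rpu"), ("relu", "rpu"),
       ("pool", "rpu"), ("fc", "rpu"), ("store", "host")]
out = compile_app(app, num_rpus=3)
```

With three RPUs, the four RPU operations in `app` are split into two time steps, which shows how a program longer than the array can still be mapped onto it.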

In one example, the input to the compiling system of the AI supercomputer is an application program written in a high-level programming language such as C or Python. The compiling system also covers the application frameworks on which the high-level language relies, such as TensorFlow and Caffe. The compiling system outputs the control code for the master controller of the reconfigurable computing system, the control information for the HEC_Link, the configuration information for the RPU array, the general-purpose executable program of the master control system, and process status information for the compilation tasks.

Further, the compiling system supports neural networks that run without instruction driving.

In one example, this support for neural networks without instruction driving includes:

(1) The neural network executes tasks without instruction driving. The compiled files produced when compilation of the neural network completes include the control code for the master controller, the control information for the HEC_Link, and the configuration information for the RPU array. When executed, these files need no instruction control to carry the reconfigurable data from input to output; the result obtained by the master control system is the end-to-end result of the neural network.

(2) Various neural networks can be supported by extending the compiling system. The training and inference of neural networks involve a large amount of parallel and repeated computation, and the compiling system can be adapted according to the characteristics of each kind of neural network.

Further, at compile time the compiling system can cooperate with the HEC_Link to realize multi-mode elastic deployment of the RPU array. The multi-mode elastic deployment modes include, but are not limited to:

(1) According to the characteristics of the HEC_Link and the RPU array, the HEC_Link is used to extend the RPU array by columns and rows. Column extension of the RPU array favors support for more parallel computation, and row extension favors speeding up the pipeline for longer computation sequences. The HEC_Link has an open architecture, so the RPU array can be extended according to the compute power requirements and environmental requirements of the AI supercomputer.

Fig. 2(a) and Fig. 2(b) illustrate one kind of compute power expansion of the AI supercomputer. Fig. 2(a) shows the compute power before expansion. In Fig. 2(a), the RPU array includes 6 RPUs. Configuration information from the master control system is transmitted over the configuration bus to the HEC_Link and the RPU array, so that the RPU array forms the following structure: the 6 RPUs are arranged in parallel in 2 columns and 3 rows; RPUs in the same row cannot exchange reconfigurable data with one another; according to the configuration information or a protocol, the Crossbar passes the reconfigurable data output by any RPU in one row to any RPU in the next row, and returns the reconfigurable data output by any RPU in the last row to any RPU in the first row.

Fig. 2(b) shows the compute power after expansion. In Fig. 2(b), the RPU array is extended to 16 RPUs, arranged in parallel in 4 columns and 4 rows; RPUs in the same row cannot exchange reconfigurable data with one another; according to the configuration information or a protocol, the Crossbar passes the reconfigurable data output by any RPU in one row to any RPU in the next row, and returns the reconfigurable data output by any RPU in the last row to any RPU in the first row.
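The row-to-row routing described for Fig. 2(a) and Fig. 2(b) can be written down as a set of permitted links. This is a sketch under stated assumptions: the `crossbar_links` function and the row-major RPU numbering are hypothetical, and the sketch adopts the reading that RPUs in the same row do not exchange data while any RPU may feed any RPU in the following row.

```python
def crossbar_links(rows, cols):
    """Return the set of permitted (src, dst) links between RPUs.

    RPUs are numbered row-major (an assumption): RPU index = row * cols + col.
    Intended for rows >= 2, where no link stays inside a single row.
    """
    links = set()
    for r in range(rows):
        nxt = (r + 1) % rows  # the last row wraps back to the first row
        for c_src in range(cols):
            for c_dst in range(cols):
                links.add((r * cols + c_src, nxt * cols + c_dst))
    return links


before = crossbar_links(rows=3, cols=2)  # Fig. 2(a): 6 RPUs
after = crossbar_links(rows=4, cols=4)   # Fig. 2(b): 16 RPUs
```

In this model, adding columns increases the number of RPUs that can work side by side within one row, while adding rows lengthens the chain that data traverses before wrapping around, which matches the parallelism and pipeline roles the text assigns to column and row extension.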

(2) At compile time, settings can be made to change the connection relationships of the RPUs through the HEC_Link so that, for a given RPU array, the width or the depth is adjusted to suit tasks with more parallel computation or with longer computation cycles, respectively.

(3) At compile time, settings can be made to change the connection relationships of the RPUs through the HEC_Link so that one or several RPUs in the RPU array are coordinated into an independent RPU group. Multiple RPU groups can be formed within one RPU array; they execute multiple tasks asynchronously and in parallel without interfering with one another, achieving MIMD and MISD multitasking with highly parallel and efficient operation.

Fig. 3 is a schematic diagram of another way in which the AI supercomputer flexibly adjusts compute power, executing two tasks in parallel. As shown in the upper half of Fig. 3, under the control of configuration information transmitted by the master control system over the configuration bus, the cross bridge (CrossBridge) of the HEC_Link forms the following structure: the data of task A enters through the HEC_Link interface, passes through node A and an HEC_Link interface, and is input to RPU0; the result of RPU0 enters RPU1 through an HEC_Link interface and the CrossBridge; and the result of RPU1 likewise enters RPU2 through an HEC_Link interface and the CrossBridge. Task A is thereby carried out.

Likewise, the data flow of task B passes through RPU3, RPU4, RPU5, and RPU6, and finally reaches RPU7, thereby carrying out task B.

As can be seen from Fig. 3, task A and task B proceed essentially in parallel. The lower half of Fig. 3 briefly illustrates the progress of the two tasks.

Still further, each independent RPU group can simultaneously execute one or more different training tasks; can simultaneously execute one or more different inference tasks; or can simultaneously execute multiple training tasks and multiple inference tasks, with the same data input to the training network and the inference network at the same time.
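The two-task arrangement of Fig. 3 can be imitated in software with two independent chains that run concurrently. This is a rough analogy only: the `make_rpu` placeholder computation, the `GROUPS` table, and the use of threads are assumptions for illustration, and real RPU groups are hardware pipelines rather than Python functions.

```python
from concurrent.futures import ThreadPoolExecutor


def make_rpu(index):
    """Placeholder RPU: adds its own index to the value it receives."""
    return lambda x: x + index


RPUS = [make_rpu(i) for i in range(8)]

# Grouping taken from the Fig. 3 description: task A uses RPU0-RPU2,
# task B uses RPU3-RPU7. The groups share no RPU.
GROUPS = {"A": [0, 1, 2], "B": [3, 4, 5, 6, 7]}


def run_group(rpu_ids, value):
    # Data flows through the group's RPUs in order, like a pipeline.
    for i in rpu_ids:
        value = RPUS[i](value)
    return value


def run_all(inputs):
    # Each group runs in its own thread and does not touch the other group.
    with ThreadPoolExecutor(max_workers=len(GROUPS)) as pool:
        futures = {name: pool.submit(run_group, ids, inputs[name])
                   for name, ids in GROUPS.items()}
        return {name: fut.result() for name, fut in futures.items()}


# The same input is fed to both groups, echoing the case where the same data
# is input to a training network and an inference network at the same time.
results = run_all({"A": 100, "B": 100})
```

Because the groups are disjoint, neither result depends on the order in which the two chains happen to be scheduled.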

(4) When the HEC_Link uses distributed deployment or cloud deployment, compilation can be performed according to the HEC_Link connection information, and the tasks of the master control system, the HEC_Link, and each RPU deployed accordingly, to suit very-large-scale compute power deployment and parallel computation.

Further, when the RPU array changes, including but not limited to a change in the number of RPUs or in the way the RPUs are connected, tasks can be redeployed partially or globally by recompiling.
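One way to picture the choice between partial and global redeployment is to check which tasks still have all of their RPUs after a change. The `plan_redeploy` function and its decision rule are hypothetical, introduced only for illustration; the specification does not state how the scope of recompilation is determined.

```python
def plan_redeploy(placement, available):
    """Decide which tasks need recompiling after the RPU array changes.

    placement: dict mapping task name -> list of RPU ids it currently uses.
    available: set of RPU ids present after the change.
    Returns (scope, tasks), where scope is 'none', 'partial', or 'global'.
    """
    # A task is affected if any RPU it was placed on is no longer available.
    affected = sorted(task for task, ids in placement.items()
                      if not set(ids) <= available)
    if not affected:
        scope = "none"
    elif len(affected) < len(placement):
        scope = "partial"
    else:
        scope = "global"
    return scope, affected


placement = {"A": [0, 1, 2], "B": [3, 4, 5, 6, 7]}
plan = plan_redeploy(placement, available={0, 1, 2, 3, 4, 5, 6})  # RPU7 removed
```

In this example only task B loses an RPU, so only task B would be recompiled and task A could keep running undisturbed.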

Further, the compiling system can support multi-mode compilation.

In one example, compilation can run in an offline compilation mode, in which the output compiled file is passed directly to the operating system for execution.

In one example, compilation can also run in an online compilation mode on the operating system; after compilation completes, the result is executed directly by the operating system as needed.

In one example, the online compiling system can deploy a network structure obtained by online training in real time through online compilation, updating the neural network online in real time.
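The difference between the offline and online modes can be sketched as two small flows. All names here (`compile_network`, `execute`, `offline_flow`, `OnlineSystem`) and the JSON file format are assumptions for illustration; the specification does not define the compiled file format or the operating system interface.

```python
import json
import os
import tempfile


def compile_network(layers):
    """Stand-in compiler: one configuration record per layer."""
    return {"rpu_config": [{"rpu": i, "op": op} for i, op in enumerate(layers)]}


def execute(compiled):
    """Stand-in for the operating system running a compiled file."""
    return [entry["op"] for entry in compiled["rpu_config"]]


def offline_flow(layers, path):
    # Offline mode: compile, write the compiled file, then hand it to the OS.
    with open(path, "w") as f:
        json.dump(compile_network(layers), f)
    with open(path) as f:
        return execute(json.load(f))


class OnlineSystem:
    """Online mode: the compiler runs on the operating system itself."""

    def __init__(self, layers):
        self.compiled = compile_network(layers)

    def update(self, layers):
        # Recompile a newly trained structure and swap it in while running.
        self.compiled = compile_network(layers)

    def run(self):
        return execute(self.compiled)


with tempfile.TemporaryDirectory() as tmp:
    offline_result = offline_flow(["conv", "fc"], os.path.join(tmp, "net.json"))

system = OnlineSystem(["conv", "fc"])
before_update = system.run()
system.update(["conv", "conv", "fc"])  # structure changed by online training
after_update = system.run()
```

The `update` call is the part that corresponds to real-time online updating: the running system picks up the new network structure without a separate offline build step.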

Optionally, the AI supercomputer includes an AI supercomputer operating system based on reconfigurable elastic computing, which manages the hardware and software resources of the AI supercomputer, manages its peripheral resources, and, according to the output of the compiling system, executes the control code, outputs the configuration information and reconfigurable data to the RPU array, controls execution of the reconfigurable computing program, and returns the results. The master control system HEC provides the hardware platform on which the operating system of the AI supercomputer runs.

In one example, the operating system is used to obtain the information acquired from the machine perceptron and to control the machine behavior device to act on the results of AI computation.

In one example, the operating system is used to perform online compilation.

The specific embodiments described above further explain in detail the purpose, technical solutions, and beneficial effects of the present invention. It should be understood that the foregoing is merely specific embodiments of the present invention and is not intended to limit its scope of protection; any modification, equivalent substitution, improvement, and the like made on the basis of the technical solutions of the present invention shall fall within the scope of protection of the present invention.

Claims

1. An AI supercomputer based on high-performance reconfigurable elastic computing, characterized by comprising:

a machine perceptron for providing environment perception information or device input information as reconfigurable data;

a reconfigurable computing unit (RPU) array;

a machine behavior device for outputting the results of computation or inference by the AI supercomputer, or for executing instructions related to the AI supercomputer;

a high-performance elastic computing master control system (HEC) based on reconfigurable computing;

a high-performance elastic connection system (HEC_Link) based on reconfigurable computing; and

an AI supercomputer compiling system based on reconfigurable elastic computing, for compiling an application program into control code for the master controller, control information for the HEC_Link, and configuration information for the RPU array, such that the master control system connects the machine perceptron and the machine behavior device under the control of the control code, the HEC_Link connects the high-performance elastic computing master control system (HEC) and the RPU array under the control of the control information, and the RPU array performs the reconfigurable data computation of the AI supercomputer on the basis of the reconfigurable data under the control of the configuration information.

2. The AI supercomputer according to claim 1, characterized in that the HEC includes:

a system-level reconfigurable data path composed of at least one of a multilayer system bus, a configuration bus, an on-chip DMA controller, on-chip memory, an on-chip memory controller, a peripheral controller, and off-chip memory.

3. The AI supercomputer according to claim 1, characterized in that the high-performance elastic connection system (HEC_Link), under the control of the control information, implements high-speed data transfer between RPUs and between the RPUs and the master control chip system, as well as communication of configuration status and configuration information between the master control chip and the RPUs.

4. The AI supercomputer according to claim 1, characterized in that the RPU array includes one or more RPUs, the RPUs are connected to one another through the HEC_Link, and each RPU obtains its configuration information through the HEC_Link.

5. The AI supercomputer according to claim 1, characterized by including an AI supercomputer operating system based on reconfigurable elastic computing, for managing the software, hardware, and peripheral resources of the AI supercomputer, for executing the compiled files output by the compiling system, for obtaining the information acquired from the machine perceptron, for controlling the machine behavior device to act on the results of AI computation, and for performing online compilation.

6. The AI supercomputer according to claim 1, characterized in that the compiling system is adapted according to the characteristics of the neural network.

7. The AI supercomputer according to claim 1, characterized in that the compiling system compiles control information for using the HEC_Link to extend the RPU array by columns and/or rows according to the characteristics of the HEC_Link and the RPU array.

8. The AI supercomputer according to claim 1, characterized in that the compiling system compiles control information for changing the connection relationships of the RPUs through the HEC_Link.

9. The AI supercomputer according to claim 1, characterized in that the control information compiled by the compiling system causes one or more RPUs to:

execute one or more different training tasks simultaneously;

execute one or more different inference tasks simultaneously;

execute multiple training tasks and multiple inference tasks simultaneously; or

receive the same data as input to a training network and an inference network at the same time.

10. The AI supercomputer according to claim 1, characterized in that, when the HEC_Link uses distributed deployment or cloud deployment, the compiling system compiles according to the HEC_Link connection information and deploys the tasks of the master control system, the HEC_Link, and each RPU.

11. The AI supercomputer according to claim 1, characterized in that the compiling system operates in an offline compilation mode, in which the output compiled file is passed directly to the operating system for execution; or in an online compilation mode that runs on the operating system, in which, after compilation completes, the result is executed directly by the operating system as needed.

12. The AI supercomputer according to claim 1, characterized in that:

the compiling system deploys a network structure obtained by online training in real time through online compilation, thereby updating the neural network online in real time.