CN110262996A - A kind of supercomputer based on high-performance Reconfigurable Computation - Google Patents
- Publication number
- CN110262996A (application CN201910406990.1A)
- Authority
- CN
- China
- Prior art keywords
- rpu
- array
- information
- reconfigurable
- machine
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/38—Information transfer, e.g. on bus
- G06F13/382—Information transfer, e.g. on bus using universal interface adapter
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/38—Information transfer, e.g. on bus
- G06F13/42—Bus transfer protocol, e.g. handshake; Synchronisation
- G06F13/4204—Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus
- G06F13/4221—Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus being an input/output bus, e.g. ISA bus, EISA bus, PCI bus, SCSI bus
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7867—Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
- G06F15/7871—Reconfiguration support, e.g. configuration loading, configuration switching, or hardware OS
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2213/00—Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F2213/0026—PCI express
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2213/00—Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F2213/38—Universal adapter
- G06F2213/3852—Converter between protocols
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The present invention relates to a supercomputer based on high-performance reconfigurable computation, comprising: a machine perceptron for acquiring reconfigurable data; an RPU array for computing the input reconfigurable data; a master control system for controlling the transfer of the reconfigurable data to the RPU array; a machine behavior device for outputting calculation results and/or executing supercomputer instructions; and a compiling system for marking and preprocessing an application task, decomposing it into master-control execution code and RPU execution code, and finally generating the control code of the master control system, the elastic-connection control information, and the configuration information of the RPU array. Under the control of the control code, a data path is formed between the machine perceptron and the RPU array, and a data path is formed between the machine behavior device and the RPU array; the elastic-connection control information causes the RPU array to form an elastic computing architecture; and the configuration information of the RPU array configures the RPUs in the RPU array to compute the reconfigurable data.
Description
Technical field
The present invention relates to the field of reconfigurable computation, and more particularly to an AI supercomputer based on high-performance reconfigurable computation.
Background technique
With the development of science and technology, artificial intelligence (AI) has advanced by leaps and bounds. However, the vast majority of the platforms it runs on are still based on the central processing unit (CPU), the graphics processing unit (GPU), the field-programmable gate array (FPGA), and the application-specific integrated circuit (ASIC), or on combinations of these. At present, such platforms still cause considerable difficulty for developers and users when AI products are deployed.
For example, the CPU has the highest flexibility, but in scenarios such as AI that require massive parallel computation its efficiency ratio is very low. GPUs and FPGAs solve part of the parallel-computation problem, but power consumption and cost remain the main obstacles to their deployment. The ASIC has a good efficiency ratio, but it can only accommodate a fixed algorithm and is helpless against algorithm evolution. Moreover, for a platform composed of one or more of CPU, GPU, FPGA, and ASIC, the complexity of the system architecture, the expandability, the power consumption of the system, and the cost of computing power are all unsatisfactory.
Under the existing X86 architecture, products that extend AI computing power through the peripheral component interconnect express (PCIE) bus are, in practical applications, considerably limited in their support for rapidly iterating AI algorithms and in the flexibility of computing-power deployment. The computing platform has now become the biggest factor restricting the deployment of AI.
Summary of the invention
Based on the characteristics of AI computation, the present invention connects one or more reconfigurable processing unit (RPU) arrays through the PCIE interface of an X86-based primary processor. Computing power can be deployed elastically according to product demand and the use environment; edge computing, large-scale computing, and ultra-large-scale computing can be supported; various neural network computations can be driven without instructions; online training and online algorithm iteration are supported; and high versatility, flexibility, and energy efficiency are achieved.
To achieve the above object, one aspect of the present invention provides a supercomputer based on high-performance reconfigurable computation, comprising: at least one machine perceptron for acquiring environment perception information and/or device input information as reconfigurable data; at least one reconfigurable processing unit (RPU) array for computing the input reconfigurable data; a master control system for controlling the transfer of the reconfigurable data to the at least one RPU array; at least one machine behavior device for outputting calculation results and/or executing supercomputer instructions; and a compiling system for marking and preprocessing an application task, decomposing it into master-control execution code and RPU execution code, performing code conversion and optimization on the RPU execution code according to the at least one RPU array, and finally generating the control code of the master control system, the elastic-connection control information, and the configuration information of the RPU array. Under the control of the control code, a data path is formed between the at least one machine perceptron and the at least one RPU array, and a data path is formed between the at least one machine behavior device and the at least one RPU array; the elastic-connection control information causes the at least one RPU array to form an elastic computing architecture; and the configuration information of the RPU array configures the RPUs in the at least one RPU array to compute the reconfigurable data.
Preferably, the master control system comprises a platform controller hub (PCH) and a master controller based on the X86/AMD64 architecture. The PCH is connected to the master controller through a direct media interface (DMI). The PCH is connected to the at least one machine perceptron and transfers environment perception information and/or device input information to the master controller based on the X86/AMD64 architecture. The master controller based on the X86/AMD64 architecture is connected to the at least one RPU array through a PCIE interface and transfers the reconfigurable data to the at least one RPU array for computation. The PCH is connected to the at least one machine behavior device and transfers calculation results from the master controller based on the X86/AMD64 architecture to the at least one machine behavior device.
Preferably, the RPU array comprises an elastic connection system HEC_link and one or more RPUs. Under the control of the elastic-connection control information, HEC_link connects the one or more RPUs. The one or more RPUs obtain their corresponding configuration information through HEC_link, obtain reconfigurable data from the master control system or from other RPUs through HEC_link, and transfer calculation results to the master control system or to other RPUs through HEC_link.
Preferably, the at least one RPU array is connected to the master control system through a PCIE interface, and HEC_link comprises a PCIE protocol converter for performing protocol conversion between PCIE interface messages and the configuration bus and reconfigurable-data bus of the at least one RPU array.
Preferably, according to the elastic-connection control information, HEC_link extends the computing depth and computing width of the one or more RPUs in the at least one RPU array, and groups the one or more RPUs in the at least one RPU array so that the groups respectively input different reconfigurable data and execute different tasks; or respectively input different reconfigurable data and execute the same task; or respectively input the same reconfigurable data and execute different tasks; or respectively input the same reconfigurable data and execute the same task.
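The four grouping combinations enumerated above can be sketched as a small scheduling model (a minimal illustration; the function, the round-robin partitioning rule, and all names are assumptions, not part of the patent):

```python
# The four elastic grouping modes: each RPU group either shares or
# differs in its input data, and either shares or differs in its task.
MODES = {
    (False, False): "different data, different tasks",  # independent jobs
    (False, True):  "different data, same task",        # data parallelism
    (True,  False): "same data, different tasks",       # task parallelism on one input
    (True,  True):  "same data, same task",             # cooperative / redundant
}

def assign_groups(rpu_ids, n_groups, same_data, same_task):
    """Partition RPU ids round-robin into groups and label each group
    with the data stream and task it would receive under the mode."""
    groups = [rpu_ids[i::n_groups] for i in range(n_groups)]
    plan = []
    for g, members in enumerate(groups):
        data = "data_0" if same_data else f"data_{g}"
        task = "task_0" if same_task else f"task_{g}"
        plan.append({"rpus": members, "data": data, "task": task})
    return plan

# 8 RPUs in 4 groups, each group with its own data stream but the same task:
plan = assign_groups(list(range(8)), 4, same_data=False, same_task=True)
```

Each entry of `plan` is one group's assignment; changing the two flags walks through the four modes listed in the text.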
Preferably, for an at least one RPU array that has been determined, the compiling system adjusts the width and/or depth through HEC_link, thereby changing the connection relationship of the one or more RPUs.
Preferably, the supercomputer further comprises an operating system for managing the software, hardware, and peripheral resources of the supercomputer, executing the compiled files output by the compiling system, obtaining the information from the machine perceptron, controlling the machine behavior device to execute the calculation results, driving the at least one RPU array according to the configuration information of the RPU array, and controlling the compiling system to perform online compilation.
Preferably, the compiling system operates in an offline compilation mode, in which the completed compiled files are transferred to the operating system, or in an online compilation mode, in which it compiles and deploys in real time for the operating system.
Preferably, the machine perceptron comprises: a terminal sensor for collecting surrounding-environment information and self-state information; and a sensor module for performing secondary analysis and calculation on the surrounding-environment information and self-state information collected by the terminal sensor, generating environment perception information and/or device input information as reconfigurable data.
Preferably, the machine behavior device comprises: a communication unit, a man-machine interface, a servo mechanism, and a control unit.
Preferably, the terminal sensor comprises: an image sensor, a millimeter-wave radar, an ultrasonic radar, a laser radar, an inertial measurement unit, a microphone, a global navigation satellite system receiver, a touch screen, and a stress sensor; the sensor module comprises: an RGB-D depth camera, a binocular depth camera, and a VIO three-dimensional-reconstruction camera.
Preferably, the environment perception information includes: vision, hearing, touch, taste, geographical location, and change of location.
Based on the X86 architecture, the present invention connects the primary processor to the machine perceptron and the machine behavior device, and connects one or more RPU arrays through a PCIE interface, so that computing power can be deployed flexibly according to product demand and the use environment. Edge computing, large-scale computing, and ultra-large-scale computing can be supported simultaneously; various neural network computations without instruction driving can be supported; online training and online algorithm iteration are supported; and high versatility, flexibility, and energy efficiency are achieved.
Detailed description of the invention
Fig. 1 is a schematic diagram of the architecture of the supercomputer based on high-performance reconfigurable computation provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of one elastic deployment of computing power provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of another elastic deployment of computing power provided by an embodiment of the present invention;
Fig. 4a is a schematic diagram of one flexible adjustment of computing power for executing multiple tasks provided by an embodiment of the present invention;
Fig. 4b is a schematic diagram of another flexible adjustment of computing power for executing multiple tasks provided by an embodiment of the present invention.
Specific embodiment
The technical solution of the present invention is described in further detail below with reference to the drawings and embodiments.
Fig. 1 is a schematic diagram of the architecture of the supercomputer based on high-performance reconfigurable computation provided by an embodiment of the present invention. The supercomputer involved in the present invention may be an AI supercomputer.
As shown in Fig. 1, in one embodiment, the present invention provides a supercomputer based on high-performance reconfigurable computation, comprising: at least one machine perceptron for acquiring environment perception information and/or device input information as reconfigurable data; at least one reconfigurable processing unit (RPU) array for computing the input reconfigurable data; a master control system for controlling the transfer of the reconfigurable data to the at least one RPU array; at least one machine behavior device for outputting calculation results and/or executing supercomputer instructions; and a compiling system for marking and preprocessing an application task, decomposing it into master-control execution code and RPU execution code, performing code conversion and optimization on the RPU execution code according to the at least one RPU array, and finally generating the control code of the master control system, the elastic-connection control information, and the configuration information of the RPU array. Under the control of the control code, a data path is formed between the at least one machine perceptron and the at least one RPU array, and a data path is formed between the at least one machine behavior device and the at least one RPU array; the elastic-connection control information causes the at least one RPU array to form an elastic computing architecture; and the configuration information of the RPU array configures the RPUs in the at least one RPU array to compute the reconfigurable data.
In one embodiment, the master control system is based on the X86/AMD64 architecture. The master control system based on the X86/AMD64 architecture is the central processing unit of the AI supercomputer (AISC): it provides the hardware operation platform for the operating system and the online compiling system of the AISC. It is also the control platform connecting the multiple RPU arrays, and the peripheral control platform connecting the multiple machine perceptrons and the multiple machine behavior devices; it is the operation platform for general programs and the control platform for reconfigurable-data calculation.
In one example, the master control system based on X86/AMD64 framework may include: platform courses center
(planform controller hub, PCH) and master controller based on X86/AMD64 framework.
The PCH is the primary interface controller connecting the primary processor based on the X86/AMD64 architecture with the peripherals. The PCH is connected to the master controller through a direct media interface (DMI). The PCH is connected to at least one machine perceptron and transfers environment perception information and/or device input information to the master controller based on the X86/AMD64 architecture. The master controller based on the X86/AMD64 architecture is connected to at least one RPU array through a PCIE interface and transfers the reconfigurable data to the at least one RPU array for computation. The PCH is connected to at least one machine behavior device and transfers calculation results, or instructions for the AI supercomputer to execute, from the master controller based on the X86/AMD64 architecture to the at least one machine behavior device.
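The connection topology described above can be sketched as a tiny graph model, so that the bus sequence any data traverses follows directly from the link table (component names and the routing helper are illustrative assumptions, not part of the patent):

```python
# Minimal model of the described topology: perceptrons and behavior
# devices hang off the PCH, the PCH reaches the master controller over
# DMI, and the master controller reaches each RPU array over PCIE.
LINKS = {
    ("perceptron", "pch"): "peripheral",
    ("pch", "master_controller"): "DMI",
    ("master_controller", "rpu_array"): "PCIE",
    ("pch", "behavior_device"): "peripheral",
}

ADJ = {}
for (a, b), bus in LINKS.items():          # links are bidirectional
    ADJ.setdefault(a, {})[b] = bus
    ADJ.setdefault(b, {})[a] = bus

def route(src, dst):
    """Breadth-first search returning the list of buses traversed."""
    frontier, seen = [(src, [])], {src}
    while frontier:
        node, buses = frontier.pop(0)
        if node == dst:
            return buses
        for nxt, bus in ADJ[node].items():
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, buses + [bus]))
    return None

# Sensor data reaches the RPU array via PCH -> DMI -> PCIE:
print(route("perceptron", "rpu_array"))    # ['peripheral', 'DMI', 'PCIE']
```

The reverse route shows a calculation result returning over PCIE, DMI, and the peripheral link to a behavior device, matching the data paths in the text.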
In one example, the primary processor based on the X86/AMD64 architecture may serve as the system-level reconfiguration controller of the AI supercomputer while also undertaking all the functions of a general-purpose CPU. The system-level reconfiguration controller, the HEC_Link controller in each RPU array, and the reconfiguration controller inside each RPU of the RPU array together constitute the main body of the reconfiguration controller of the AI supercomputer.
In another example, the master control system based on the X86/AMD64 architecture may also comprise double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR for short), so that the master controller based on the X86/AMD64 architecture can temporarily store data in the DDR, supporting the execution of tasks and improving task-execution efficiency.
In one example, the compiling system of the AI supercomputer marks and preprocesses the application program and decomposes it into master-control execution code and RPU execution code. It then performs code conversion and optimization on the RPU execution code according to the RPU array, for example temporal partitioning of tasks, partitioning of tasks over the RPUs, and generation of task configuration information. Compilation finally generates the control code of the master controller, the control information of RPU_Link, and the configuration information of the RPU array.
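The compile flow described here — mark/preprocess, decompose into host and RPU code, partition, and emit the three final artifacts — can be sketched as follows (the task format, the round-robin partitioning rule, and all names are illustrative assumptions, not the patent's actual compiler):

```python
from dataclasses import dataclass, field

# Illustrative model of the described compile flow: an application task
# is marked/preprocessed, split into host code and RPU code, and the
# RPU code is converted per array into the three named artifacts.

@dataclass
class CompileOutput:
    master_control_code: list        # control code for the master controller
    rpu_link_control: dict           # elastic-connection (RPU_Link) control info
    rpu_config: dict = field(default_factory=dict)  # per-RPU configuration

def compile_task(task, n_rpus):
    # 1. mark and preprocess: split the task into operations
    ops = [op.strip() for op in task.split(";") if op.strip()]
    # 2. decompose: I/O stays on the host, compute goes to the RPUs
    host_ops = [op for op in ops if op.startswith("io:")]
    rpu_ops = [op for op in ops if not op.startswith("io:")]
    # 3. spatial partitioning: round-robin the compute ops over the RPUs
    config = {f"rpu{i}": rpu_ops[i::n_rpus] for i in range(n_rpus)}
    # 4. emit the three artifacts named in the text
    return CompileOutput(
        master_control_code=host_ops,
        rpu_link_control={"width": n_rpus,
                          "depth": max(len(v) for v in config.values())},
        rpu_config=config,
    )

out = compile_task("io:read; conv1; conv2; relu; io:write", n_rpus=2)
```

Here the I/O operations become the master controller's control code, while the compute operations become per-RPU configuration, with width/depth recorded as the elastic-connection control information.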
In one embodiment, the at least one RPU array is the main operation unit with which the AI supercomputer performs elastic computation; all reconfigurable-data calculation in the AI supercomputer is completed on the RPU arrays. According to the actual computing-power demand, RPUs can be added to or deleted from an RPU array, or the arrangement and network architecture of an RPU array can be changed, so as to realize elastic configuration of the computing power of reconfigurable computation.
In one example, an RPU array may comprise an elastic connection system HEC_link and one or more RPUs. Under the control of the elastic-connection control information, HEC_link connects the one or more RPUs. The one or more RPUs obtain their corresponding configuration information through HEC_link, obtain reconfigurable data from the master control system or from other RPUs through HEC_link, and likewise transfer calculation results to the master control system or to other RPUs through HEC_link. Those skilled in the art should note that the input source of the reconfigurable data and the output destination of the calculation results depend on the control of the master control system based on the X86/AMD64 architecture and of HEC_Link.
In one embodiment, HEC_Link is the main logic unit and implementation carrier with which the AI supercomputer realizes elastic configuration of computing power. In one example, HEC_Link can be configured according to the different modes of computing-power extension and the different application scenarios. HEC_Link connects the master control system based on the X86/AMD64 architecture with the multiple RPUs in the RPU array, thereby realizing high-speed data transfer between RPUs and between the RPUs and the master control system, as well as configuration-status and information communication between the master control system and the RPUs.
In one example, for an RPU array whose number of RPUs has been determined, HEC_Link can change the connection relationships as needed, allowing the RPUs to be combined in different ways.
In one example, the compiling system adjusts the width and/or depth of the at least one determined RPU array through HEC_link, thereby changing the connection relationship of the one or more RPUs. For example, according to the elastic-connection control information, HEC_link extends the computing depth and computing width of the one or more RPUs in the at least one RPU array, gaining the ability to execute larger or more programs. According to the elastic-connection control information, HEC_link can also group the one or more RPUs in the at least one RPU array so that the groups respectively input different reconfigurable data and execute different tasks; or respectively input different reconfigurable data and execute the same task; or respectively input the same reconfigurable data and execute different tasks; or respectively input the same reconfigurable data and efficiently execute the same task.
In one example, the at least one RPU array is connected to the master control system based on the X86/AMD64 architecture through a PCIE interface, and HEC_link may comprise a PCIE protocol converter for performing protocol conversion between PCIE interface messages and the configuration bus and reconfigurable-data bus in the at least one RPU array.
The present invention uses a PCIE interface to connect the master control system and the at least one RPU array. Compared with some existing schemes that use dedicated interfaces, the standard PCIE interface gives better versatility and reduces hardware cost and development cost.
In another example, HEC_link further comprises a reconfigurable-data bus controller, a reconfigurable-data bus, reconfigurable-data bus bridge circuits, a configuration bus controller, and a configuration bus. The reconfigurable-data bus controller, the reconfigurable-data bus, and the reconfigurable-data bus bridge circuits together constitute the reconfigurable-data path; the configuration bus controller and the configuration bus together constitute the RPU configuration-information path.
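The two internal paths can be sketched as a toy bridge that performs the described protocol conversion by routing each incoming PCIE message onto either the configuration path or the reconfigurable-data path (the message fields and class names are illustrative assumptions, not the patent's actual bus protocol):

```python
# Illustrative sketch of the HEC_link bridge described above: incoming
# PCIE messages are converted onto one of two internal paths -- the
# configuration bus (RPU configuration writes) or the reconfigurable-
# data bus (operand/result traffic).

class HecLinkBridge:
    def __init__(self, n_rpus):
        self.config_bus = {i: {} for i in range(n_rpus)}  # per-RPU config regs
        self.data_bus = {i: [] for i in range(n_rpus)}    # per-RPU data FIFOs

    def on_pcie_message(self, msg):
        """Protocol conversion: route by message kind."""
        rpu = msg["rpu"]
        if msg["kind"] == "config":
            self.config_bus[rpu][msg["reg"]] = msg["value"]
        elif msg["kind"] == "data":
            self.data_bus[rpu].append(msg["payload"])
        else:
            raise ValueError(f"unknown message kind: {msg['kind']}")

bridge = HecLinkBridge(n_rpus=2)
bridge.on_pcie_message({"kind": "config", "rpu": 0, "reg": "mode", "value": 3})
bridge.on_pcie_message({"kind": "data", "rpu": 0, "payload": [1.0, 2.0]})
```

Keeping configuration and data on separate internal paths mirrors the text's split between the configuration-bus group and the reconfigurable-data-bus group.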
In one embodiment, the machine perceptron comprises: a terminal sensor for collecting surrounding-environment information and self-state information; and a sensor module for performing secondary analysis and calculation on the surrounding-environment information and self-state information collected by the terminal sensor, generating environment perception information and/or device input information as reconfigurable data.
In one example, the machine perceptron is a peripheral device of the AI supercomputer that provides environment perception information such as vision, hearing, touch, taste, geographical location, and pose change, or device input information, for the AI supercomputer. In one example, the machine perceptron comprises terminal sensors, which complete the collection of information about the terminal, the surrounding environment, and its own state, and sensor modules with perception-analysis and computing capability, which perform secondary analysis and calculation on the data collected by the terminal sensors to generate new environment perception information. The terminal sensors may include: an image sensor (camera), a millimeter-wave radar (radar), an ultrasonic radar (ultrasonic), a laser radar (lidar), an inertial measurement unit (IMU), a microphone (MIC), a global navigation satellite system (GNSS) receiver, a touch screen (touch panel), a stress sensor, and the like. The sensor modules may include: an RGB-D depth camera, a binocular depth camera, a VIO three-dimensional-reconstruction camera, and the like. In another example, the perception-analysis and computing capability can be based on a CPU, GPU, FPGA, or DSP, or on an RPU.
Those skilled in the art should note that any device capable of providing input information may serve as a machine perceptron; the present invention is not limited in this respect.
In one embodiment, the machine behavior device is a peripheral device of the AI supercomputer, used to output the results of calculation or reasoning in the AI supercomputer and to execute the instructions of the AI supercomputer. The calculation result may be the trained neural network model output when training a neural network; the reasoning result may be the result obtained after passing sample data through the calculation of a neural network model. The machine behavior device may include: a communication unit, a man-machine interface, a servo mechanism, a control unit, and the like.
Those skilled in the art should note that any device capable of executing an output result may serve as a machine behavior device; the present invention is not limited in this respect.
In one embodiment, the AI supercomputer further comprises an operating system for managing the software, hardware, and peripheral resources of the supercomputer, executing the compiled files output by the compiling system, obtaining the information from the machine perceptron, controlling the machine behavior device to execute the calculation results, driving the at least one RPU array according to the configuration information of the RPU array, and controlling the compiling system to perform online compilation.
In one example, the operating system of the AI supercomputer manages the software and hardware resources of the AI supercomputer as well as its peripheral resources. It executes the compiled files output by the compiling system and obtains the information acquired from the machine perceptrons. It also executes the control code and outputs the configuration information and the reconfigurable data to the corresponding RPU arrays, thereby controlling the execution of reconfigurable-computation programs and the return of their results. It further controls the machine behavior devices to execute the calculation results of the AI supercomputer, and performs online compilation. In another example, the operating system may also drive the RPU arrays according to the information of the RPU arrays.
In one example, the input of the compiling system can be an application program written in a high-level programming language, such as C or Python, or one that additionally relies on a high-level application framework, such as the TensorFlow or Caffe neural network frameworks. The output of the compiling system can be the control code of the master controller for reconfigurable computation, the control information of RPU_Link, the configuration information of the RPU array, and the process-state information of compiler tasks such as the general executive program of the master control system.
In one example, the compiling system supports neural networks on the RPU array without instruction driving; neural network tasks execute on the RPU array without being driven by instructions. For example, the compiled file produced for a neural network may include the control code for the primary processor, the control information of RPU_Link, and each item of configuration information for the RPUs in the RPU array. During task execution no instruction control is needed; execution is data-driven instead, so that the reconfigurable data flows from input to output, and the result obtained by the master control system is the end-to-end result of the neural network. In another example, various neural networks can be supported by extending the compiling system. The training and inference of neural networks involve large amounts of parallel and repeated computation; the compiling system can adapt to the characteristics of each network and finally deploy the network structure and network parameters onto the corresponding RPU array.
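A minimal sketch of this data-driven model, with an invented two-RPU toy array: the RPUs are configured once before any data arrives, and thereafter the arriving data alone drives the computation from input to output, with no per-step instruction stream:

```python
def configure(array, configs):
    """Write each RPU's configuration once, before any data arrives."""
    for pos, cfg in configs.items():
        array[pos] = cfg["op"]

def run_data_driven(array, order, x):
    """No instruction stream: the data flowing in drives the computation end to end."""
    ops = {"double": lambda v: 2 * v, "inc": lambda v: v + 1}
    for pos in order:                # each RPU fires when its input data arrives
        x = ops[array[pos]](x)
    return x

array = {}
configure(array, {(0, 0): {"op": "double"}, (0, 1): {"op": "inc"}})
assert run_data_driven(array, [(0, 0), (0, 1)], 5) == 11   # 5 -> 10 -> 11
```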
In one example, the compiling system can elastically deploy the RPU array in multiple modes at compile time. The input of the multi-mode compiling system is not limited to one kind: it may be an application program, a neural network, and so on. According to the characteristics of the RPU array, the compiling system can use RPU_Link to extend the array by columns and rows. Extending the columns of the RPU array favors supporting more parallel computation, while extending the rows accelerates the pipeline of longer calculation sequences. RPU_Link is an open architecture, i.e. it places no limit on the number of rows and columns in an RPU array; the column and row extension of the array depends mainly on the computing-power demand.
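The row/column trade-off can be sketched as a toy shape planner (`plan_array_shape` is an invented helper, not part of the patent): columns track the number of independent parallel lanes, and rows track the pipeline depth of the calculation sequence:

```python
def plan_array_shape(parallel_lanes, pipeline_stages):
    """Columns scale with independent parallel lanes; rows with pipeline depth.

    Returns (rows, cols), each at least 1."""
    return (max(1, pipeline_stages), max(1, parallel_lanes))

assert plan_array_shape(2, 3) == (3, 2)   # 3 rows x 2 cols, as in Fig. 2
assert plan_array_shape(4, 4) == (4, 4)   # a square array, as in Fig. 3
```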
In another example, the connection relationships of the RPUs can also be changed at compile time through HEC_Link. For an RPU array whose number of RPUs has already been determined, the width or depth can likewise be adjusted, so as to adapt simultaneously to tasks with more parallel computation or with longer calculation cycles.
In a further example, the connection relationships of the RPUs can be changed at compile time through HEC_Link so that one or several RPUs in an RPU array form an independent RPU group. Multiple RPU groups can be formed within one RPU array, each executing its own task asynchronously and in parallel without interfering with the others, thereby achieving the computing goals of multiple-instruction multiple-data (MIMD) and multiple-instruction single-data (MISD) multitasking with high parallelism and efficient operation.
For example, one or more different training tasks can be executed simultaneously; or one or more different inference tasks can be executed simultaneously; or multiple training tasks and multiple inference tasks can be executed simultaneously; or the same data can be input simultaneously to a training network and an inference network.
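The grouping idea can be sketched as follows (`partition_groups` is an invented helper): the RPUs of one array are split into non-overlapping groups that can then run separate tasks asynchronously, as in the A/B grouping of Fig. 4a:

```python
def partition_groups(num_rpus, group_sizes):
    """Split a flat list of RPU ids into independent, non-overlapping groups."""
    assert sum(group_sizes) <= num_rpus, "groups must fit in the array"
    groups, start = [], 0
    for size in group_sizes:
        groups.append(list(range(start, start + size)))
        start += size
    return groups

groups = partition_groups(8, [3, 5])
assert groups == [[0, 1, 2], [3, 4, 5, 6, 7]]   # group A: RPU 0-2, group B: RPU 3-7
```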
In a further example, multiple RPU arrays can also be connected to the master control system as actually needed, to accommodate larger-scale computing-power deployment and parallel computation.
In one example, the compiling system can support multi-mode compilation. Those skilled in the art should note that changes to the RPU array include, but are not limited to, changes in the number of RPU arrays, in the number of RPUs within an array, and in the way the RPUs in an array are connected. Partial or global redeployment of tasks can then be achieved by recompiling.
In one embodiment, the compiling system works in an offline compilation mode, transferring the completed compiled file to the operating system; or the compiling system works in an online compilation mode, compiling and deploying in real time for the operating system.
In one example, the multi-mode compilation supported by the compiling system may be an offline compilation mode or an online compilation mode. In the offline mode, the output compiled file is passed directly to the operating system for execution; in other words, a complete compiled file is handed to the operating system and executed there. In the online mode, as soon as one part of the compilation finishes, it can be executed directly by the operating system as needed; in other words, the complete compiled file may not yet be finished, but a part of it has already been compiled and can be executed by the operating system immediately, with the remaining parts executed by the operating system once their compilation completes. In another example, the compiling system can deploy the network structure obtained from online training in real time through online compilation, realizing online real-time updating of the neural network.
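The difference between the two modes can be sketched as follows (`compile_online` and its `"compiled(...)"` string are invented stand-ins for real code generation): online compilation hands each finished fragment to the executor immediately, instead of waiting for the whole file as the offline mode does:

```python
def compile_online(stages, execute):
    """Execute each compiled fragment as soon as it is ready, instead of
    waiting for the whole program (offline mode) to finish compiling."""
    results = []
    for src in stages:
        fragment = f"compiled({src})"   # stand-in for real code generation
        results.append(execute(fragment))
    return results

log = []
compile_online(["layer1", "layer2"], log.append)
assert log == ["compiled(layer1)", "compiled(layer2)"]
```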
Based on the X86 architecture, the present invention connects the primary processor to the machine perceptron and the machine behavior device, and connects one or more RPU arrays through a PCIE interface, so that computing power can be deployed flexibly according to product demand and the use environment. It can also support edge computing, large-scale computing and ultra-large-scale computing, support all kinds of neural network computation without instruction driving, support online training and online algorithm iteration, and offer high generality, flexibility and energy efficiency.
Fig. 2 is a schematic diagram of elastically deployed computing power provided by an embodiment of the present invention.
As shown in Fig. 2, in this schematic diagram of elastically deployed computing power, the task to be executed currently requires 6 RPUs, namely RPU00, RPU01, RPU10, RPU11, RPU20 and RPU21 of Fig. 2; these 6 RPUs form an RPU array of 3 rows and 2 columns. If the task to be executed instead comes to require 12 RPUs, the RPU array can be extended by rows or columns. The computing power after extension can be as shown in Fig. 3, another schematic diagram of elastically deployed computing power provided by an embodiment of the present invention: the extended array, comprising RPU00, RPU01, RPU02, RPU03, RPU10, RPU11, RPU12, RPU13, RPU20, RPU21, RPU22, RPU23, RPU30, RPU31, RPU32 and RPU33, is an RPU array of 4 rows and 4 columns, thereby realizing the elastic deployment of the RPU array.
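The elastic extension from Fig. 2 to Fig. 3 can be sketched as a toy growth rule (`extend_array` is an invented helper; the real extension policy is decided by the compiling system): rows and columns are added one at a time until the array is large enough:

```python
def extend_array(rows, cols, needed):
    """Grow a rows x cols RPU array by whole rows/columns until it can host
    `needed` RPUs (elastic deployment as in Figs. 2 and 3)."""
    while rows * cols < needed:
        if rows > cols:
            cols += 1   # add a column: more parallel lanes
        else:
            rows += 1   # add a row: deeper pipeline
    return rows, cols

assert extend_array(3, 2, 6) == (3, 2)    # Fig. 2: the 3x2 array already suffices
assert extend_array(3, 2, 16) == (4, 4)   # Fig. 3: extended to a 4x4 array
```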
Fig. 4a is a schematic diagram, provided by an embodiment of the present invention, of flexibly adjusting computing power to execute multiple tasks.
As shown in Fig. 4a, which illustrates a flexible adjustment of computing power for multitask execution, different numbers of RPUs in the RPU array can be grouped: for example, group A is RPU 0 to RPU 2 and group B is RPU 3 to RPU 7. Different RPU groups can then execute different tasks: for example, group A executes task A and group B executes task B. As shown in Fig. 4a, task A calls RPU 0 to RPU 2 in the RPU array, and task B calls RPU 3 to RPU 7 in the RPU array.
Through the configuration bus, the master control system transmits the configuration information for configuring the bridge module and the configuration information for configuring the RPU array to the protocol controller. The protocol controller performs protocol conversion on the received configuration information, converting it into a serial signal, and transmits the converted serial signal to the bridge module over the first control bus of the first group of channels. Since the second control bus in the bridge module is connected to the first control bus of the first group of channels, the serial signal carried on the first control bus is forwarded over the second control bus to the bridge controller in the bridge module. The bridge controller then transmits the configuration information for configuring the RPU array, over the RPU control buses that connect the second control bus to each RPU in the RPU array, to the corresponding RPUs in the array, completing the configuration of the RPUs.
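A toy model of this configuration path, with all names invented: the configuration items are serialized into frames (the protocol conversion), and the bridge controller demultiplexes them to the bridge submodule and to the individual RPUs:

```python
def route_config(bridge_cfg, rpu_cfgs):
    """Model the configuration path: master -> protocol controller (serialize)
    -> first control bus -> bridge controller -> per-RPU control buses."""
    serial = [("bridge", bridge_cfg)] + [("rpu", rid, cfg) for rid, cfg in rpu_cfgs.items()]
    bridge_state, rpu_state = None, {}
    for frame in serial:                 # bridge controller demultiplexes the frames
        if frame[0] == "bridge":
            bridge_state = frame[1]      # configure the bridge submodule
        else:
            rpu_state[frame[1]] = frame[2]   # configure one RPU
    return bridge_state, rpu_state

bridge_state, rpu_state = route_config({"matrix": "A"}, {0: "mac", 1: "add"})
assert bridge_state == {"matrix": "A"} and rpu_state == {0: "mac", 1: "add"}
```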
The bridge controller configures the bridge submodule according to the configuration information for configuring the bridge module. In one example, suppose the current task A needs 4 RPUs, RPU 0 to RPU 3, for serial computation, and that when task A executes, the RPUs must be called in sequence from RPU 0 to RPU 3. The bridge controller bridges the corresponding access matrix in the bridge submodule, such as the access matrix of the bridge module in Fig. 4a, where the intersections marked with black dots are the bridged positions, connecting the RPU input channel and RPU output channel that meet at each intersection. For task A, the bridging shown in Fig. 4a connects the second matrix output channel (Output_Array) to the input channel of RPU 0 (Input_RPU 0); the output channel of RPU 0 (Output_RPU 0) to the input channel of RPU 1 (Input_RPU 1); the output channel of RPU 1 (Output_RPU 1) to the input channel of RPU 2 (Input_RPU 2); and the output channel of RPU 2 (Output_RPU 2) to the input channel of RPU 3 (Input_RPU 3). When task A needs only a single pass through RPU 0 to RPU 3, the output channel of RPU 3 (Output_RPU 3) is bridged to the second matrix input channel (Input_Array). In another example, when task A requires multiple passes through RPU 0 to RPU 3 to complete, the output channel of RPU 3 (Output_RPU 3) is instead bridged to the input channel of RPU 0 (Input_RPU 0); those skilled in the art should note that the bridge from the second matrix output channel (Output_Array) to the input channel of RPU 0 (Input_RPU 0) must then be released, so as to complete the cyclic invocation of the RPUs. When the last cycle has been executed, the bridge between the output channel of the final RPU (Output_RPU end) and the other RPU input channels (Input_RPU) is released, and that output channel (Output_RPU end) is bridged to the second matrix input channel (Input_Array), so that the calculation result can be transferred to the HEC master control system. Here, RPU end denotes the last RPU to execute, which may be any one of RPU 0 to RPU 3.
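The access matrix can be sketched as a toy crossbar (the `Crossbar` class is invented for illustration), reproducing the multi-pass bridging for task A described above, including the release of the feed bridge when the loop-back is installed:

```python
class Crossbar:
    """Toy model of the bridge submodule's access matrix: each entry connects
    one output channel to one input channel (a black dot in Fig. 4a)."""
    def __init__(self):
        self.links = {}                          # output channel -> input channel
    def bridge(self, out_ch, in_ch):
        self.links[out_ch] = in_ch
    def release(self, out_ch):
        self.links.pop(out_ch, None)

xb = Crossbar()
xb.bridge("Output_Array", "Input_RPU0")          # feed task A into RPU 0
for i in range(3):                               # chain RPU0 -> RPU1 -> RPU2 -> RPU3
    xb.bridge(f"Output_RPU{i}", f"Input_RPU{i+1}")
xb.bridge("Output_RPU3", "Input_RPU0")           # loop back for a multi-pass task
xb.release("Output_Array")                       # release the feed bridge, per the text
assert xb.links["Output_RPU3"] == "Input_RPU0"
assert "Output_Array" not in xb.links
```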
For task B, the bridging shown in Fig. 4a connects the second matrix output channel (Output_Array) to the input channel of RPU 4 (Input_RPU 4); the output channel of RPU 4 (Output_RPU 4) to the input channel of RPU 5 (Input_RPU 5); the output channel of RPU 5 (Output_RPU 5) to the input channel of RPU 6 (Input_RPU 6); and the output channel of RPU 6 (Output_RPU 6) to the input channel of RPU 7 (Input_RPU 7). When task B needs only a single pass through RPU 4 to RPU 7, the output channel of RPU 7 (Output_RPU 7) is bridged to the second matrix input channel (Input_Array). In another example, when task B requires multiple passes through RPU 4 to RPU 7 to complete, the output channel of RPU 7 (Output_RPU 7) is instead bridged to the input channel of RPU 4 (Input_RPU 4); those skilled in the art should note that the bridge from the second matrix output channel (Output_Array) to the input channel of RPU 4 (Input_RPU 4) must then be released, so as to complete the cyclic invocation of the RPUs. When the last cycle has been executed, the bridge between the output channel of the final RPU (Output_RPU end) and the other RPU input channels (Input_RPU) is released, and that output channel (Output_RPU end) is bridged to the second matrix input channel (Input_Array), so that the calculation result can be transferred to the HEC master control system. Here, RPU end denotes the last RPU to execute, which may be any one of RPU 4 to RPU 7.
Those skilled in the art should note that the number of RPUs depends on the demands of the actual computing task, and the present invention places no specific limit on the number of RPUs. They should also note that, for different tasks, the second output channel must be bridged to the different RPU input channels by time-division multiplexing, and the second input channel must likewise be bridged to the different RPU output channels by time-division multiplexing, so as to realize the parallel execution of multiple tasks.
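The time-division multiplexing of the shared channel can be sketched as a toy schedule (`tdm_schedule` is an invented helper): successive time slots bridge Output_Array to the first RPU input of each task in turn:

```python
def tdm_schedule(task_inputs, slots):
    """Time-division multiplexing of the shared Output_Array channel: each time
    slot bridges the channel to a different task's first RPU input channel."""
    return {slot: task_inputs[slot % len(task_inputs)] for slot in range(slots)}

sched = tdm_schedule(["Input_RPU0", "Input_RPU4"], 4)   # tasks A and B alternate
assert sched == {0: "Input_RPU0", 1: "Input_RPU4", 2: "Input_RPU0", 3: "Input_RPU4"}
```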
With the RPU array bridged by the bridge module as described above, when tasks A and B need to execute in parallel, the reconfigurable data used for computation is transmitted over the data bus to the protocol controller. The protocol controller performs protocol conversion on the reconfigurable data of tasks A and B, converting it into serial signals, and transmits them through the first matrix output channel of the first group of channels to the second matrix output channel of the bridge module. Following the bridging described above, the bridge module passes the serial signal of task A from the second matrix output channel to the input channel of RPU 0, and the serial signal of task B from the second matrix output channel to the input channel of RPU 4.
For task A, the serial signal is transmitted through the RPU 0 input channel of the second group of channels connected to RPU 0 into RPU 0, where the corresponding computation is performed. When the computation of RPU 0 completes, the computed data is transmitted through the RPU 0 output channel of that second group of channels to the RPU 0 output channel in the bridge module. Again following the bridging described above, the bridge module transfers the computed data from the RPU 0 output channel to the RPU 1 input channel, so that the data enters RPU 1 and the computation continues. This process repeats until the computing task completes, whereupon the computed data is transmitted through the RPU end output channel of the second group of channels connected to the last RPU to the RPU end output channel in the bridge module. Following the bridging described above, the bridge module transfers the computed data from the RPU end output channel to the second matrix input channel and then, through the first matrix output channel of the first group of channels, to the protocol controller. After the protocol controller performs protocol conversion on the computed data, it is transmitted to the HEC master control system, completing the computing task.
For task B, the serial signal is transmitted through the RPU 4 input channel of the second group of channels connected to RPU 4 into RPU 4, where the corresponding computation is performed. When the computation of RPU 4 completes, the computed data is transmitted through the RPU 4 output channel of that second group of channels to the RPU 4 output channel in the bridge module. Again following the bridging described above, the bridge module transfers the computed data from the RPU 4 output channel to the RPU 5 input channel, so that the data enters RPU 5 and the computation continues. This process repeats until the computing task completes, whereupon the computed data is transmitted through the RPU end output channel of the second group of channels connected to the last RPU to the RPU end output channel in the bridge module. Following the bridging described above, the bridge module transfers the computed data from the RPU end output channel to the second matrix input channel and then, through the first matrix output channel of the first group of channels, to the protocol controller. After the protocol controller performs protocol conversion on the computed data, it is transmitted to the HEC master control system, completing the computing task.
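The chained data flow of tasks A and B can be sketched as a fold over the RPU chain (`run_chain` and the stage functions are invented): each RPU's output feeds the next RPU's input, and the final value is what is returned to the master control system:

```python
def run_chain(chain, stages, x):
    """Data enters the first RPU of the chain; each RPU's output feeds the
    next; the last result returns to the master control system."""
    for rpu in chain:
        x = stages[rpu](x)
    return x

# invented per-RPU stage functions, standing in for configured RPU behavior
stages = {0: lambda v: v + 1, 1: lambda v: v * 2, 2: lambda v: v - 3, 3: lambda v: v * v}
# task A: RPU0 -> RPU1 -> RPU2 -> RPU3 on input 4: ((4+1)*2-3)^2 = 49
assert run_chain([0, 1, 2, 3], stages, 4) == 49
```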
Those skilled in the art should note that, during task execution, the RPUs executing tasks in the RPU array and the bridge module can also feed their own status information back, over the respective control buses, to the bridge controller of the bridge module and to the HEC master control system, so that when complex tasks are executed, the bridging in the bridge module can be adjusted dynamically and the RPUs adjusted to execute different computing tasks.
During task execution, the configuration information for each cycle is transmitted on demand over the control bus to the RPU array. According to the configuration information and the task-execution status of each RPU in the RPU array, the bridge module dynamically controls the bridging of the input and output channels, so that each RPU performs the corresponding computation on the data delivered by its input channel according to its configuration information, outputs the computed data through its output channel, and feeds back its own status information over the corresponding control bus.
Fig. 4b is a schematic diagram, provided by an embodiment of the present invention, of another flexible adjustment of computing power for multitask execution.
As shown in Fig. 4b, which presents the logical architecture of Fig. 4a: for the parallel execution of the multiple tasks in Fig. 4a, the master control system can configure, over the configuration bus, the calling order of the multiple RPUs of each task in the RPU array; for example, task A calls RPU 0 to RPU 2 in sequence, and task B calls RPU 3 to RPU 7 in sequence. The reconfigurable data of tasks A and B used for computation is transmitted over the data bus to the corresponding RPUs in the RPU array and computed there. When the computation finishes, the computed data of tasks A and B is returned to the master control system over the data bus. Those skilled in the art should note that, during computation, the master control system can also send new configuration information over the configuration bus, according to the status information fed back by the bridge module and by the RPUs in use in the RPU array, for dynamically configuring the bridging in the bridge submodule of the bridge module and the RPUs in the RPU array.
The present invention is realized on the X86 architecture: the primary processor is connected to the machine perceptron and the machine behavior device, and one or more RPU arrays are connected through a PCIE interface, so that computing power can be deployed flexibly according to product demand and the use environment. It can also support edge computing, large-scale computing and ultra-large-scale computing, support all kinds of neural network computation without instruction driving, support online training and online algorithm iteration, and offer high generality, flexibility and energy efficiency.
Professionals should further appreciate that the units and algorithm steps described in connection with the embodiments disclosed herein can be realized with electronic hardware, computer software, or a combination of the two. To illustrate this interchangeability of hardware and software clearly, the composition and steps of each example have been described above generally in terms of function. Whether these functions are implemented in hardware or in software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may use different methods to realize the described functions for each specific application, but such realizations should not be considered beyond the scope of the present invention.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein can be implemented in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium well known in the technical field.
The specific embodiments described above further detail the objects, technical solutions and beneficial effects of the present invention. It should be understood that the foregoing is merely a specific embodiment of the present invention and is not intended to limit its scope of protection; any modification, equivalent substitution, improvement and the like made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.
Claims (10)
1. A supercomputer based on high-performance reconfigurable computation, characterized by comprising:
at least one machine perceptron, for obtaining environment-sensing information and/or device input information as reconfigurable data;
at least one reconfigurable computing unit (RPU) array, for computing the input reconfigurable data;
a master control system, for controlling the transmission of the reconfigurable data to the at least one RPU array;
at least one machine behavior device, for outputting calculation results and/or executing supercomputer instructions; and
a compiling system, for marking and preprocessing application tasks and decomposing them into master-control-system execution code and RPU execution code, and for performing code conversion and optimization on the RPU execution code according to the at least one RPU array, ultimately generating the control code of the master control system, the elastic-connection control information and each item of configuration information of the RPU array; such that, under the control of the control code, a data path is formed between the at least one machine perceptron and the at least one RPU array, and a data path is formed between the at least one machine behavior device and the at least one RPU array; the elastic-connection control information causes the at least one RPU array to form an elastic computing architecture; and each item of configuration information of the RPU array configures the RPUs in the at least one RPU array for computing the reconfigurable data.
2. The supercomputer according to claim 1, characterized in that the master control system comprises: a platform controller hub (PCH) and a master controller based on the X86/AMD64 architecture, the PCH being connected to the master controller through a direct media interface (DMI);
the PCH is connected to the at least one machine perceptron, for transmitting the input environment-sensing information and/or device input information to the master controller based on the X86/AMD64 architecture;
the master controller based on the X86/AMD64 architecture is connected to the at least one RPU array through the PCIE interface, for transmitting the reconfigurable data to the at least one RPU array to be computed; and
the PCH is connected to the at least one machine behavior device, for transmitting calculation results from the master controller based on the X86/AMD64 architecture to the at least one machine behavior device.
3. The supercomputer according to claim 1, characterized in that the RPU array comprises:
an elastic connecting system (HEC_link); and
one or more RPUs;
wherein the HEC_link connects the one or more RPUs under the control of the elastic-connection control information;
the one or more RPUs obtain the corresponding configuration information through the HEC_link; and
the one or more RPUs obtain the reconfigurable data from the master control system or from other RPUs through the HEC_link, and transmit calculation results to the master control system or to other RPUs through the HEC_link.
4. The supercomputer according to claim 3, characterized in that the at least one RPU array and the master control system are connected through a PCIE interface, and the HEC_link comprises:
a PCIE protocol converter, for performing protocol conversion between the PCIE interface information and the configuration bus and reconfigurable data bus of the at least one RPU array.
5. The supercomputer according to claim 3, characterized in that, according to the elastic-connection control information, the HEC_link extends the calculation depth and calculation width of one or more RPUs in the at least one RPU array, and groups one or more RPUs in the at least one RPU array, for:
inputting different reconfigurable data and executing different tasks; or
inputting different reconfigurable data and executing the same task; or
inputting the same reconfigurable data and executing different tasks; or
inputting the same reconfigurable data and executing the same task.
6. The supercomputer according to claim 3, characterized in that the compiling system adjusts the width and/or depth of the determined at least one RPU array by changing the connection relationships of the one or more RPUs through the HEC_link.
7. The supercomputer according to claim 1, characterized by further comprising:
an operating system, for managing the software and hardware resources and peripheral resources of the supercomputer, executing the compiled file output by the compiling system, obtaining the information from the machine perceptron, controlling the machine behavior device to execute calculation results, driving the at least one RPU array according to each item of configuration information of the RPU array, and controlling the compiling system to perform online compilation.
8. The supercomputer according to claim 7, characterized in that the compiling system works in an offline compilation mode, transferring the completed compiled file to the operating system; or
the compiling system works in an online compilation mode, compiling and deploying in real time for the operating system.
9. The supercomputer according to claim 1, characterized in that the machine perceptron comprises:
end sensors, for collecting surrounding-environment information and own-state information; and
a sensor module, for performing secondary analytical calculation on the surrounding-environment information and own-state information collected by the end sensors, generating the environment-sensing information and/or device input information as the reconfigurable data.
10. The supercomputer according to claim 1, characterized in that the machine behavior device comprises: a communication unit, a man-machine interface, a servo mechanism and a control unit.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201910406990.1A (granted as CN110262996B) | 2019-05-15 | 2019-05-15 | Supercomputer based on high-performance reconfigurable computation
Publications (2)
Publication Number | Publication Date |
---|---|
CN110262996A true CN110262996A (en) | 2019-09-20 |
CN110262996B CN110262996B (en) | 2023-11-24 |
Family
ID=67914737
Cited By (3)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN111859904A | 2020-07-31 | 2020-10-30 | | NLP model optimization method and device and computer equipment
CN112202243A | 2020-09-17 | 2021-01-08 | | Full-acquisition intelligent terminal for power transmission line state monitoring
TWI798642B | 2021-02-09 | 2023-04-11 | | Array controlling system and controlling method thereof
Citations (2)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
US9646243B1 | 2016-09-12 | 2017-05-09 | International Business Machines Corporation | Convolutional neural networks using resistive processing unit array
CN108804379A | 2017-05-05 | 2018-11-13 | Tsinghua University | Reconfigurable processor and its configuration method
Non-Patent Citations (3)

Title
---
Wang Yansheng, "Research on Energy-Efficient Key Configuration Techniques in Coarse-Grained Dynamic Reconfigurable Processors", China Doctoral Dissertations Full-text Database, Information Science and Technology
Wei Shaojun et al., "Reconfigurable computing processor technology", Scientia Sinica Informationis
Huang Shi et al., "Object-oriented simulation research on coarse-grained reconfigurable parallel computing", Computer Engineering and Design
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
2023-10-30 | TA01 | Transfer of patent application right | Address after: Room 1802, Taixin Building, No. 33 Jinhua Road, Shibei District, Qingdao City, Shandong Province, 266011. Applicant after: Qingdao TianKuo Information Technology Co.,Ltd. Address before: 100142, Room 907, Area 1, Floor 9, No. 160 North West Fourth Ring Road, Haidian District, Beijing. Applicant before: BEIJING HYPERX AI COMPUTING TECHNOLOGY Co.,Ltd.
| GR01 | Patent grant |