CN114127689B - Method for interfacing with hardware accelerators - Google Patents

Method for interfacing with hardware accelerators

Info

Publication number
CN114127689B
Authority
CN
China
Prior art keywords
operations
routine
operation set
hardware accelerator
computing task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202080051285.5A
Other languages
English (en)
Chinese (zh)
Other versions
CN114127689A (zh)
Inventor
C·皮维特奥
N·约安诺
I·克拉夫祖克
M·勒加洛-布尔多
A·塞巴斯蒂安
E·S·埃勒夫塞里奥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-07-15
Filing date
2020-06-30
Publication date
2025-06-06
Application filed by International Business Machines Corp
Publication of CN114127689A
Application granted
Publication of CN114127689B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N3/065 Analogue means
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/509 Offload

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Neurology (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Complex Calculations (AREA)
  • Advance Control (AREA)
CN202080051285.5A 2019-07-15 2020-06-30 Method for interfacing with hardware accelerators Active CN114127689B (zh)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US16/511,689 US11250107B2 (en) 2019-07-15 2019-07-15 Method for interfacing with hardware accelerators
US16/511,689 2019-07-15
PCT/EP2020/068377 WO2021008868A1 (en) 2019-07-15 2020-06-30 A method for interfacing with hardware accelerators

Publications (2)

Publication Number Publication Date
CN114127689A (zh) 2022-03-01
CN114127689B (zh) 2025-06-06

Family

ID=71409414

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080051285.5A Active CN114127689B (zh) Method for interfacing with hardware accelerators 2019-07-15 2020-06-30

Country Status (5)

Country Link
US (1) US11250107B2
EP (1) EP3999957B1
JP (1) JP7361192B2
CN (1) CN114127689B
WO (1) WO2021008868A1

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3933709A1 (en) * 2020-06-30 2022-01-05 Upstride Graph processing method and system
EP4285286A1 (en) * 2021-02-01 2023-12-06 Microsoft Technology Licensing, LLC Semi-programmable and reconfigurable co-accelerator for a deep neural network with normalization or non-linearity
CN115904696A (zh) * 2021-09-30 2023-04-04 Imagination Technologies Limited Method and device for configuring a neural network accelerator with a configurable pipeline
TR2021020689A2 (tr) * 2021-12-22 2023-07-21 Havelsan Hava Elektronik Sanayi Ve Ticaret Anonim Sirketi Ensemble learning with parallel artificial neural networks in embedded and integrated systems
US20240161222A1 (en) * 2022-11-16 2024-05-16 Nvidia Corporation Application programming interface to indicate image-to-column transformation
US12455900B1 (en) 2023-03-07 2025-10-28 QEngine LLC Method for executing a query in a multi-dimensional data space using vectorization and a related system
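KR102740239B1 (ko) * 2023-03-24 2024-12-10 Korea Advanced Institute of Science and Technology Scalable vector-array heterogeneous accelerator architecture and scheduling technique for multi-neural-network acceleration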

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6081890A (en) * 1998-11-30 2000-06-27 Intel Corporation Method of communication between firmware written for different instruction set architectures
US7219085B2 (en) * 2003-12-09 2007-05-15 Microsoft Corporation System and method for accelerating and optimizing the processing of machine learning techniques using a graphics processing unit
US8250578B2 (en) 2008-02-22 2012-08-21 International Business Machines Corporation Pipelining hardware accelerators to computer systems
US7984267B2 (en) * 2008-09-04 2011-07-19 International Business Machines Corporation Message passing module in hybrid computing system starting and sending operation information to service program for accelerator to execute application program
US8751556B2 (en) 2010-06-11 2014-06-10 Massachusetts Institute Of Technology Processor for large graph algorithm computations and matrix operations
US8752064B2 (en) * 2010-12-14 2014-06-10 Advanced Micro Devices, Inc. Optimizing communication of system call requests
JP2012256150A (ja) * 2011-06-08 2012-12-27 Renesas Electronics Corp Compiling device, compiling method, and program
US9411853B1 (en) 2012-08-03 2016-08-09 Healthstudio, LLC In-memory aggregation system and method of multidimensional data processing for enhancing speed and scalability
US9471388B2 (en) * 2013-03-14 2016-10-18 Altera Corporation Mapping network applications to a hybrid programmable many-core device
US20150006341A1 (en) * 2013-06-27 2015-01-01 Metratech Corp. Billing transaction scheduling
US10540588B2 (en) 2015-06-29 2020-01-21 Microsoft Technology Licensing, Llc Deep neural network processing on hardware accelerators with stacked memory
JP2018173672A (ja) * 2015-09-03 2018-11-08 Preferred Networks, Inc. Implementation device
JP6658033B2 (ja) * 2016-02-05 2020-03-04 Fujitsu Limited Arithmetic processing circuit and information processing device
US9646243B1 (en) 2016-09-12 2017-05-09 International Business Machines Corporation Convolutional neural networks using resistive processing unit array
JP6724869B2 (ja) * 2017-06-19 2020-07-15 Denso Corporation Method for adjusting the output level of neurons in a multilayer neural network
US11620490B2 (en) * 2017-10-17 2023-04-04 Xilinx, Inc. Multi-layer neural network processing by a neural network accelerator using host communicated merged weights and a package of per-layer instructions
US10698766B2 (en) * 2018-04-18 2020-06-30 EMC IP Holding Company LLC Optimization of checkpoint operations for deep learning computing
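CN108876702A (zh) * 2018-06-21 2018-11-23 Beijing University of Posts and Telecommunications Training method and device for accelerating a distributed deep neural network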
US10620951B2 (en) * 2018-06-22 2020-04-14 Intel Corporation Matrix multiplication acceleration of sparse matrices using column folding and squeezing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Mixed-precision architecture based on computational memory for training deep neural networks; S. R. Nandakumar; 2018 IEEE International Symposium on Circuits and Systems (ISCAS); 2018-04-26; abstract and Sections II-III of the text *

Also Published As

Publication number Publication date
JP2022541144A (ja) 2022-09-22
WO2021008868A1 (en) 2021-01-21
US20210019362A1 (en) 2021-01-21
JP7361192B2 (ja) 2023-10-13
US11250107B2 (en) 2022-02-15
EP3999957A1 (en) 2022-05-25
CN114127689A (zh) 2022-03-01
EP3999957B1 (en) 2025-09-24

Similar Documents

Publication Publication Date Title
CN114127689B (zh) Method for interfacing with hardware accelerators
US10691996B2 (en) Hardware accelerator for compressed LSTM
US10839292B2 (en) Accelerated neural network training using a pipelined resistive processing unit architecture
US10553207B2 (en) Systems and methods for employing predication in computational models
US20230297819A1 (en) Processor array for processing sparse binary neural networks
EP3298547A1 (en) Batch processing in a neural network processor
KR102396447B1 (ko) Computation acceleration device for artificial neural networks having a pipeline structure
JP2022541144A5
US12530170B2 (en) Vector operation acceleration with convolution computation unit
US11586895B1 (en) Recursive neural network using random access memory
TWI842584B (zh) Computer-implemented method and computer-readable storage medium
WO2024187039A1 (en) Implementing and training computational efficient neural network architectures utilizing layer-skip logic
EP4202774A1 (en) Runtime predictors for neural network computation reduction
EP4612616A1 (en) Hardware-aware generation of machine learning models
US11734225B2 (en) Matrix tiling to accelerate computing in redundant matrices
US20210142153A1 (en) Resistive processing unit scalable execution
US20260056710A1 (en) Quantization and Low Precision AI Processor
US20250130771A1 (en) In-memory processing based on multiple weight sets
KR20240096459A (ko) Per-embedding-group activation quantization
JP2024110210A (ja) Reservoir computer and equipment state detection system
WO2023129491A1 (en) Compute element processing using control word templates
CN110929846A (zh) Inter-layer pipelined processing method for multilayer perceptron deep neural networks
Khamitov et al. Optimization of neural networks training with vector-free heuristic on Apache Spark
JPH02287862A (ja) Neural network computing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant