TWI876016B - Method and processing system for processing data stream modification to reduce power effects during parallel processing - Google Patents

Method and processing system for processing data stream modification to reduce power effects during parallel processing

Info

Publication number
TWI876016B
Authority
TW
Taiwan
Prior art keywords
data
processing
blocks
density
sub
Prior art date
Application number
TW110111550A
Other languages
English (en)
Chinese (zh)
Other versions
TW202143032A (zh)
Inventor
熙俊 朴
理查格拉德 霍夫曼
Original Assignee
Qualcomm Incorporated
Priority date
Filing date
Publication date
Application filed by Qualcomm Incorporated
Publication of TW202143032A
Application granted
Publication of TWI876016B

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5066 Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28 Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5094 Allocation of resources, e.g. of the central processing unit [CPU] where the allocation takes into account power or heat criteria
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Power Sources (AREA)
  • Complex Calculations (AREA)
TW110111550A 2020-03-30 2021-03-30 Method and processing system for processing data stream modification to reduce power effects during parallel processing TWI876016B (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US16/834,986 US11507423B2 (en) 2020-03-30 2020-03-30 Processing data stream modification to reduce power effects during parallel processing
US16/834,986 2020-03-30

Publications (2)

Publication Number Publication Date
TW202143032A (zh) 2021-11-16
TWI876016B (zh) 2025-03-11

Family

ID=75640050

Family Applications (1)

Application Number Title Priority Date Filing Date
TW110111550A TWI876016B (zh) Method and processing system for processing data stream modification to reduce power effects during parallel processing 2020-03-30 2021-03-30

Country Status (9)

Country Link
US (2) US11507423B2
EP (1) EP4127928A1
JP (2) JP7750847B2
KR (1) KR20220154698A
CN (2) CN115315688B
BR (1) BR112022018896A2
PH (1) PH12022551885A1
TW (1) TWI876016B
WO (1) WO2021203125A1

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11513799B2 (en) * 2019-11-04 2022-11-29 Apple Inc. Chained buffers in neural network processor
US11507423B2 (en) 2020-03-30 2022-11-22 Qualcomm Incorporated Processing data stream modification to reduce power effects during parallel processing
US11972348B2 (en) 2020-10-30 2024-04-30 Apple Inc. Texture unit circuit in neural network processor
US12277494B2 (en) 2020-11-19 2025-04-15 Apple Inc. Multi-dimensional tensor support extension in neural network processor
JP7806459B2 (ja) * 2021-11-22 2026-01-27 富士通株式会社 Control program, information processing apparatus, and control method
US20240152407A1 (en) * 2022-11-04 2024-05-09 Nvidia Corporation Generating sparse neural networks
TWI845081B (zh) * 2022-12-21 2024-06-11 國立成功大學 Graphics processor
US12236241B2 (en) * 2023-02-24 2025-02-25 Arm Limited Data processing apparatus with selectively delayed transmission of operands
CN120234822B (zh) * 2025-05-30 2025-08-08 苏州元脑智能科技有限公司 Data processing method, electronic device, storage medium, and product

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020163670A1 (en) * 2001-03-30 2002-11-07 Masayuki Takahira Image processing method and apparatus, and recording medium
TW201531942A (zh) * 2013-12-10 2015-08-16 Advanced Risc Mach Ltd Configurable thread ordering for a data processing apparatus
TWI518589B (zh) * 2011-12-22 2016-01-21 Intel Corporation Instructions to perform Groestl hashing
US10227644B2 (en) * 2000-02-18 2019-03-12 The Board Of Trustees Of The Leland Stanford Junior University Apparatus and methods for parallel processing of microvolume liquid reactions

Family Cites Families (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6128416A (en) * 1993-09-10 2000-10-03 Olympus Optical Co., Ltd. Image composing technique for optimally composing a single image from a plurality of digital images
US6775417B2 (en) * 1997-10-02 2004-08-10 S3 Graphics Co., Ltd. Fixed-rate block-based image compression with inferred pixel values
JP3392798B2 (ja) * 2000-02-22 2003-03-31 理想科学工業株式会社 Image attribute discrimination method and apparatus
US6788302B1 (en) 2000-08-03 2004-09-07 International Business Machines Corporation Partitioning and load balancing graphical shape data for parallel applications
US6934760B1 (en) 2001-02-04 2005-08-23 Cisco Technology, Inc. Method and apparatus for resequencing of packets into an original ordering using multiple resequencing components
US20070019778A1 (en) * 2005-07-22 2007-01-25 Clouse Melvin E Voxel histogram analysis for measurement of plaque
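JP2010111099A (ja) * 2008-11-10 2010-05-20 Canon Inc Image processing apparatus and control method thereof
JPWO2010073922A1 (ja) * 2008-12-25 2012-06-14 日本電気株式会社 Error correction encoding apparatus, decoding apparatus, encoding method, decoding method, and programs therefor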
US8521782B2 (en) 2011-07-20 2013-08-27 Salesforce.Com, Inc. Methods and systems for processing large graphs using density-based processes using map-reduce
JP6089949B2 (ja) * 2013-05-14 2017-03-08 株式会社リコー SIMD processor
US9721204B2 (en) * 2013-10-28 2017-08-01 Qualcomm Incorporated Evaluation of a system including separable sub-systems over a multidimensional range
KR101599133B1 (ko) * 2014-06-09 2016-03-15 주식회사 엔지스테크널러지 Method and system for providing map data of a navigation device
JP6523428B2 (ja) * 2015-03-04 2019-05-29 オリンパス株式会社 Image processing apparatus
US9774508B1 (en) 2015-12-02 2017-09-26 Color Genomics, Inc. Communication generation using sparse indicators and sensor data
EP4557209A3 (en) * 2015-06-10 2025-08-13 Mobileye Vision Technologies Ltd. Image processor and methods for processing an image
JP6832155B2 (ja) * 2016-12-28 2021-02-24 ソニーセミコンダクタソリューションズ株式会社 Image processing apparatus, image processing method, and image processing system
US10672175B2 (en) * 2017-04-17 2020-06-02 Intel Corporation Order independent asynchronous compute and streaming for graphics
EP3392804A1 (en) * 2017-04-18 2018-10-24 Koninklijke Philips N.V. Device and method for modelling a composition of an object of interest
US10943171B2 (en) 2017-09-01 2021-03-09 Facebook, Inc. Sparse neural network training optimization
US10803096B2 (en) * 2017-09-28 2020-10-13 Here Global B.V. Parallelized clustering of geospatial data
US11561833B1 (en) * 2018-06-28 2023-01-24 Amazon Technologies, Inc. Allocation and placement of resources for network computation
JP2022500755A (ja) * 2018-09-11 2022-01-04 Huawei Technologies Co., Ltd. Heterogeneous scheduling for sequential computation DAGs
US11288436B2 (en) * 2020-01-30 2022-03-29 Taiwan Semiconductor Manufacturing Co., Ltd. Method of analyzing and detecting critical cells
US11507423B2 (en) 2020-03-30 2022-11-22 Qualcomm Incorporated Processing data stream modification to reduce power effects during parallel processing

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10227644B2 (en) * 2000-02-18 2019-03-12 The Board Of Trustees Of The Leland Stanford Junior University Apparatus and methods for parallel processing of microvolume liquid reactions
US20020163670A1 (en) * 2001-03-30 2002-11-07 Masayuki Takahira Image processing method and apparatus, and recording medium
TWI518589B (zh) * 2011-12-22 2016-01-21 Intel Corporation Instructions to perform Groestl hashing
TW201531942A (zh) * 2013-12-10 2015-08-16 Advanced Risc Mach Ltd Configurable thread ordering for a data processing apparatus

Also Published As

Publication number Publication date
EP4127928A1 (en) 2023-02-08
US20230078991A1 (en) 2023-03-16
CN115315688B (zh) 2025-12-23
WO2021203125A1 (en) 2021-10-07
KR20220154698A (ko) 2022-11-22
CN115315688A (zh) 2022-11-08
CN121807567A (zh) 2026-04-07
JP7750847B2 (ja) 2025-10-07
US11983567B2 (en) 2024-05-14
JP2026015712A (ja) 2026-01-30
BR112022018896A2 (pt) 2022-11-08
TW202143032A (zh) 2021-11-16
US20210303359A1 (en) 2021-09-30
US11507423B2 (en) 2022-11-22
PH12022551885A1 (en) 2023-11-20
JP2023519665A (ja) 2023-05-12

Similar Documents

Publication Publication Date Title
TWI876016B (zh) Method and processing system for processing data stream modification to reduce power effects during parallel processing
Lu et al. Optimizing depthwise separable convolution operations on GPUs
Houtgast et al. Hardware acceleration of BWA-MEM genomic short read mapping for longer read lengths
KR101959376B1 (ko) Systems and methods for a multi-core optimized recurrent neural network
Stuart et al. Multi-GPU MapReduce on GPU clusters
US20220012578A1 (en) Methods, apparatus, and articles of manufacture to increase utilization of neural network (nn) accelerator circuitry for shallow layers of an nn by reformatting one or more tensors
Chen et al. CMSA: a heterogeneous CPU/GPU computing system for multiple similar RNA/DNA sequence alignment
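JP7834166B2 (ja) Hardware accelerator optimized group convolution based neural network models
Zenker et al. Performance-portable many-core plasma simulations: Porting PIConGPU to OpenPOWER and beyond
CN114428936B (zh) Apparatus, method, and medium for allocating processing threads for matrix-matrix multiplication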
Lee et al. Accelerated block-sparsity-aware matrix reordering for leveraging tensor cores in sparse matrix-multivector multiplication
Jiang et al. Exploiting potential of deep neural networks by layer-wise fine-grained parallelism
Jeon et al. Tinymem: Boosting multi-dnn inference on tiny ai accelerators with weight memory virtualization
Mpakos et al. Open-source SpMV multiplication hardware accelerator for FPGA-based HPC systems
Khan et al. Optimizing the matrix multiplication using Strassen and Winograd algorithms with limited recursions on many-core
TWI889583B (zh) Reducing memory bank conflicts in a hardware accelerator
Tran et al. Exploring means to enhance the efficiency of GPU bitmap index query processing
Lavin On the Efficiency of Convolutional Neural Networks
Bédorf et al. Sapporo2: a versatile direct N-body library
Klein et al. Tridigpu: a GPU library for block tridiagonal and banded linear equation systems
Cao et al. Lssm-spmm: A long-row splitting and short-row merging approach for parallel spmm on pezy-sc3s
US20260050650A1 (en) Fast Matrix Multiplication Methods and Systems
Qi et al. A dynamic parameter tuning method for high performance SpMM
US20260023687A1 (en) Efficient data processing
Sedigh Baroughi et al. HiSpMM: High Performance High Bandwidth Sparse-Dense Matrix Multiplication on HBM-equipped FPGAs