WO2001055917A1 - Improved apparatus and method for multi-threaded signal processing

Publication number
WO2001055917A1
WO2001055917A1 PCT/US2001/002982 US0102982W WO0155917A1 WO 2001055917 A1 WO2001055917 A1 WO 2001055917A1 US 0102982 W US0102982 W US 0102982W WO 0155917 A1 WO0155917 A1 WO 0155917A1
Authority
WO
WIPO (PCT)
Prior art keywords
thread
processing
kernel
operations
design
Prior art date
Application number
PCT/US2001/002982
Other languages
English (en)
French (fr)
Inventor
Ravi Subramanian
Keith Rieken
Original Assignee
Morphics Technology Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2000-01-27
Filing date
2001-01-29
Publication date
2001-08-02
Application filed by Morphics Technology Inc.
Priority to AU2001233119A (AU2001233119A1)
Priority to DE10195202T (DE10195202T1)
Priority to KR1020027009711A (KR100784412B1)
Priority to GB0217126A (GB2374701B)
Priority to JP2001555391A (JP2003521072A)
Publication of WO2001055917A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/30 Circuit design

Definitions

  • Field of Invention Invention relates to electronic data and signal processing, particularly to high-
  • processors and instruction-set architectures that allow for the exploitation of hardware parallelism and software concurrency.
  • High-performance is typically defined as the ability to execute a very large number of operations per second. This figure of merit is strongly dependent on the
  • Instruction-set architecture refers to the actual programmer-visible sets of
  • Instruction-level parallelism this approach, which exploits parallelism in hardware, provides for parallel threads of processing via the use of a very long or vectorized instruction word, whose fields can be
  • multi-processor systems may employ multi-threaded
  • Multi-threading generally is a known
  • design or functional definition, algorithm, electronic signal, or data file is provided initially to include one or more multi-threaded representation.
  • Such initial prototype is provided initially to include one or more multi-threaded representation.
  • Such element is built by providing a datapath, whose structure and configurability is determined via profiling, a sequencer/finite-state-machine, whose structure and configurability is determined via
  • profiling and local memory, whose structure is determined via profiling memory
  • kernel elements are implemented entirely in software or programmable logic, or combination thereof.
  • term “profiling” refers generally to
  • FIG. 1 is a general methodology and tool architecture diagram for
  • FIGs. 2A-B are functional block diagrams for implementing one aspect of the
  • FIG. 3 is a representative functional diagram illustrating heterogeneous aspect
  • FIG. 4 is a representative functional diagram illustrating reconfigurable aspect
  • FIG. 5 is a representative functional diagram illustrating kernel aspect of the present invention.
  • FIG. 6 is a representative functional diagram illustrating interface aspect of the present invention.
  • FIG. 7 is a system methodology flow chart showing functional operations for implementing one or more aspects of the present invention.
  • FIG. 8 is representative of software code stubs for implementing one or more aspects of the present invention.
  • FIGs. 9A-B are representative functional diagrams of one or more applications
  • multi-threaded prototype may be used or otherwise be implemented in fixed, parameterizable, programmable, or configurable logic unit or
  • multi-thread algorithms specific sequences of operations, patterns of memory accesses, or segments, each thread being profiled or characterized to optimize operation or implementation using fixed, parameterizable, programmable, or
  • datapath structure is configured into single or multi-thread
  • profiling terminology is understood to refer generally to any
  • profiling is accomplished according to one or more previously and/or
  • the generated symbolic representation may identify certain threads associated with the
  • Each thread may be profiled for processing by corresponding kernel
  • thread may further be mapped to identify the sequence, or scheduling information, for
  • processing architecture may substantially include a set of kernel elements, such that
  • one kernel element processes certain function represented by corresponding thread
  • each thread may be profiled separately or hierarchically for
  • group kernel element and a second-level or group kernel element are associated with a corresponding first thread and second thread in a given function or
  • front-end processing e.g., data
  • chip-rate processing e.g., sample epoch
  • channel element processing e.g., alignment/deskewing, combiner, soft decision computer, interpath interference equalizer, receive antenna diversity
  • interleaving e.g., deinterleaver controller
  • channel coding e.g.,
  • turbo decoder, convolutional decoder, etc.
  • kernel elements i.e., as determined by profiling technique as described further
  • FIG. 1 is a general architecture or system block diagram showing top-level
  • present design methodology serves to provide a tool architecture and processor implementation and architecture, or data file representative thereof, for enabling
  • system architecture such as network implementation.
  • netlist, or high-level description language (such as C or HDL) defining one or more functional modules or algorithms 12 is provided manually or computed automatically.
  • profiling and mapping scheme 14 is processed or applied to primitives 16 and
  • mapping 14 provides scheduling data for schedule operation tables 20.
  • kernels 18 are processed and interconnected for implementation 22, for
  • FIGs. 2A-B functional block diagrams show representative set of kernels 18,
  • one or more kernel 18 is associated with or corresponds to profiled and mapped thread, and is implemented reconfigurably using sequencer 32, datapath 34,
  • multi-threaded representation thereof which may be profiled effectively for parallel processing using one or more corresponding kernel logic elements (e.g., according to
  • FIG. 3 functional diagram shows representative heterogeneous, reconfigurable
  • kernel 8 may implement
  • kernel 6 may implement "large" granularity
  • kernel may be implemented or dynamically reconfigured according to design requirement or profile mapping preference.
  • FIG. 4 functional diagram shows one or more
  • kernels such as reconfigurable logic or programmable function units
  • PFU 40 having programmable logic elements and switch matrix (e.g., for encoding bit-level operations), reconfigurable datapaths 42 having multiplexers, registers, adders, buffers, etc. and configurable signal flow through these elements (e.g., for
  • reconfigurable control 46 having data memory, datapath, program memory, instruction decoder and controller, etc. (e.g., for real-time operating system process
  • FIG. 5 functional diagram shows preferred functional elements for implementing kernel 18, including data sequencer 32, data memory 36, and parameterizable configurable
  • ALU arithmetic logic unit
  • FIG. 6 is a representative functional diagram illustrating optional interface
  • DRL dynamically reconfigurable logic
  • DRL process is heterogeneous and reconfigurable
  • hardware interfaces 54 couples
  • processor element 52 associated with library 62 and specified functional modules 60, including processor software model 57 having C-program model 56 and input/output
  • information e.g., signal or data representation
  • general system design e.g., signal or data representation
  • processor model 50 for functional cooperation or emulated real-time signal
  • FIG. 7 flow chart shows another aspect of present operational steps. Initially, user-generated or computer-generated functions are defined 70 for prototype or other
  • one or more mathematical analysis or design performance optimization scheme may be applied 72 to initial design definition.
  • constituent algorithms for design definition is provided 74, and representation of such algorithms is thereby coded 76, preferably in high-level, register transfer, or
  • Algorithms may be profiled and mapped 78, or otherwise functionally defined
  • communications semaphores 84 also are provided for communications semaphores 84 and scheduling and finite state
  • FIG. 8 shows
  • profiling processing or
  • reconfigurable algorithms representative thereof is temporal, thereby including
  • temporal application includes changes in receiver algorithms required in a cellular
  • processing throughput requirements in one path may increase or decrease as processing progresses (e.g., from antenna to final retrieved data representation), present profiling scheme serves to determine hardware-
  • dependent changes in receive path of wireless receiver may need to change at startup for global reconfiguration between transaction configuration (e.g.,
  • implementation may be selected, such as for processing data at highest data rate
  • programmable interconnect For a datapath which may need to be selected at configuration time, but is not changed often, then programmable interconnect may be
  • multiplexing structure may apply. Also, for control functions where operation
  • parameterized kernels for processing operations may apply.
  • FIG. 9A shows general aspects of applying present invention, including flow
  • API application programming interface
  • Preferred implementation receives configuration parameters through API 94 to define or implement one or more interconnected block modules 96, representing
  • DSP digital signal processor
  • ASIC application specific integrated circuit
  • FPGA field programmable gate array
  • DRL or other functional block
  • kernel elements 98 are integers.
  • configurable parameters 100 may be defined or implemented to correspond in
  • design and implementation method or system serves to
  • prototype function is thus profiled for parallel processing by one or more thread, for
  • FIG. 9B portable mobile radio handsets 102 transmit and receive signals wirelessly
  • base station 104 possibly coupled to other handsets 102 and base stations 104
  • kernel elements may be configured for operation in base station 104 and/or handset units 102.
  • kernels may be configured for profiled
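
The profiling, mapping, and scheduling steps named in the definitions above (threads profiled, mapped to kernel elements, with the mapping feeding a schedule table) can be illustrated with a minimal sketch. This is not the patent's method: the `Thread` class, the functions `profile`, `map_to_kernel`, and `build_schedule`, the operation names, and the kernel library are all hypothetical names chosen for illustration, and "profiling" is reduced here to counting primitive operations.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Thread:
    """Hypothetical thread: a name plus its sequence of primitive operations."""
    name: str
    ops: list


def profile(thread):
    """Profile a thread by counting how often each primitive operation occurs."""
    return Counter(thread.ops)


def map_to_kernel(prof, kernels):
    """Return the first kernel whose supported operations cover the profile."""
    for kernel_name, supported in kernels.items():
        if set(prof) <= supported:
            return kernel_name
    return None  # no kernel fits; a real flow would need some fallback


def build_schedule(threads, kernels):
    """Build schedule rows of (slot, thread name, kernel name)."""
    return [
        (slot, t.name, map_to_kernel(profile(t), kernels))
        for slot, t in enumerate(threads)
    ]


# Assumed kernel library: kernel name -> operations its datapath supports.
KERNELS = {
    "mac_kernel": {"mul", "add"},
    "compare_kernel": {"cmp", "sel", "add"},
}

# Assumed example threads, loosely named after receiver functions.
THREADS = [
    Thread("despread", ["mul", "add", "mul", "add"]),
    Thread("soft_decision", ["cmp", "sel"]),
]

SCHEDULE = build_schedule(THREADS, KERNELS)
for row in SCHEDULE:
    print(row)
```

Each thread is assigned to the first kernel in the library that supports all of its profiled operations; a fuller model would presumably also weigh memory-access patterns and throughput, which this sketch leaves out.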

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Logic Circuits (AREA)
  • Stored Programmes (AREA)
  • Mobile Radio Communication Systems (AREA)
  • Advance Control (AREA)
PCT/US2001/002982 2000-01-27 2001-01-29 Improved apparatus and method for multi-threaded signal processing WO2001055917A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
AU2001233119A AU2001233119A1 (en) 2000-01-27 2001-01-29 Improved apparatus and method for multi-threaded signal processing
DE10195202T DE10195202T1 (de) 2000-01-27 2001-01-29 Method and apparatus for multi-threaded signal processing
KR1020027009711A KR100784412B1 (ko) 2000-01-27 2001-01-29 Improved multi-thread signal processing method and apparatus
GB0217126A GB2374701B (en) 2000-01-27 2001-01-29 Improved apparatus and method for multi-threaded signal procesing
JP2001555391A JP2003521072A (ja) 2000-01-27 2001-01-29 Improved apparatus and method for multi-threaded signal processing

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US49263400A 2000-01-27 2000-01-27
US09/492,634 2000-01-27

Publications (1)

Publication Number Publication Date
WO2001055917A1 (en) 2001-08-02

Family

ID=23956997

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/002982 WO2001055917A1 (en) 2000-01-27 2001-01-29 Improved apparatus and method for multi-threaded signal processing

Country Status (6)

Country Link
JP (1) JP2003521072A (ja)
KR (1) KR100784412B1 (ko)
AU (1) AU2001233119A1 (en)
DE (1) DE10195202T1 (de)
GB (1) GB2374701B (en)
WO (1) WO2001055917A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6839889B2 (en) 2000-03-01 2005-01-04 Realtek Semiconductor Corp. Mixed hardware/software architecture and method for processing xDSL communications
US7693257B2 (en) 2006-06-29 2010-04-06 Accuray Incorporated Treatment delivery optimization
USRE44365E1 (en) 1997-02-08 2013-07-09 Martin Vorbach Method of self-synchronization of configurable elements of a programmable module
US8869121B2 (en) 2001-08-16 2014-10-21 Pact Xpp Technologies Ag Method for the translation of programs for reconfigurable architectures
US8914590B2 (en) 2002-08-07 2014-12-16 Pact Xpp Technologies Ag Data processing method and device
US9037807B2 (en) 2001-03-05 2015-05-19 Pact Xpp Technologies Ag Processor arrangement on a chip including data processing, memory, and interface elements
US9047440B2 (en) 2000-10-06 2015-06-02 Pact Xpp Technologies Ag Logical cell array and bus system
US9075605B2 (en) 2001-03-05 2015-07-07 Pact Xpp Technologies Ag Methods and devices for treating and processing data

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7657893B2 (en) * 2003-04-23 2010-02-02 International Business Machines Corporation Accounting method and logic for determining per-thread processor resource utilization in a simultaneous multi-threaded (SMT) processor
CN107193539B (zh) * 2016-03-14 2020-11-24 北京京东尚科信息技术有限公司 Multi-threaded concurrent processing method and multi-threaded concurrent processing system
US11288072B2 (en) * 2019-09-11 2022-03-29 Ceremorphic, Inc. Multi-threaded processor with thread granularity

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4821220A (en) * 1986-07-25 1989-04-11 Tektronix, Inc. System for animating program operation and displaying time-based relationships
US5519867A (en) * 1993-07-19 1996-05-21 Taligent, Inc. Object-oriented multitasking system
US5537226A (en) * 1994-11-22 1996-07-16 Xerox Corporation Method for restoring images scanned in the presence of vibration
US5870588A (en) * 1995-10-23 1999-02-09 Interuniversitair Micro-Elektronica Centrum(Imec Vzw) Design environment and a design method for hardware/software co-design
US6112020A (en) * 1996-10-31 2000-08-29 Altera Corporation Apparatus and method for generating configuration and test files for programmable logic devices

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5946487A (en) * 1996-06-10 1999-08-31 Lsi Logic Corporation Object-oriented multi-media architecture

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
BERNARD K. GUNTHER: "Multithreading with Distributed Functional Units", IEEE TRANSACTIONS ON COMPUTERS, vol. 46, no. 4, April 1997 (1997-04-01), pages 399 - 411, XP002939210 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
USRE44365E1 (en) 1997-02-08 2013-07-09 Martin Vorbach Method of self-synchronization of configurable elements of a programmable module
USRE45109E1 (en) 1997-02-08 2014-09-02 Pact Xpp Technologies Ag Method of self-synchronization of configurable elements of a programmable module
USRE45223E1 (en) 1997-02-08 2014-10-28 Pact Xpp Technologies Ag Method of self-synchronization of configurable elements of a programmable module
US6839889B2 (en) 2000-03-01 2005-01-04 Realtek Semiconductor Corp. Mixed hardware/software architecture and method for processing xDSL communications
US6965960B2 (en) 2000-03-01 2005-11-15 Realtek Semiconductor Corporation xDSL symbol processor and method of operating same
US9047440B2 (en) 2000-10-06 2015-06-02 Pact Xpp Technologies Ag Logical cell array and bus system
US9037807B2 (en) 2001-03-05 2015-05-19 Pact Xpp Technologies Ag Processor arrangement on a chip including data processing, memory, and interface elements
US9075605B2 (en) 2001-03-05 2015-07-07 Pact Xpp Technologies Ag Methods and devices for treating and processing data
US8869121B2 (en) 2001-08-16 2014-10-21 Pact Xpp Technologies Ag Method for the translation of programs for reconfigurable architectures
US8914590B2 (en) 2002-08-07 2014-12-16 Pact Xpp Technologies Ag Data processing method and device
US7693257B2 (en) 2006-06-29 2010-04-06 Accuray Incorporated Treatment delivery optimization

Also Published As

Publication number Publication date
GB0217126D0 (en) 2002-09-04
AU2001233119A1 (en) 2001-08-07
DE10195202T1 (de) 2003-04-30
KR20030004327A (ko) 2003-01-14
KR100784412B1 (ko) 2007-12-11
GB2374701A (en) 2002-10-23
JP2003521072A (ja) 2003-07-08
GB2374701B (en) 2004-12-15

Similar Documents

Publication Publication Date Title
Chen et al. Using dataflow to optimize energy efficiency of deep neural network accelerators
KR100358631B1 (ko) Application-specific processor and design method therefor
Master The next big leap in reconfigurable systems
Wolf et al. Multiprocessor system-on-chip (MPSoC) technology
US5867400A (en) Application specific processor and design method for same
US20060026578A1 (en) Programmable processor architecture hirarchical compilation
EP1953649B1 (en) Reconfigurable integrated circuit
Smit et al. Dynamic reconfiguration in mobile systems
US20150261723A1 (en) Method and system for managing hardware resources to implement system functions using an adaptive computing architecture
CN1653446A (zh) High-performance hybrid processor with configurable execution units
WO2001055917A1 (en) Improved apparatus and method for multi-threaded signal processing
Bondalapati et al. Reconfigurable computing: Architectures, models and algorithms
Niu et al. Automating elimination of idle functions by runtime reconfiguration
Pillement et al. DART: a functional-level reconfigurable architecture for high energy efficiency
Yousuf et al. An automated hardware/software co-design flow for partially reconfigurable FPGAs
David et al. Energy-Efficient Reconfigurable Processsors
Chai et al. Streaming processors for next-generation mobile imaging applications
Heysters et al. A reconfigurable function array architecture for 3G and 4G wireless terminals
Ueda et al. Architecture-level performance estimation method based on system-level profiling
David et al. A compilation framework for a dynamically reconfigurable architecture
Chen et al. Flexible heterogeneous multicore architectures for versatile media processing via customized long instruction words
Guo et al. Rapid scheduling of efficient VLSI architectures for next-generation HSDPA wireless system using Precision C synthesizer
Tiensyrjä et al. Systemc and ocapi-xl based system-level design for reconfigurable systems-on-chip
Bossuet et al. Targeting tiled architectures in design exploration
Galanis et al. A partitioning methodology for accelerating applications in hybrid reconfigurable platforms

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
ENP Entry into the national phase

Ref document number: 200217126

Country of ref document: GB

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 1020027009711

Country of ref document: KR

ENP Entry into the national phase

Ref document number: 2001 555391

Country of ref document: JP

Kind code of ref document: A

WWP Wipo information: published in national office

Ref document number: 1020027009711

Country of ref document: KR

122 Ep: pct application non-entry in european phase
RET De translation (de og part 6b)

Ref document number: 10195202

Country of ref document: DE

Date of ref document: 20030430

Kind code of ref document: P

WWE Wipo information: entry into national phase

Ref document number: 10195202

Country of ref document: DE

REG Reference to national code

Ref country code: DE

Ref legal event code: 8607