KR20160116286A

KR20160116286A - Placement method of parallel multiplier

Info

Publication number: KR20160116286A
Application number: KR1020150118175A
Authority: KR
Inventors: 배성민; 김형옥
Original assignee: 삼성전자주식회사
Priority date: 2015-03-25
Filing date: 2015-08-21
Publication date: 2016-10-07
Also published as: KR102318741B1

Abstract

The present invention relates to a method of placing a parallel multiplier that can optimize power, performance, and area. According to the present invention, a method of placing a parallel multiplier using a place-and-route tool running on a computer includes: receiving a datapath netlist of the parallel multiplier; extracting positions of primary input cells and primary output cells from the datapath netlist using a structure analysis module; mapping the primary input cells and the primary output cells onto a predetermined array using the place-and-route tool; and aligning each column of the primary input cells and the primary output cells based on the physical sizes of the primary input cells using the place-and-route tool. The size of the predetermined array is determined by the number of primary input cells.

Description

{PLACEMENT METHOD OF PARALLEL MULTIPLIER}

The present invention relates to a method of placing logic circuits, and more particularly, to a method of placing a parallel multiplier.

Because of short design schedules, recent systems-on-chip are built with automated place-and-route (P&R) techniques even for designs that make heavy use of datapath logic, such as high-performance microprocessors. However, placement algorithms that simply minimize wire length cannot take the structural characteristics of a datapath into account, so it is difficult to obtain placement results that are optimized for power, performance, and area. To avoid this algorithmic limitation, structural optimization is usually done manually. A structurally optimized result can be obtained this way, but manual placement requires considerable time.

Recent systems-on-chip with enhanced multimedia functions include many parallel multipliers. A parallel multiplier receives a multiplicand and a multiplier and performs the multiplication in parallel. Such a parallel multiplier includes portions for which a structurally optimized placement can be performed automatically.

An object of the present invention is to provide a method of placing a parallel multiplier in which the partial product generator and the final multiplier included in the parallel multiplier are placed with their structure taken into account, so that power, performance, and area are optimized.

A method of placing a parallel multiplier using a place-and-route tool running on a computer according to the present invention includes: receiving a datapath netlist for the parallel multiplier; extracting positions of primary input cells and primary output cells from the datapath netlist using a structure analysis module; mapping the primary input cells and the primary output cells onto a predetermined array using the place-and-route tool; and aligning each column of the primary input cells and the primary output cells based on the physical sizes of the primary input cells using the place-and-route tool, wherein the size of the predetermined array is determined by the number of primary input cells.

A method of placing a parallel multiplier using a logic synthesis tool and a place-and-route tool running on a computer according to the present invention includes: generating a datapath netlist for the parallel multiplier with the logic synthesis tool; inputting information about the structure of the parallel multiplier to the place-and-route tool; extracting positions of primary input cells and primary output cells from the datapath netlist using the multiplicand and the multiplier input to the parallel multiplier; mapping the primary input cells and the primary output cells onto a predetermined array using the information about the structure of the parallel multiplier; and aligning each column of the primary input cells and the primary output cells based on the physical sizes of the primary input cells using the place-and-route tool, wherein the size of the predetermined array is determined by the number of primary input cells.

According to an embodiment of the present invention, it is possible to provide a method of placing a parallel multiplier in which the partial product generator and the final multiplier included in the parallel multiplier are placed with their structure taken into account, so that power, performance, and area are optimized.

FIG. 1 is a block diagram of a parallel multiplier placement system according to an embodiment of the present invention.
FIG. 2 is a block diagram of an example parallel multiplier placed by the parallel multiplier placement method of the present invention.
FIG. 3 shows the arrangement of the partial product generator and the final adder of FIG. 2.
FIG. 4 is a flowchart of a parallel multiplier placement method according to an embodiment of the present invention.
FIG. 5 shows an example of a primary input cell extracted by the parallel multiplier placement system of the present invention.
FIG. 6 shows examples of compressor cells included in the partial product reduction module of the parallel multiplier.
FIG. 7 shows an example of a min-cost maximum flow (MCF) algorithm according to an embodiment of the present invention.
FIG. 8 shows an example of a bit-slice alignment algorithm according to an embodiment of the present invention.
FIG. 9 shows an example of a logic circuit including a parallel multiplier placed by the parallel multiplier placement method according to an embodiment of the present invention.

The advantages and features of the present invention, and the methods of achieving them, are described through the embodiments detailed below together with the accompanying drawings. The present invention is not, however, limited to the embodiments described here and may be embodied in other forms. These embodiments are provided only to describe the invention in enough detail that a person of ordinary skill in the art to which the invention pertains can readily practice its technical idea.

In the drawings, embodiments of the present invention are not limited to the specific forms shown, and the forms are exaggerated for clarity. Parts denoted by the same reference numerals throughout the specification represent the same elements.

Specific terms are used in this specification for the purpose of describing the present invention; they are not used to limit meaning or to restrict the scope of the invention set out in the claims. The expression "and/or" is used to mean that at least one of the elements listed before and after it is included. The expression "connected/coupled" is used to cover both a direct connection to another element and an indirect connection through another element. Singular forms include plural forms unless the text specifically states otherwise. An element, step, operation, or device referred to with "includes" or "including" does not exclude the presence or addition of one or more other elements, steps, operations, or devices. Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 is a block diagram of a parallel multiplier placement system 100 according to an embodiment of the present invention. Referring to FIG. 1, the parallel multiplier placement system 100 may include a CPU 110, a working memory 130, an input/output device 150, a storage device 170, and a bus 190. The parallel multiplier placement system 100 may be provided as a dedicated device for placing parallel multipliers, or it may be a computer that runs various placement tools or design tools.

The CPU 110 executes the software (application programs, operating system, device drivers) to be run on the parallel multiplier placement system 100. The CPU 110 executes an operating system (OS, not shown) loaded into the working memory 130, and executes various application programs or placement tools that run on top of the OS. For example, the CPU 110 may run datapath generation tools, structure analysis tools, and place-and-route tools loaded into the working memory 130. In particular, the structure analysis module 131, provided as a placement tool of the present invention, is run by the CPU 110. The structure analysis module 131 can extract the positions and structural characteristics of the logic cells included in a parallel multiplier. In addition, the CPU 110 may run a place-and-route tool (P&R tool) 132 for placing the various logic cells of a chip at optimal positions.

The operating system (OS) and application programs are loaded into the working memory 130. When the parallel multiplier placement system 100 boots, an OS image (not shown) stored in the storage device 170 is loaded into the working memory 130 according to the boot sequence. The OS supports the overall input/output operations of the parallel multiplier placement system 100. Likewise, application programs may be loaded into the working memory 130 when selected by a user or to provide basic services. In particular, the placement tools 131 and 132 of the present invention may be loaded into the working memory 130.

In particular, the structure analysis module 131 and the place-and-route tool 132 are also loaded, as placement tools, from the storage device 170 into the working memory 130. Although not shown, the working memory 130 may further hold logic synthesis tools that generate the datapath netlist of a parallel multiplier. The working memory 130 may be a volatile memory such as SRAM (static random access memory) or DRAM (dynamic random access memory), or a nonvolatile memory such as PRAM, MRAM, ReRAM, FRAM, or NOR flash memory.

The structure analysis module 131 can analyze the structure of a parallel multiplier. For example, the structure analysis module 131 may receive the datapath netlist of the parallel multiplier and use it to estimate the positions of the logic cells included in the parallel multiplier. The structure analysis module 131 may also output structure information for the parallel multiplier that takes the physical sizes of the logic cells into account. The place-and-route tool 132 can then use the positions and structure information extracted by the structure analysis module 131 to place the logic cells of the parallel multiplier at optimal positions.

The input/output device 150 controls user input and output through user interface devices. For example, the input/output device 150 may include input devices such as a keyboard, a mouse, and a touchpad, and output devices such as a monitor, and may receive a template containing structural placement information for analyzing the structure of the logic cells. The structural placement information may include the specific cells from which the structural placement of the logic cells can be derived, the positions of those cells, and algorithms for analyzing them. The input/output device 150 may also display the placement procedure and placement results of the parallel multiplier placement system 100.

The storage device 170 is provided as the storage medium of the parallel multiplier placement system 100. The storage device 170 may store application programs, an OS image, and various data. The storage device 170 may be provided as a memory card (MMC, eMMC, SD, MicroSD, and the like) or a hard disk drive (HDD). The storage device 170 may include NAND flash memory with a large storage capacity, or it may include next-generation nonvolatile memory such as PRAM, MRAM, ReRAM, or FRAM, or NOR flash memory.

The system bus 190 is provided as an interconnect that forms a network inside the parallel multiplier placement system 100. Through the system bus 190, the CPU 110, the working memory 130, the input/output device 150, and the storage device 170 are electrically connected and can exchange data with one another. The configuration of the system bus 190 is not limited to this description, however, and may further include arbitration means for efficient management.

As described above, the parallel multiplier placement system 100 can analyze the positions and structure of logic cells by referring to the input datapath netlist. Based on the analyzed positions and structure, it can then place the logic cells of the parallel multiplier at optimal positions with power, performance, and area taken into account. The parallel multiplier can therefore be designed in a short time and placed within a logic circuit.

FIG. 2 is a block diagram of an example parallel multiplier placed by the parallel multiplier placement method of the present invention. Referring to FIG. 2, the parallel multiplier 200 may include a partial product generator 210, a partial product reduction module 220, and a final adder 230. The parallel multiplier 200 receives a multiplicand and a multiplier, multiplies the two, and outputs a final product. The parallel multiplier 200 can compute the product of the multiplicand and the multiplier in parallel.

The partial product generator 210 generates the partial products of the multiplicand and the multiplier. For example, when an 8-bit multiplicand is multiplied by an 8-bit multiplier, the partial product generator 210 can generate 64 partial products. That is, the partial product generator 210 has 64 logic cells, one for each of the 64 partial products.

The partial product reduction module 220 accumulates the generated partial products to produce the sum bits and carry bits from which the final product is formed. For example, the logic cells of the partial product reduction module 220 may form a Wallace tree in which each cell receives three inputs and outputs a sum bit and a carry bit. The three inputs may include a sum bit and a carry bit from the previous row, and one of the outputs of the partial product generator 210.

The final adder 230 can add the sum bits and carry bits output from the partial product reduction module 220 and output the final product. For example, a logic cell of the final adder 230 may form one column together with at least one logic cell included in the partial product generator 210.
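The three stages described above can be sketched in software as a rough behavioral model. The sketch assumes a non-Booth multiplier whose partial products are simple AND terms and whose reduction cells are 3-input full adders; the function names `full_adder` and `multiply_by_reduction` are illustrative and do not appear in the source. The point to notice is that each sum bit stays in its own column while each carry bit moves one column up.

```python
def full_adder(x, y, z):
    """3-input compressor: returns (sum bit, carry bit) with x + y + z == s + 2 * c."""
    return x ^ y ^ z, (x & y) | (y & z) | (x & z)


def multiply_by_reduction(a, b, width=8):
    """Behavioral model: partial product generation, column reduction, final addition."""
    ncols = 2 * width
    cols = [[] for _ in range(ncols)]

    # Partial product generation (assumed AND-based, non-Booth):
    # bit a_i AND b_j has weight 2**(i + j), so it belongs to column i + j.
    for j in range(width):
        for i in range(width):
            cols[i + j].append(((a >> i) & 1) & ((b >> j) & 1))

    # Partial product reduction: compress until every column holds at most two bits.
    while any(len(col) > 2 for col in cols):
        nxt = [[] for _ in range(ncols)]
        for k, col in enumerate(cols):
            while len(col) >= 3:
                s, c = full_adder(col.pop(), col.pop(), col.pop())
                nxt[k].append(s)          # the sum bit stays in the same column
                if k + 1 < ncols:
                    nxt[k + 1].append(c)  # the carry bit moves to the next column
            nxt[k].extend(col)            # pass through bits that were not compressed
        cols = nxt

    # Final adder: add the two remaining rows of bits.
    row0 = sum((col[0] if len(col) > 0 else 0) << k for k, col in enumerate(cols))
    row1 = sum((col[1] if len(col) > 1 else 0) << k for k, col in enumerate(cols))
    return row0 + row1


product = multiply_by_reduction(181, 110)
```

A carry out of the top column is dropped safely because the product of two 8-bit operands always fits in 16 bits.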

FIG. 3 shows the arrangement of the partial product generator 210 and the final adder 230 of FIG. 2. Referring to FIG. 3, the partial product generator 210 may include logic cells a1b1 to a8b8 that compute the partial products of the multiplicand and the multiplier. The final adder 230 may include sum bit cells Sum1 to Sum15 and carry bit cells Ca1 to Ca15. In the following, the multiplicand and the multiplier are assumed to be 8 bits wide, although they may be wider or narrower. The multiplicand consists of bits a1 to a8, and the multiplier consists of bits b1 to b8.

The partial product generator 210 may include an array of the logic cells a1b1 to a8b8. For example, the logic cells a1b1 to a8b8 may be arranged on a two-dimensional plane in a parallelogram shape resembling a rhombus. Each of the logic cells a1b1 to a8b8 belongs to one of the rows Row1 to Row8 and to one of the columns Col1 to Col16.

The partial product reduction module 220 can accumulate the partial products and pass the resulting sum bits and carry bits to the sum bit cells Sum1 to Sum15 and the carry bit cells Ca1 to Ca15 of the final adder 230. For example, the partial product reduction module 220 may use a Wallace tree. Accordingly, although not shown, the logic cells included in the partial product reduction module 220 may all have the same shape and size.

The final adder 230 may include the sum bit cells Sum1 to Sum15 and the carry bit cells Ca1 to Ca15. The sum bit cells Sum1 to Sum15 may form one row, and the carry bit cells Ca1 to Ca15 may form another row.

Logic circuits such as systems-on-chip (SoC) use various kinds of datapath logic, and the parallel multiplier 200 is one of them. A logic circuit also includes many parallel multipliers 200. Therefore, if the parallel multiplier 200 can be placed quickly with power, performance, and area taken into account, the time needed to design the logic circuit can be shortened.

FIG. 4 is a flowchart of a parallel multiplier placement method according to an embodiment of the present invention. Referring to FIG. 4, the placement method allows the parallel multiplier 200 to be placed in a logic circuit quickly, with power, performance, and area taken into account.

In step S110, the parallel multiplier placement system 100 receives the datapath netlist of the parallel multiplier 200. For example, the parallel multiplier placement system 100 may receive a netlist file written by a user. Alternatively, it may generate the netlist using a logic synthesis tool. The logic synthesis tool holds the parameters, mathematical information, and inter-cell relationship information needed to generate the partial product generator 210.

In step S120, the parallel multiplier placement system 100 determines the rows and columns of the primary input cells (PIs). For example, the parallel multiplier placement system 100 may define some of the logic cells a1b1 to a8b8 as primary input cells. A primary input cell can guide the structural placement of the other logic cells. A primary input cell is defined as a cell connected to the nets of the multiplicand and multiplier inputs.

The way primary input cells are extracted may depend on the type of logic cell in the partial product generator 210. For example, the logic cells of the partial product generator 210 may be of a Booth type or a non-Booth type. For the Booth type, primary input cells are extracted using only the multiplicand nets. For the non-Booth type, primary input cells are extracted using both the multiplicand nets and the multiplier nets.
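One possible reading of this extraction rule is sketched below. The netlist is modeled as a plain dictionary from cell name to the set of nets on its input pins; that representation, the function name `extract_primary_inputs`, and all cell and net names are assumptions made for illustration and are not taken from the source.

```python
def extract_primary_inputs(cell_inputs, multiplicand_nets, multiplier_nets, booth=False):
    """Return the cells treated as primary input cells (PIs).

    cell_inputs maps a cell name to the set of nets on its input pins
    (a hypothetical, simplified netlist view).
    Booth type:     a PI is a cell that touches a multiplicand net.
    Non-Booth type: a PI is a cell that touches both a multiplicand net
                    and a multiplier net.
    """
    pis = []
    for cell, nets in cell_inputs.items():
        touches_multiplicand = bool(nets & multiplicand_nets)
        touches_multiplier = bool(nets & multiplier_nets)
        if touches_multiplicand and (booth or touches_multiplier):
            pis.append(cell)
    return sorted(pis)


# Toy netlist with made-up cell and net names.
toy_netlist = {
    "pp_a1b1": {"a1", "b1"},        # partial product cell
    "pp_a2b1": {"a2", "b1"},        # partial product cell
    "booth_sel": {"a1", "sel0"},    # touches the multiplicand only
    "booth_enc": {"b1", "b2"},      # touches the multiplier only
    "fa_0": {"n1", "n2", "n3"},     # internal compressor cell
}
a_nets = {"a1", "a2"}
b_nets = {"b1", "b2"}

non_booth_pis = extract_primary_inputs(toy_netlist, a_nets, b_nets, booth=False)
booth_pis = extract_primary_inputs(toy_netlist, a_nets, b_nets, booth=True)
```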

The positions of the extracted primary input cells are then estimated. For example, the row of a primary input cell can be determined by a row inference algorithm. Referring to FIG. 3, the row of each of the logic cells a1b1 to a8b8 is determined by the multiplier bit. For example, the logic cells a1b1 to a8b1 in the first row (Row1) are the cells multiplied by multiplier bit b1. The logic cells a1b2 to a8b2 in the second row (Row2) are the cells multiplied by multiplier bit b2. The logic cells a1b3 to a8b3 in the third row (Row3) are the cells multiplied by multiplier bit b3. In this way, the row containing a primary input cell is determined by the multiplier.

The column of a primary input cell can likewise be determined by a column inference algorithm. The column of each of the logic cells a1b1 to a8b8 is determined by both the multiplicand and the multiplier: the logic cells in a given column are those for which the sum of the multiplicand bit position and the multiplier bit position is the same. For example, for the logic cell a1b1 in the first column (Col1), the bit positions sum to 2. For the logic cells a2b1 and a1b2 in the second column (Col2), they sum to 3. For the logic cells a3b1, a2b2, and a1b3 in the third column (Col3), they sum to 4. In this way, the column containing a primary input cell is determined by the multiplicand and the multiplier.
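The row and column rules above reduce to simple index arithmetic, sketched here with the 1-based bit numbering of FIG. 3 (a1 to a8, b1 to b8). The function names `infer_row_col` and `group_by_column` are illustrative only.

```python
from collections import defaultdict


def infer_row_col(a_index, b_index):
    """Row and column of partial product cell a<a_index>b<b_index> (1-based indices).

    Row:    the multiplier bit position.
    Column: the sum of both bit positions minus 1, so that a1b1 (sum 2) is in Col1.
    """
    return b_index, a_index + b_index - 1


def group_by_column(width=8):
    """Group all partial product cell names by their inferred column."""
    columns = defaultdict(list)
    for b in range(1, width + 1):
        for a in range(1, width + 1):
            _, col = infer_row_col(a, b)
            columns[col].append(f"a{a}b{b}")
    return dict(columns)


columns = group_by_column()
```

With this formula the 64 partial product cells of an 8-bit by 8-bit multiplier fall into columns 1 through 15, and `columns[3]` holds `a3b1`, `a2b2`, and `a1b3`, matching the third-column example in the text.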

In step S130, the parallel multiplier placement system 100 determines the primary output cell (PO) that belongs to each column. Primary output cells can be found by tracing the sum bits in the netlist. For example, the output of a primary input cell may be connected to a compressor cell in the partial product reduction module 220. A compressor cell receives several inputs and produces a sum output (sum-out) and a carry output (carry-out). The sum output reaches a primary output cell without changing column (Col1 to Col16). Therefore, by tracing the sum outputs of the compressor cells in the partial product reduction module 220 that are connected to a primary input cell, the column (Col1 to Col16) of each primary output cell can be determined.
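The sum-output tracing described above can be sketched as a walk along same-column connections. The dictionary `same_column_fanout`, which maps each cell to the cell driven by its output (for a PI) or by its sum output (for a compressor cell), is a hypothetical simplification of the netlist, and all cell names are made up.

```python
def trace_po_columns(pi_columns, same_column_fanout):
    """Follow sum outputs from each PI until a cell with no further sum fanout is reached.

    That last cell is treated as the primary output cell (PO) and inherits
    the column of the PI the walk started from.
    """
    po_columns = {}
    for pi, col in pi_columns.items():
        cell = pi
        visited = set()
        while cell in same_column_fanout and cell not in visited:
            visited.add(cell)  # guard against accidental loops in the toy data
            cell = same_column_fanout[cell]
        po_columns[cell] = col
    return po_columns


# Toy example: columns 1 and 2 of a small multiplier.
pi_columns = {"a1b1": 1, "a2b1": 2, "a1b2": 2}
same_column_fanout = {
    "a1b1": "po_1",     # column 1 needs no compression
    "a2b1": "cmp_c2",   # both column-2 partial products feed one compressor
    "a1b2": "cmp_c2",
    "cmp_c2": "po_2",   # its sum output stays in column 2
}

po_columns = trace_po_columns(pi_columns, same_column_fanout)
```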

In step S140, the parallel multiplier placement system 100 maps the primary input cells (PIs) and the primary output cells (POs) onto a predetermined array. For example, the parallel multiplier placement system 100 creates the predetermined array in order to map the primary input cells and primary output cells. The predetermined array logically has the structure of the parallel multiplier 200. If the array were sized so that every logic cell is assigned to its own position, the aspect ratio of the mapped array would grow, and the quality of result (QoR) of the parallel multiplier 200 could degrade. The parallel multiplier placement system 100 therefore determines the size of the predetermined array from the number of primary input cells. For example, the parallel multiplier placement system 100 may preset the number of rows of the array. The array size may then be determined by dividing the number of primary input cells to be mapped into the columns by the preset number of rows. In this case, however, some primary input cells may be mis-mapped.

The parallel multiplier placement method of the present invention permits mis-mapping of primary input cells. Mis-mapping means that a primary input cell or a primary output cell is mapped to a location other than its extracted location. The parallel multiplier placement system 100 performs an optimization that maximizes the number of mapped logic cells while minimizing the number of mis-mapped logic cells.

The parallel multiplier placement system 100 may map the primary input cells according to a min-cost maximum flow (MCF) algorithm so as to minimize the sum of the differences between the estimated positions of the primary input cells and the positions to which they are actually mapped. The MCF algorithm, however, does not take the net connections between logic cells into account. The parallel multiplier placement system 100 may therefore also run a half-perimeter wire length (HPWL) algorithm so that the net connections between logic cells are considered as well. The parallel multiplier placement system 100 may assign different weights to the MCF and HPWL algorithms.
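As a rough illustration of how a displacement term and a wire-length term could be weighted against each other, the sketch below computes the standard half-perimeter wire length of a net (the half perimeter of the bounding box of its pins) and adds it to a displacement cost. The function names, the pin coordinate format, and the default weights are hypothetical; the specification does not state how the two terms are combined.

```python
def hpwl(pins):
    """Half-perimeter wire length of one net.

    pins: list of (x, y) pin coordinates belonging to the net.
    Returns the width plus the height of the pins' bounding box.
    """
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def combined_cost(displacement, nets, w_mcf=1.0, w_hpwl=0.5):
    """Weighted sum of a displacement cost and the total HPWL.

    displacement: sum of |estimated position - mapped position| over cells
    nets:         list of nets, each a list of (x, y) pins
    w_mcf/w_hpwl: illustrative weights (assumed values, not from the source)
    """
    return w_mcf * displacement + w_hpwl * sum(hpwl(net) for net in nets)

if __name__ == "__main__":
    net = [(0, 0), (3, 1), (1, 4)]
    print(hpwl(net))                      # bounding box is 3 wide, 4 tall
    print(combined_cost(2.0, [net]))
```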

In step S150, the parallel multiplier placement system 100 may align the columns of the predetermined array based on the physical sizes of the primary input cells and the primary output cells. Up to step S140, the sizes of the individual primary input cells and primary output cells are not considered. The primary input cells and primary output cells may, however, each have a different size. The array mapped in step S140 therefore needs to be realigned with the sizes of the primary input cells and primary output cells taken into account.

For example, the parallel multiplier placement system 100 may perform a bit-slice alignment algorithm. The bit-slice alignment algorithm adjusts the primary input cells and primary output cells so that mis-alignment is minimized within a predetermined width limit. Mis-alignment refers to the degree to which each primary input cell or primary output cell deviates from the column to which it is mapped.

The parallel multiplier placement method according to an embodiment of the present invention has been described above. The parallel multiplier placement system 100 may receive a datapath netlist and extract the primary input cells and primary output cells. The parallel multiplier placement system 100 may map the primary input cells and primary output cells onto a predetermined array according to the placement algorithms. After placing the primary input cells and primary output cells, the parallel multiplier placement system 100 may guide the placement of the remaining logic cells. The parallel multiplier placement system 100 can therefore place a parallel multiplier faster than a manual method while taking power, performance, and area into account.

FIG. 5 illustrates primary input cells extracted by the parallel multiplier placement system of the present invention. Referring to FIG. 5, a primary input cell may be of a Booth type or a non-Booth type. The kinds of primary input cell are not limited to these, however, and a primary input cell may take various forms.

The parallel multiplier placement system 100 may apply a different position estimation algorithm depending on the type of primary input cell. For example, the partial product output $PP_{ij}$ of a non-Booth type primary input cell is formed from a combination of the inputs $X_i$ and $Y_j$. Here X is the multiplier, Y is the multiplicand, i denotes the i-th row of the partial products, and j denotes the j-th column of the partial products. For the non-Booth type, the row and column to which the primary input cell belongs can therefore be read directly from the inputs $X_i$ and $Y_j$. As another example, the partial product output $PP_{ij}$ of a Booth type primary input cell is formed from a combination of the inputs $X_{2i-1}$, $X_{2i}$, $X_{2i+1}$, $Y_j$, and $Y_{j-1}$. For the Booth type, the row position of the primary input cell is estimated by dividing the indices of the inputs $X_{2i-1}$, $X_{2i}$, $X_{2i+1}$ by 2 and applying a floor operation.
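The two estimation rules can be sketched as follows. The function names and the choice of which multiplier index to floor-divide are assumptions: the text only says the inputs are divided by 2 and floored, and this sketch uses the largest of the three multiplier indices because $\lfloor (2i+1)/2 \rfloor = i$, whereas $\lfloor (2i-1)/2 \rfloor = i-1$. Column estimation for the Booth type is not described in the text and is left out.

```python
def estimate_non_booth(x_index, y_index):
    """Non-Booth PI cell: PP_ij is built from X_i and Y_j, so the row and
    column are read directly from the input indices."""
    return x_index, y_index

def estimate_booth_row(x_indices):
    """Booth PI cell: estimate the row from the multiplier bit indices
    (2i-1, 2i, 2i+1) by floor-dividing by 2.

    Using the largest index is an assumption of this sketch, chosen so
    that the group (2i-1, 2i, 2i+1) maps to row i.
    """
    return max(x_indices) // 2

if __name__ == "__main__":
    print(estimate_non_booth(3, 5))       # cell fed by X_3 and Y_5
    print(estimate_booth_row([5, 6, 7]))  # group 2i-1=5, 2i=6, 2i+1=7
```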

FIG. 6 illustrates compress cells included in the partial product reduction module of the parallel multiplier. Referring to FIG. 6, each of the compress cells 221 to 224 receives three inputs and produces two outputs. For example, the three inputs may be the sum output (SUM) and the carry output (CA) from the previous row, and one primary input cell output (PI) among the outputs of the partial product generator 210. The compress cells 221 to 224 are not limited to this configuration, however.

In FIG. 6, the compress cells 221 and 222 belong to the j-th column, and the compress cells 223 and 224 belong to the (j+1)-th column. The sum outputs (SUM) are passed along a single column without changing columns. The carry outputs (CA), in contrast, move to a different column each time the row changes. The parallel multiplier placement system 100 can therefore extract the column to which a primary input cell belongs by tracing the sum bits (SUM) of the compress cells connected to that primary input cell.
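The sum-tracing step can be pictured as walking a chain of sum-out connections until a primary output is reached. The sketch below assumes a very simplified netlist model, a dictionary that maps each cell to the cell driven by its sum path; the names `trace_sum_to_output`, `sum_fanout`, and the cell labels are hypothetical and do not come from any real netlist format.

```python
def trace_sum_to_output(start_cell, sum_fanout, outputs):
    """Follow sum-out connections from a cell until a primary output is hit.

    start_cell: name of the PI cell (or compress cell) to start from
    sum_fanout: {cell name: name of the next cell on its sum path}
    outputs:    set of primary output (PO) cell names
    Returns the PO name reached, or None if the chain dangles or loops.
    """
    seen = set()
    cell = start_cell
    while cell not in outputs:
        if cell in seen or cell not in sum_fanout:
            return None  # no sum path to a primary output
        seen.add(cell)
        cell = sum_fanout[cell]
    return cell

if __name__ == "__main__":
    # PI cell feeds compress cell c221, whose sum goes to c222, then to a PO.
    fanout = {"pi_a": "c221", "c221": "c222", "c222": "po_j"}
    print(trace_sum_to_output("pi_a", fanout, {"po_j"}))
```

Because sum outputs stay in one column, the PO reached this way shares its column with the starting cell, so the column of one fixes the column of the other.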

FIG. 7 illustrates a min-cost maximum flow (MCF) algorithm according to an embodiment of the present invention. FIG. 7 shows how a primary input cell (PI cell[j]) is mapped. The primary input cell (PI cell[j]) is a cell estimated to belong to the j-th column. For example, the columns (j+1, j, j-1) may include respective slots (Slot[i+1], Slot[i], Slot[i-1]). The primary input cell (PI cell[i]) may be mapped to one of the slots (Slot[i+1], Slot[i], Slot[i-1]). The primary input cell (PI cell[i]) can therefore be compared with the adjacent primary input cells (PI cell[i+1], PI cell[i-1]). Each arrow shows a case in which one of the primary input cells (PI cell[i+1], PI cell[i], PI cell[i-1]) is mapped to one of the slots (Slot[i+1], Slot[i], Slot[i-1]). Each case (① to ⑦) may have a different cost and flow capacity. The parallel multiplier placement system 100 may therefore map the primary input cell (PI cell[i]) onto the predetermined array so as to minimize the cost and maximize the flow capacity.

Although FIG. 7 is described for a single cell (PI cell[j]) by way of example, the parallel multiplier placement system 100 may run the MCF algorithm on all of the extracted primary input cells. Through the MCF algorithm, the parallel multiplier placement system 100 may map the primary input cells onto the predetermined array so that the sum of the resulting values over all primary input cells is minimized.
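One way to picture the mapping is as a textbook min-cost max-flow problem: a source feeds each primary input (PI) cell one unit of flow, each cell may go to its estimated column or a neighbouring one at a cost equal to the column shift, and each column passes at most its slot capacity to the sink. The sketch below uses successive shortest augmenting paths; the graph construction, the `max_shift` parameter, and all names are assumptions for illustration and are not taken from the specification, which does not define its cost and capacity values in this detail.

```python
from collections import deque

class MinCostFlow:
    def __init__(self, n):
        self.n = n
        self.graph = [[] for _ in range(n)]  # edge: [to, cap, cost, rev index]

    def add_edge(self, u, v, cap, cost):
        self.graph[u].append([v, cap, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])

    def solve(self, s, t):
        """Return (max flow, min cost) using repeated shortest-path search."""
        flow = cost = 0
        while True:
            dist = [float("inf")] * self.n
            prev = [None] * self.n           # (previous node, edge index)
            queued = [False] * self.n
            dist[s] = 0
            queue = deque([s])
            while queue:                     # queue-based Bellman-Ford
                u = queue.popleft()
                queued[u] = False
                for idx, (v, cap, w, _) in enumerate(self.graph[u]):
                    if cap > 0 and dist[u] + w < dist[v]:
                        dist[v] = dist[u] + w
                        prev[v] = (u, idx)
                        if not queued[v]:
                            queued[v] = True
                            queue.append(v)
            if prev[t] is None:              # sink unreachable: done
                return flow, cost
            push, v = float("inf"), t
            while v != s:                    # bottleneck on the path
                u, idx = prev[v]
                push = min(push, self.graph[u][idx][1])
                v = u
            v = t
            while v != s:                    # update residual capacities
                u, idx = prev[v]
                edge = self.graph[u][idx]
                edge[1] -= push
                self.graph[v][edge[3]][1] += push
                v = u
            flow += push
            cost += push * dist[t]

def map_cells(estimated_cols, slot_capacity, max_shift=1):
    """Assign each cell to a column; cost is the shift from its estimate."""
    cells, cols = sorted(estimated_cols), sorted(slot_capacity)
    s, t = 0, 1
    cell_node = {c: 2 + k for k, c in enumerate(cells)}
    col_node = {c: 2 + len(cells) + k for k, c in enumerate(cols)}
    net = MinCostFlow(2 + len(cells) + len(cols))
    for c in cells:
        net.add_edge(s, cell_node[c], 1, 0)
        for col in cols:
            shift = abs(col - estimated_cols[c])
            if shift <= max_shift:
                net.add_edge(cell_node[c], col_node[col], 1, shift)
    for col in cols:
        net.add_edge(col_node[col], t, slot_capacity[col], 0)
    flow, cost = net.solve(s, t)
    col_of_node = {node: col for col, node in col_node.items()}
    mapping = {}
    for c in cells:                          # saturated cell->column edges
        for v, cap, _, _ in net.graph[cell_node[c]]:
            if v in col_of_node and cap == 0:
                mapping[c] = col_of_node[v]
    return mapping, flow, cost

if __name__ == "__main__":
    print(map_cells({"a": 0, "b": 0, "c": 1}, {0: 1, 1: 1, 2: 1}))
```

In the example, two cells compete for column 0, so one of them is mis-mapped to column 1 and the third cell is pushed to column 2; all three cells are placed with a total shift of 2.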

FIG. 8 illustrates a bit-slice alignment algorithm according to an embodiment of the present invention. When the sizes of the primary input cells mapped onto the predetermined array are taken into account, the edges of the primary input cells may extend to the left or right of their columns. The difference between the edges of the cells that protrude the most can then be obtained at each column boundary. For example, the maximum edge difference g1 between the (j+2)-th column (Col[j+2]) and the (j+1)-th column (Col[j+1]) may be determined by the cell C[i-1,j+2] and either the cell C[i,j+1] or the cell C[i+1,j+1]. The maximum edge difference g2 between the (j+1)-th column (Col[j+1]) and the j-th column (Col[j]) may be determined by the cell C[i-1,j+1] and the cell C[i,j]. The maximum edge difference g3 between the j-th column (Col[j]) and the (j-1)-th column (Col[j-1]) may be determined by the cell C[i,j] and either the cell C[i-1,j-1] or the cell C[i+1,j-1]. The maximum edge difference g4 between the (j-1)-th column (Col[j-1]) and the (j-2)-th column (Col[j-2]) may be determined by the cell C[i+1,j-1] and the cell C[i,j-2]. Empty space cells (B[i,j+2], B[i+1,j+2]) may be assigned to slots to which no primary input cell is mapped.

The parallel multiplier placement system 100 may adjust the positions of the primary input cells so that the sum of the maximum edge differences (g1 to g4) determined in this way is minimized within a predetermined width constraint. Although FIG. 8 is described with three rows and five columns by way of example, the parallel multiplier placement system 100 may perform the bit-slice alignment algorithm in the same way on all of the mapped primary input cells.
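The objective being minimized can be sketched as a simple evaluation function. This is only one possible reading of the maximum edge difference: each cell is modelled as a horizontal interval (left, right), and the gap between two adjacent columns is taken as the absolute distance between the rightmost edge in the left column and the leftmost edge in the right column. The interval model, the function names, and the way the width constraint is checked are all assumptions; the sketch scores a candidate placement and does not search for the best one.

```python
def edge_gap(left_col, right_col):
    """Distance between the edge that protrudes most toward the right
    column and the edge that protrudes most toward the left column.

    Each column is a list of (left, right) x-intervals, one per cell.
    """
    rightmost = max(right for _, right in left_col)
    leftmost = min(left for left, _ in right_col)
    return abs(rightmost - leftmost)

def alignment_cost(columns, width_limit):
    """Sum of edge gaps over adjacent column pairs (g1 + g2 + ...).

    columns:     columns in left-to-right order, each a list of intervals
    width_limit: maximum allowed overall width (the width constraint)
    Returns None when the placement violates the width constraint.
    """
    left = min(l for col in columns for l, _ in col)
    right = max(r for col in columns for _, r in col)
    if right - left > width_limit:
        return None
    return sum(edge_gap(a, b) for a, b in zip(columns, columns[1:]))

if __name__ == "__main__":
    cols = [[(0, 4), (0, 5)], [(4, 8), (6, 9)], [(9, 12)]]
    print(alignment_cost(cols, width_limit=12))
    print(alignment_cost(cols, width_limit=10))
```

An optimizer built on top of this would shift cells within their columns and keep the candidate with the lowest returned value.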

FIG. 9 illustrates a logic circuit that includes parallel multipliers placed according to the parallel multiplier placement method of an embodiment of the present invention. Referring to FIG. 9, the logic circuit 1000 may include a plurality of logic cells. For example, the logic circuit 1000 may include a CPU, a GPU, a system-on-chip (SoC), or an application processor (AP). When the logic cells are placed, the parallel multipliers 200 may be placed first according to the parallel multiplier placement method of the present invention. After the parallel multipliers 200 have been placed, the remaining logic cells may be placed according to their functions. The logic cells of the logic circuit 1000 can thus be placed quickly in a way that is optimized for power, performance, and area.

Embodiments have been disclosed above in the drawings and the specification. Although specific terms are used herein, they are used only to describe the present invention and not to limit its meaning or the scope of the invention set forth in the claims. Those of ordinary skill in the art will therefore understand that various modifications and other equivalent embodiments are possible. Accordingly, the true technical scope of protection of the present invention should be determined by the technical idea of the appended claims.

100: parallel multiplier placement system
130: working memory
131: structure analysis module
132: placement/routing tool
150: input/output device
170: storage device
190: system bus
200: parallel multiplier
210: partial product generator
220: partial product reduction module
230: final adder
1000: logic circuit

Claims

1. A method of placing a parallel multiplier using a placement-routing tool run on a computer, the method comprising:
receiving a datapath netlist for the parallel multiplier;
extracting locations of primary input cells and primary output cells from the datapath netlist using a structure analysis module;
mapping the primary input cells and the primary output cells onto a specific array using the placement-routing tool; and
aligning each column of the primary input cells and the primary output cells based on physical sizes of the primary input cells using the placement-routing tool,
wherein the size of the specific array is determined based on the number of the primary input cells.

2. The method of claim 1,
wherein the structure analysis module extracts the locations of the primary input cells and the primary output cells using a multiplicand and a multiplier input to the parallel multiplier.

3. The method of claim 1,
wherein a row to which the primary input cells belong is estimated using a multiplier input to the parallel multiplier, and
wherein a column to which the primary input cells belong is estimated using the multiplier and a multiplicand input to the parallel multiplier.

4. The method of claim 1,
wherein a column to which the primary output cells belong is determined by tracing the sum outputs of compress cells connected to the primary input cells.

5. The method of claim 1,
wherein, in the mapping onto the specific array,
the specific array comprises a plurality of slots, and
the placement-routing tool maps the primary input cells and the primary output cells to the plurality of slots such that each of the primary input cells has a minimum cost and a maximum flow capacity when mapped to one of the plurality of slots.

6. The method of claim 5,
wherein the cost is determined by a distance between an estimated location of the primary input cells and a location to which the primary input cells are actually mapped.

7. The method of claim 5,
wherein the flow capacity is proportional to the amount of data transmitted per unit time from the primary input cells to the primary output cells.

8. The method of claim 5,
wherein the placement-routing tool applies different weights to the cost and the flow capacity to map the primary input cells and the primary output cells onto the specific array.

9. The method of claim 1,
wherein, in the aligning of each column, for a first column and a second column adjacent to each other, the primary input cells included in the first column and the second column are repositioned so as to minimize a distance between an edge of a first cell, which among the primary input cells included in the first column protrudes farthest toward the second column, and an edge of a second cell, which among the primary input cells included in the second column protrudes farthest toward the first column.

10. The method of claim 1,
wherein, in the aligning of each column, empty space cells having a specific size are mapped to slots of the specific array to which no primary input cell is mapped.