KR20210088652A

KR20210088652A - Network interface device

Info

Publication number: KR20210088652A
Application number: KR1020217017269A
Authority: KR
Inventors: Steven Leslie Pope; Neil Turton; David James Riddoch; Dmitri Kitariev; Ripduman Sohan; Derek Edward Roberts
Original assignee: Xilinx, Inc.
Priority date: 2018-11-05
Filing date: 2019-11-05
Publication date: 2021-07-14
Also published as: EP3877851A1; CN113272793A; WO2020094664A1; JP2022512879A

Abstract

A network interface device comprises a hardware module that includes a plurality of processing units. Each of the plurality of processing units is associated with its own at least one predefined operation. At compile time, the hardware module is configured by arranging at least some of the plurality of processing units to perform their respective at least one operation with respect to a data packet in a predetermined order, so as to perform a function with respect to the data packet. A compiler is provided to assign different processing stages to each processing unit. A controller is provided to switch on the fly between different processing circuitry, so that one processing circuit may be used while another is being compiled.

Description

Network interface device

This application relates to a network interface device for performing a function with respect to a data packet.

Network interface devices are known and are typically used to provide an interface between a computing device and a network. A network interface device may be configured to process data received from the network and/or to process data that is to be put onto the network.

According to an aspect, there is provided a network interface device for interfacing a host device to a network, the network interface device comprising: a first interface configured to receive a plurality of data packets; and a configurable hardware module comprising a plurality of processing units, each processing unit being associated with a predefined type of operation executable in a single step, wherein at least some of said plurality of processing units are associated with different predefined types of operation, and wherein the hardware module is configurable to interconnect at least some of said plurality of processing units to provide a first data processing pipeline for processing one or more of said plurality of data packets, so as to perform a first function with respect to said one or more of said plurality of data packets.
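The arrangement of this aspect can be modelled in software for illustration. The following Python sketch is not taken from the application: the names `ProcessingUnit`, `HardwareModule`, and `interconnect`, the three example operation types, and the blocked-protocol table are all hypothetical. It shows units that each perform one predefined type of operation being interconnected into a first pipeline that performs a simple filtering function.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical model: a context carries the packet plus intermediate results.
Context = Dict[str, Any]


@dataclass
class ProcessingUnit:
    """One unit, tied to a single predefined type of operation."""
    name: str
    op_type: str  # e.g. "load", "lookup", "logic" (assumed type names)
    operation: Callable[[Context], Context]


@dataclass
class HardwareModule:
    """Holds a pool of units and interconnects some of them into a pipeline."""
    units: Dict[str, ProcessingUnit]
    pipeline: List[ProcessingUnit] = field(default_factory=list)

    def interconnect(self, order: List[str]) -> None:
        # Configuring the module selects units and fixes their order.
        self.pipeline = [self.units[name] for name in order]

    def process(self, packet: bytes) -> Context:
        ctx: Context = {"packet": packet}
        for unit in self.pipeline:  # each unit performs its single step
            ctx = unit.operation(ctx)
        return ctx


BLOCKED_PROTOCOLS = {17}  # assumed example table: drop protocol number 17


def load_protocol(ctx: Context) -> Context:
    # "load" type: read one value from the packet (first byte, by assumption).
    return {**ctx, "proto": ctx["packet"][0]}


def lookup_action(ctx: Context) -> Context:
    # "lookup" type: consult a table to decide how to treat the packet.
    return {**ctx, "blocked": ctx["proto"] in BLOCKED_PROTOCOLS}


def decide(ctx: Context) -> Context:
    # "logic" type: turn the lookup result into a verdict.
    return {**ctx, "verdict": "drop" if ctx["blocked"] else "pass"}


module = HardwareModule(units={
    "ld": ProcessingUnit("ld", "load", load_protocol),
    "lk": ProcessingUnit("lk", "lookup", lookup_action),
    "dc": ProcessingUnit("dc", "logic", decide),
})
module.interconnect(["ld", "lk", "dc"])
```

Calling `module.process(bytes([17, 0]))` walks the packet through the three units in the configured order; a different call to `interconnect` would yield a different pipeline from the same pool of units.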

In some embodiments, the first function comprises a filtering function. In some embodiments, the function comprises at least one of a tunneling function, an encapsulation function, and a routing function. In some embodiments, the first function comprises an extended Berkeley packet filter function.

In some embodiments, the first function comprises a distributed denial of service scrubbing operation.

In some embodiments, the first function comprises a firewall operation.

In some embodiments, the first interface is configured to receive a first data packet from the network.

In some embodiments, the first interface is configured to receive a first data packet from the host device.

In some embodiments, two or more of the at least some of the plurality of processing units are configured to perform their associated at least one predefined operation in parallel.

In some embodiments, two or more of the at least some of the plurality of processing units are configured to perform their associated predefined type of operation according to a common clock signal of the hardware module.

In some embodiments, each of two or more of the at least some of the plurality of processing units is configured to perform its associated predefined type of operation within a predefined length of time defined by a clock signal.

In some embodiments, two or more of the at least some of the plurality of processing units are configured to: access the first data packet within a time period of the predefined length of time; and, in response to the end of the predefined length of time, transfer the results of the respective at least one operation to a next processing unit.

In some embodiments, the results comprise at least one or more of: at least one value from the one or more of the plurality of data packets; updates to map state; and metadata.
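The clocked hand-off described in the preceding embodiments can be sketched as a lock-step simulation. This is a minimal illustration under stated assumptions, not the hardware design: `run_clocked_pipeline`, the per-stage input registers, and the three example stage functions are hypothetical names, and each loop iteration stands for one period of the common clock signal.

```python
from typing import Callable, Dict, List, Optional

Result = Dict[str, object]  # assumed record: packet value, map update, metadata


def run_clocked_pipeline(stages: List[Callable[[Result], Result]],
                         inputs: List[Result]) -> List[Result]:
    """Advance all stages in lock step, one stage operation per clock period."""
    regs: List[Optional[Result]] = [None] * len(stages)  # input register per stage
    pending = list(inputs)
    outputs: List[Result] = []
    while pending or any(r is not None for r in regs):
        # Feed the first stage at the start of the clock period.
        if regs[0] is None and pending:
            regs[0] = pending.pop(0)
        # All occupied stages operate in parallel within the same period.
        results = [stage(r) if r is not None else None
                   for stage, r in zip(stages, regs)]
        # At the end of the period each result moves on to the next stage.
        if results[-1] is not None:
            outputs.append(results[-1])
        regs = [None] + results[:-1]
    return outputs


def extract_value(r: Result) -> Result:
    return {**r, "value": r["packet"][0]}          # a value from the packet


def update_map(r: Result) -> Result:
    return {**r, "map_update": ("count", r["value"])}  # an update to map state


def tag(r: Result) -> Result:
    return {**r, "metadata": {"seen": True}}       # metadata


out = run_clocked_pipeline(
    [extract_value, update_map, tag],
    [{"packet": b"\x05"}, {"packet": b"\x09"}],
)
```

In this model the second packet enters the first stage while the first packet is in the second stage, which is the parallelism the embodiments describe.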

In some embodiments, each of the plurality of processing units comprises an application specific integrated circuit configured to perform the at least one operation associated with the respective processing unit.

In some embodiments, each of the processing units comprises a field programmable gate array. In some embodiments, each of the processing units comprises any other type of soft logic.

In some embodiments, at least one of the plurality of processing units comprises a digital circuit and a memory storing state related to processing carried out by the digital circuit, wherein the digital circuit is configured to, in communication with the memory, perform the predefined type of operation associated with the respective processing unit.

In some embodiments, the network interface device comprises a memory accessible to two or more of the plurality of processing units, wherein the memory is configured to store state associated with a first data packet, and wherein, during performance of the first function by the hardware module, two or more of the plurality of processing units are configured to access and modify the state.

In some embodiments, a first of the at least some of the plurality of processing units is configured to stall during access of a value of the state by a second of the plurality of processing units.

In some embodiments, one or more of the plurality of processing units are individually configurable to perform, based on their associated predefined type of operation, an operation specific to a respective pipeline.

In some embodiments, the hardware module is configured to receive an instruction and, in response to said instruction, to do at least one of: interconnect at least some of said plurality of processing units to provide a data processing pipeline for processing one or more of said plurality of data packets; cause one or more of said plurality of processing units to perform their associated predefined type of operation with respect to said one or more data packets; add one or more of said plurality of processing units to a data processing pipeline; and remove one or more of said plurality of processing units from a data processing pipeline.
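A minimal sketch of how such instructions might be applied is given below. The instruction encoding (a dictionary with a `kind` key), the class `PipelineConfig`, and the unit names are assumptions made for illustration; the application does not specify an instruction format. Only the interconnect, add, and remove cases are modelled.

```python
from typing import Dict, List


class PipelineConfig:
    """Hypothetical record of which units form the pipeline, and in what order."""

    def __init__(self, available: List[str]) -> None:
        self.available = set(available)
        self.order: List[str] = []

    def handle(self, instruction: Dict[str, object]) -> None:
        kind = instruction["kind"]
        if kind == "interconnect":
            # Build a whole pipeline from the named units.
            units = list(instruction["units"])
            self._check(units)
            self.order = units
        elif kind == "add":
            # Insert one more unit at the given position.
            unit = str(instruction["unit"])
            self._check([unit])
            self.order.insert(int(instruction["position"]), unit)
        elif kind == "remove":
            # Take a unit out of the existing pipeline.
            self.order.remove(str(instruction["unit"]))
        else:
            raise ValueError(f"unknown instruction kind: {kind!r}")

    def _check(self, units: List[str]) -> None:
        missing = [u for u in units if u not in self.available]
        if missing:
            raise KeyError(f"units not present in hardware module: {missing}")


cfg = PipelineConfig(["load", "lookup", "logic", "store"])
cfg.handle({"kind": "interconnect", "units": ["load", "logic"]})
cfg.handle({"kind": "add", "unit": "lookup", "position": 1})
order_after_add = list(cfg.order)
cfg.handle({"kind": "remove", "unit": "load"})
```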

In some embodiments, the predefined operation comprises at least one of: loading at least one value of the first data packet from a memory; storing at least one value of a data packet in a memory; and performing a lookup into a lookup table to determine an action to be carried out with respect to a data packet.

In some embodiments, the hardware module is configured to receive an instruction, wherein the hardware module is configurable, in response to said instruction, to interconnect at least some of said plurality of processing units to provide a data processing pipeline for processing one or more of said plurality of data packets, and wherein the instruction comprises a data packet sent through a third processing pipeline.

In some embodiments, one or more of the at least some of the plurality of processing units are configurable, in response to said instruction, to perform a selected operation of their associated predefined type of operation with respect to said one or more of the plurality of data packets.

In some embodiments, the plurality of components comprises a second of the plurality of components configured to provide the first function in circuitry different from the hardware module, wherein the network interface device comprises at least one controller configured to cause data packets passing through the processing pipeline to be processed by one of: a first of the plurality of components; and the second of the plurality of components.

In some embodiments, the network interface device comprises at least one controller configured to issue an instruction to cause the hardware module to begin performing the first function with respect to data packets, wherein the instruction is configured to cause the first of the plurality of components to be inserted into the processing pipeline.

In some embodiments, the network interface device comprises at least one controller configured to issue an instruction to cause the hardware module to begin performing the first function with respect to data packets, wherein the instruction comprises a control message that is sent through the processing pipeline and is configured to cause the first of the plurality of components to be activated.

In some embodiments, for one or more of the at least some of the plurality of processing units, the associated at least one operation comprises at least one of: loading at least one value of the first data packet from a memory of the network interface device; storing at least one value of the first data packet in a memory of the network interface device; and performing a lookup into a lookup table to determine an action to be carried out with respect to the first data packet.

In some embodiments, one or more of the at least some of the plurality of processing units are configured to pass at least one result of their associated at least one predefined operation to a next processing unit in the first processing pipeline, the next processing unit being configured to perform a next predefined operation in dependence upon the at least one result.

In some embodiments, each of the different predefined types of operation is defined by a different template.

In some embodiments, the predefined types of operation comprise at least one of: accessing a data packet; accessing a lookup table stored in a memory of the hardware module; performing logic operations on data loaded from a data packet; and performing logic operations on data loaded from the lookup table.

In some embodiments, the hardware module comprises routing hardware, wherein the hardware module is configurable to interconnect at least some of said plurality of processing units to provide the first data processing pipeline by configuring the routing hardware to route data packets between the plurality of processing units in a particular order specified by the first data processing pipeline.
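One way to picture routing hardware of this kind is as a next-hop table that is rewritten whenever a pipeline is configured. The sketch below is an assumption-laden illustration: `build_routes`, `route_packet`, the `INGRESS` marker, and the three toy units are hypothetical and do not come from the application.

```python
from typing import Callable, Dict, List, Optional, Tuple

INGRESS = "ingress"  # hypothetical name for the pipeline entry point


def build_routes(order: List[str]) -> Dict[str, Optional[str]]:
    """Turn a pipeline order into next-hop entries for the routing fabric."""
    hops = [INGRESS] + order
    routes: Dict[str, Optional[str]] = dict(zip(hops, hops[1:]))
    routes[hops[-1]] = None  # the last unit hands the packet to egress
    return routes


def route_packet(routes: Dict[str, Optional[str]],
                 units: Dict[str, Callable[[bytes], bytes]],
                 packet: bytes) -> Tuple[bytes, List[str]]:
    """Follow the next-hop table, applying each unit the packet visits."""
    trace: List[str] = []
    nxt = routes[INGRESS]
    while nxt is not None:
        packet = units[nxt](packet)
        trace.append(nxt)
        nxt = routes[nxt]
    return packet, trace


# Toy units that append a marker byte so the visiting order is visible.
UNITS: Dict[str, Callable[[bytes], bytes]] = {
    "a": lambda p: p + b"A",
    "b": lambda p: p + b"B",
    "c": lambda p: p + b"C",
}

first_pipeline = build_routes(["a", "c"])        # first function
second_pipeline = build_routes(["c", "b", "a"])  # a different function, same units
```

The same pool of units serves both pipelines; only the routing table differs.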

In some embodiments, the hardware module is configurable to interconnect at least some of said plurality of processing units to provide a second data processing pipeline for processing one or more of said plurality of data packets, so as to perform a second function different from the first function.

In some embodiments, the hardware module is configurable to interconnect at least some of said plurality of processing units to provide the second data processing pipeline after having interconnected at least some of said plurality of processing units to provide the first data processing pipeline.

In some embodiments, the network interface device comprises additional circuitry that is separate from the hardware module and configured to perform the first function on one or more of said plurality of data packets.

In some embodiments, the additional circuitry comprises at least one of: a field programmable gate array; and a plurality of central processing units.

In some embodiments, the network interface device comprises at least one controller, wherein the additional circuitry is configured to perform the first function with respect to data packets during a compilation process for the first function to be performed in the hardware module, and wherein the at least one controller is configured to, in response to completion of the compilation process, control the hardware module to begin performing the first function with respect to data packets.

In some embodiments, the additional circuitry comprises a plurality of central processing units.

In some embodiments, the at least one controller is configured to, in response to said determination that the compilation process for the first function to be performed in the hardware module has completed, control the additional circuitry to cease performing the first function with respect to data packets.
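The hand-over between the additional circuitry and the hardware module can be sketched as a small state change made by the controller. The following Python model is illustrative only: `Circuitry`, `Controller`, `on_compilation_complete`, and `dispatch` are hypothetical names, and the completion of compilation is represented by a direct method call rather than by any real notification mechanism.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Circuitry:
    """Hypothetical stand-in for one place the first function can run."""
    name: str
    active: bool = False
    handled: List[bytes] = field(default_factory=list)

    def perform_first_function(self, packet: bytes) -> None:
        self.handled.append(packet)  # placeholder for the real function


class Controller:
    def __init__(self, additional: Circuitry, hardware_module: Circuitry) -> None:
        self.additional = additional
        self.hardware_module = hardware_module
        # The additional circuitry serves traffic while compilation runs.
        self.additional.active = True

    def on_compilation_complete(self) -> None:
        # Start the hardware module, then stop the additional circuitry.
        self.hardware_module.active = True
        self.additional.active = False

    def dispatch(self, packet: bytes) -> str:
        target = (self.hardware_module if self.hardware_module.active
                  else self.additional)
        target.perform_first_function(packet)
        return target.name


cpus = Circuitry("cpus")
hw = Circuitry("hardware_module")
controller = Controller(cpus, hw)
before = controller.dispatch(b"pkt-1")   # compilation still in progress
controller.on_compilation_complete()
after = controller.dispatch(b"pkt-2")    # hardware module has taken over
```

The point of the design is that the first function is available from the moment it is requested, even though the faster implementation only becomes usable once compilation finishes.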

In some embodiments, the network interface device comprises at least one controller, wherein the hardware module is configured to perform the first function with respect to data packets during a compilation process for the first function to be performed in the additional circuitry, and wherein the at least one controller is configured to determine that the compilation process for the first function to be performed in the additional circuitry has completed and, in response to said determination, to control the additional circuitry to begin performing the first function with respect to data packets.

In some embodiments, the additional circuitry comprises a field programmable gate array.

In some embodiments, the at least one controller is configured to, in response to said determination that the compilation process for the first function to be performed in the additional circuitry has completed, control the hardware module to cease performing the first function with respect to data packets.

In some embodiments, the network interface device comprises at least one controller configured to perform a compilation process to provide the first function to be performed in the hardware module.

In some embodiments, the compilation process comprises providing instructions for providing, in the hardware module, a control plane interface that is responsive to control messages.

According to another aspect, there is provided a data processing system comprising the network interface device according to the first aspect and a host device, wherein the data processing system comprises at least one controller configured to perform a compilation process to provide the first function to be performed in the hardware module.

In some embodiments, the at least one controller is provided by one or more of: the network interface device; and the host device.

In some embodiments, the compilation process is performed in response to a determination by the at least one controller that a computer program expressing the first function is safe for execution in kernel mode of the host device.

In some embodiments, the at least one controller is configured to perform the compilation process by assigning each of the at least some of the plurality of processing units to perform, in a particular order of the first data processing pipeline, at least one operation from a plurality of operations expressed by a sequence of computer code instructions, wherein the plurality of operations provides the first function with respect to the one or more of the plurality of data packets.
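The assignment step of such a compilation process can be sketched as a walk over the instruction sequence that allocates one free unit of the matching type to each operation. This is a simplified illustration, not the compiler of the application: the `(operation type, operand)` instruction format, the function `assign_stages`, and the unit identifiers are all made up for the example.

```python
from typing import Dict, List, Tuple

Instruction = Tuple[str, str]  # (operation type, operand), a made-up format
Assignment = Tuple[int, str, Instruction]  # (stage index, unit id, instruction)


def assign_stages(program: List[Instruction],
                  free_units: Dict[str, List[str]]) -> List[Assignment]:
    """Assign each operation, in program order, to a free unit of its type."""
    # Copy the pool so the caller's view of free units is not mutated.
    pool = {op_type: list(ids) for op_type, ids in free_units.items()}
    plan: List[Assignment] = []
    for stage, instr in enumerate(program):
        op_type = instr[0]
        if not pool.get(op_type):
            raise RuntimeError(f"no free unit for operation type {op_type!r}")
        plan.append((stage, pool[op_type].pop(0), instr))
    return plan


program: List[Instruction] = [
    ("load", "protocol"),
    ("lookup", "acl_table"),
    ("logic", "drop_if_hit"),
]
units = {"load": ["L0", "L1"], "lookup": ["K0"], "logic": ["G0"]}
plan = assign_stages(program, units)
```

The resulting plan fixes both which unit performs each operation and the order of the units in the pipeline.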

In some embodiments, the at least one controller is configured to: prior to completion of the compilation process, send a first instruction to cause additional circuitry of the network interface device to perform the first function with respect to data packets; and subsequent to completion of the compilation process, send a second instruction to cause the hardware module to begin performing the first function with respect to data packets.

According to another aspect, there is provided a method for implementation in a network interface device, the method comprising: receiving, at a first interface, a plurality of data packets; and configuring a hardware module to interconnect at least some of a plurality of processing units of the hardware module to provide a first data processing pipeline for processing one or more of said plurality of data packets, so as to perform a first function with respect to said one or more of said plurality of data packets, wherein each processing unit is associated with a predefined type of operation executable in a single step, and wherein at least some of said plurality of processing units are associated with different predefined types of operation.

According to another aspect, there is provided a non-transitory computer readable medium comprising program instructions for causing a network interface device to perform a method comprising: receiving, at a first interface, a plurality of data packets; and configuring a hardware module to interconnect at least some of a plurality of processing units of the hardware module to provide a first data processing pipeline for processing one or more of said plurality of data packets, so as to perform a first function with respect to said one or more of said plurality of data packets, wherein each processing unit is associated with a predefined type of operation executable in a single step, and wherein at least some of said plurality of processing units are associated with different predefined types of operation.

According to another aspect, there is provided a processing unit configured to: perform at least one predefined operation with respect to a first data packet received at a network interface device; be connected to a first additional processing unit configured to perform a first additional at least one predefined operation with respect to the first data packet; be connected to a second additional processing unit configured to perform a second additional at least one predefined operation with respect to the first data packet; receive, from the first additional processing unit, results of the first additional at least one predefined operation; perform the at least one predefined operation in dependence upon the results of the first additional at least one predefined operation; and send results of the at least one predefined operation to the second additional processing unit for processing in the second additional at least one predefined operation.

In some embodiments, the processing unit is configured to receive a clock signal for timing the at least one predefined operation, wherein the processing unit is configured to perform the at least one predefined operation in at least one cycle of the clock signal.

In some embodiments, the processing unit is configured to perform the at least one predefined operation in a single cycle of the clock signal.

In some embodiments, the at least one predefined operation, the first additional at least one predefined operation, and the second additional at least one predefined operation form part of a function performed with respect to the first data packet received at the network interface device.

In some embodiments, the first data packet is received from a host device, wherein the network interface device is configured to interface the host device to a network.

In some embodiments, the first data packet is received from a network, wherein the network interface device is configured to interface a host device to the network.

In some embodiments, the function is a filtering function.

In some embodiments, the filtering function is an extended Berkeley packet filter function.

In some embodiments, the processing unit comprises an application specific integrated circuit configured to perform the at least one predefined operation.

In some embodiments, the processing unit comprises: a digital circuit configured to perform the at least one predefined operation; and a memory storing state related to the at least one predefined operation carried out.

In some embodiments, the processing unit is configured to access a memory accessible to the first additional processing unit and the second additional processing unit, wherein the memory is configured to store state associated with the first data packet, and wherein the at least one predefined operation comprises modifying the state stored in the memory.

In some embodiments, the processing unit is configured to, during a first clock cycle, read a value of said state from the memory and provide said value to the second additional processing unit for modification by the second additional processing unit, wherein the processing unit is configured to stall during a second clock cycle following the first clock cycle.
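The reason for the stall can be seen in a small cycle-by-cycle simulation of a read-modify-write on shared state. The sketch below rests on assumptions that are not in the application: the state is a single counter, the modifying unit increments it, and the modified value is written back at the end of the second clock cycle. The function name `run_shared_counter` is hypothetical.

```python
from typing import Optional, Tuple


def run_shared_counter(packets: int, stall: bool) -> Tuple[int, int]:
    """Simulate a reader unit feeding a modifier unit; return (count, cycles)."""
    state = {"count": 0}
    cycles = 0
    in_flight: Optional[int] = None  # value handed from reader to modifier
    remaining = packets
    reader_stalled = False
    while remaining or in_flight is not None:
        cycles += 1
        # Modifier unit: works on the value provided in the previous cycle.
        write_back = in_flight + 1 if in_flight is not None else None
        # Reader unit: reads the state unless it is stalled this cycle.
        new_in_flight: Optional[int] = None
        if remaining and not reader_stalled:
            new_in_flight = state["count"]  # read happens before the write lands
            remaining -= 1
            reader_stalled = stall          # stall for the following cycle
        else:
            reader_stalled = False
        # The modified value is written back at the end of the cycle (assumed).
        if write_back is not None:
            state["count"] = write_back
        in_flight = new_in_flight
    return state["count"], cycles


with_stall = run_shared_counter(3, stall=True)
without_stall = run_shared_counter(3, stall=False)
```

Under these assumptions, the run without a stall reads a stale value for the second packet and loses one update, while the run with a stall counts every packet at the cost of extra cycles.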

In some embodiments, the at least one predefined operation comprises at least one of: loading the first data packet from a memory of the network interface device; storing the first data packet in a memory of the network interface device; and performing a lookup into a lookup table to determine an action to be carried out with respect to the first data packet.

According to another aspect, there is provided a method implemented in a processing unit, the method comprising: performing at least one predefined operation with respect to a first data packet received at a network interface device; connecting to a first additional processing unit configured to perform a first additional at least one predefined operation with respect to the first data packet; connecting to a second additional processing unit configured to perform a second additional at least one predefined operation with respect to the first data packet; receiving, from the first additional processing unit, results of the first additional at least one predefined operation; performing the at least one predefined operation in dependence upon the results of the first additional at least one predefined operation; and sending results of the at least one predefined operation to the second additional processing unit for processing in the second additional at least one predefined operation.

According to another aspect, there is provided a computer readable non-transitory storage device storing instructions that, when executed by a processing unit, cause the processing unit to perform a method comprising: performing at least one predefined operation with respect to a first data packet received at a network interface device; connecting to a first additional processing unit configured to perform a first additional at least one predefined operation with respect to the first data packet; connecting to a second additional processing unit configured to perform a second additional at least one predefined operation with respect to the first data packet; receiving, from the first additional processing unit, results of the first additional at least one predefined operation; performing the at least one predefined operation in dependence upon the results of the first additional at least one predefined operation; and sending results of the at least one predefined operation to the second additional processing unit for processing in the second additional at least one predefined operation.

According to another aspect, there is provided a network interface device for interfacing a host device to a network, the network interface device comprising: at least one controller; a first interface configured to receive data packets; first circuitry configured to perform a first function with respect to data packets received at the first interface; and second circuitry, wherein the first circuitry is configured to perform the first function with respect to data packets received at the first interface during a compilation process for the first function to be performed in the second circuitry, and the at least one controller is configured to determine that the compilation process for the first function to be performed in the second circuitry has completed and, in response to said determination, to control the second circuitry to begin performing the first function with respect to data packets received at the first interface.

In some embodiments, the at least one controller is configured to, in response to said determination that the compilation process for the first function to be performed in the second circuitry has completed, control the first circuitry to stop performing the first function with respect to data packets received at the first interface.

In some embodiments, the at least one controller is configured to, in response to said determination that the compilation process for the first function to be performed in the second circuitry has completed: control the second circuitry to begin performing the first function with respect to data packets of a first data flow received at the first interface; and control the first circuitry to stop performing the first function with respect to the data packets of the first data flow.
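The handover described above can be illustrated with a minimal sketch. It is not the implementation disclosed here: the `Controller` class, the `on_compile_complete` callback, and the two filter functions are hypothetical names chosen for illustration, and both circuits are modelled as plain Python functions rather than a CPU and a compiled hardware module.

```python
from typing import Callable, Dict, Optional, Tuple

Packet = Dict[str, int]
Function = Callable[[Packet], bool]


def first_circuit_filter(pkt: Packet) -> bool:
    """Stand-in for the first circuitry (quick to compile, e.g. a CPU)."""
    return pkt["dst_port"] != 23  # hypothetical rule: drop port 23


def second_circuit_filter(pkt: Packet) -> bool:
    """Stand-in for the same function once compiled for the second circuitry."""
    return pkt["dst_port"] != 23


class Controller:
    """Hypothetical controller that switches circuits when compilation ends."""

    def __init__(self, first: Function) -> None:
        self.first = first
        self.second: Optional[Function] = None

    def on_compile_complete(self, second: Function) -> None:
        # Start the second circuitry; the first circuitry is no longer used.
        self.second = second

    def process(self, pkt: Packet) -> Tuple[str, bool]:
        # Until compilation completes, packets are served by the first circuitry.
        if self.second is None:
            return "first", self.first(pkt)
        return "second", self.second(pkt)


controller = Controller(first_circuit_filter)
before = controller.process({"dst_port": 80})
controller.on_compile_complete(second_circuit_filter)
after = controller.process({"dst_port": 23})
```

In this sketch packets are never left unserved: the function is available from the first circuitry from the start, and only its provider changes once the longer compilation finishes.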

In some embodiments, the first circuitry comprises at least one central processing unit, wherein each of the at least one central processing unit is configured to perform the first function with respect to at least one of the data packets received at the first interface.

In some embodiments, the second circuitry comprises a field programmable gate array configured to begin performing the first function with respect to data packets received at the first interface.

In some embodiments, the second circuitry comprises a hardware module comprising a plurality of processing units, each processing unit being associated with at least one predefined operation, wherein the first interface is configured to receive a first data packet, and the hardware module is configured to, following the compilation process for the first function to be performed in the second circuitry, cause at least some of the plurality of processing units to perform their associated at least one predefined operation in a particular order so as to perform the first function with respect to the first data packet.

In some embodiments, the first circuitry comprises a hardware module comprising a plurality of processing units, each processing unit being associated with at least one predefined operation, wherein the first interface is configured to receive a first data packet, and the hardware module is configured to, during the compilation process for the first function to be performed in the second circuitry, cause at least some of the plurality of processing units to perform their associated at least one predefined operation in a particular order so as to perform the first function with respect to the first data packet.

In some embodiments, the at least one controller is configured to perform the compilation process for compiling the first function to be performed by the second circuitry.

In some embodiments, the at least one controller is configured to, prior to completion of the compilation process, instruct the first circuitry to perform the first function with respect to data packets received at the first interface.

In some embodiments, the compilation process for compiling the first function to be performed by the second circuitry is performed by the host device, wherein the at least one controller is configured to determine that the compilation process has completed in response to receiving an indication of completion of the compilation process from the host device.

In some embodiments, the network interface device comprises a processing pipeline for processing data packets received at the first interface, wherein the processing pipeline comprises a plurality of components each configured to perform one of a plurality of functions with respect to data packets received at the first interface, a first of the plurality of components is configured to provide the first function when provided by the first circuitry, and a second of the plurality of components is configured to provide the first function when provided by the second at least one processing unit.

In some embodiments, the at least one controller is configured to control the second circuitry to begin performing the first function with respect to data packets received at the first interface by inserting the second of the plurality of components into the processing pipeline.

In some embodiments, the at least one controller is configured to, in response to said determination that the compilation process for the first function to be performed in the second circuitry has completed, control the first circuitry to stop performing the first function with respect to data packets received at the first interface by removing the first of the plurality of components from the processing pipeline.

In some embodiments, the at least one controller is configured to control the second circuitry to begin performing the first function with respect to data packets received at the first interface by sending a control message through the processing pipeline to activate the second of the plurality of components.

In some embodiments, the at least one controller is configured to, in response to said determination that the compilation process for the first function to be performed in the second circuitry has completed, control the first circuitry to stop performing the first function with respect to data packets received at the first interface by sending a control message through the processing pipeline to deactivate the second of the plurality of components.
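One way to picture activation and deactivation by control message is the sketch below. It assumes, purely for illustration, that control messages travel through the same pipeline as data packets and that every component forwards whatever it receives; the `Component` class, the message fields (`ctrl`, `target`, `activate`), and the component names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Component:
    """Hypothetical pipeline component providing the first function."""
    name: str
    active: bool
    handled: List[int] = field(default_factory=list)

    def handle(self, item: Dict) -> Dict:
        if item.get("ctrl"):
            # Control message: change state only if addressed to this component.
            if item["target"] == self.name:
                self.active = item["activate"]
        elif self.active:
            # Data packet: apply the function only while active.
            self.handled.append(item["id"])
        return item  # everything is forwarded down the pipeline


def run(pipeline: List[Component], items: List[Dict]) -> None:
    """Pass each item through every component in pipeline order."""
    for item in items:
        for component in pipeline:
            item = component.handle(item)


first = Component("first_circuit", active=True)
second = Component("second_circuit", active=False)
run(
    [first, second],
    [
        {"id": 1},
        {"ctrl": True, "target": "second_circuit", "activate": True},
        {"ctrl": True, "target": "first_circuit", "activate": False},
        {"id": 2},
    ],
)
```

Because the control messages are ordered with the data packets in this sketch, packet 1 is handled only by the first component and packet 2 only by the second, so no packet is processed twice or skipped during the switch.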

In some embodiments, the first of the plurality of components is configured to provide the first function with respect to data packets of a first data flow passing through the processing pipeline, and the second of the plurality of components is configured to provide the first function with respect to data packets of a second data flow passing through the processing pipeline.

In some embodiments, the first function comprises filtering data packets.

In some embodiments, the first interface is configured to receive data packets from the network.

In some embodiments, the first interface is configured to receive data packets from the host device.

In some embodiments, the compile time of the first function for the second circuitry is longer than the compile time of the first function for the first circuitry.

According to another aspect, there is provided a method comprising: receiving data packets at a first interface of a network interface device; and performing, in first circuitry of the network interface device, a first function with respect to the data packets received at the first interface, wherein the first circuitry is configured to perform the first function with respect to data packets received at the first interface during a compilation process for the first function to be performed in second circuitry, the method further comprising: determining that the compilation process for the first function to be performed in the second circuitry has completed; and, in response to said determination, controlling the second circuitry of the network interface device to begin performing the first function with respect to data packets received at the first interface.

According to another aspect, there is provided a non-transitory computer-readable medium comprising program instructions for causing a data processing system to perform a method comprising: receiving data packets at a first interface of a network interface device; and performing, in first circuitry of the network interface device, a first function with respect to the data packets received at the first interface, wherein the first circuitry is configured to perform the first function with respect to data packets received at the first interface during a compilation process for the first function to be performed in second circuitry, the method further comprising: determining that the compilation process for the first function to be performed in the second circuitry has completed; and, in response to said determination, controlling the second circuitry of the network interface device to begin performing the first function with respect to data packets received at the first interface.

According to another aspect, there is provided a non-transitory computer-readable medium comprising program instructions for causing a data processing system to perform the following: performing a compilation process to compile a first function to be performed by second circuitry of a network interface device; prior to completion of the compilation process, sending a first instruction to cause first circuitry of the network interface device to perform the first function with respect to data packets received at a first interface of the network interface device; and, following completion of the compilation process, sending a second instruction to cause the second circuitry to begin performing the first function with respect to data packets received at the first interface.

In some embodiments, the non-transitory computer-readable medium comprises program instructions for causing the data processing system to perform a further compilation process to compile the first function to be performed by the first circuitry, wherein the time taken for the compilation process is longer than the time taken for the further compilation process.

In some embodiments, the data processing system comprises a host device, wherein the network interface device is configured to interface the host device with a network.

In some embodiments, the data processing system comprises the network interface device, wherein the network interface device is configured to interface a host device with a network.

In some embodiments, the data processing system comprises a host device and the network interface device, wherein the network interface device is configured to interface the host device with a network.

In some embodiments, the first function comprises filtering data packets received at the first interface from the network.

In some embodiments, the non-transitory computer-readable medium comprises program instructions for causing the data processing system to perform the following: following completion of the compilation process, sending a third instruction to cause the first circuitry to stop performing the function with respect to data packets received at the first interface.

In some embodiments, the non-transitory computer-readable medium comprises program instructions for causing the data processing system to perform the following: sending an instruction to cause the second circuitry to perform the first function with respect to data packets of a first data flow; and sending an instruction to cause the first circuitry to stop performing the first function with respect to the data packets of the first data flow.

In some embodiments, the first circuitry comprises at least one central processing unit, wherein, prior to completion of the second compilation process, each of the at least one central processing unit is configured to perform the first function with respect to at least one of the data packets received at the first interface.

In some embodiments, the second circuitry comprises a hardware module comprising a plurality of processing units, each processing unit being associated with at least one predefined operation, wherein the data packets received at the first interface comprise a first data packet, and the hardware module is configured to, following completion of the second compilation process, perform the first function with respect to the first data packet by each processing unit of at least some of the plurality of processing units performing its respective at least one operation with respect to the first data packet.

In some embodiments, the first circuitry comprises a hardware module comprising a plurality of processing units configured to provide the first function with respect to data packets, each processing unit being associated with at least one predefined operation, wherein the data packets received at the first interface comprise a first data packet, and the hardware module is configured to, prior to completion of the second compilation process, perform the first function with respect to the first data packet by each processing unit of at least some of the plurality of processing units performing its respective at least one operation with respect to the first data packet.

In some embodiments, the compilation process comprises assigning each of a plurality of processing units of the second circuitry to perform, in a particular order, at least one operation associated with one of a plurality of processing stages of a sequence of computer code instructions.

In some embodiments, the first function provided by the first circuitry is provided as a component of a processing pipeline for processing data packets received at the first interface, and the first function provided by the second circuitry is provided as a component of the processing pipeline.

In some embodiments, the first instruction comprises an instruction configured to cause a first of the plurality of components to be inserted into the processing pipeline.

In some embodiments, the second instruction comprises an instruction configured to cause a second of the plurality of components to be inserted into the processing pipeline.

In some embodiments, the non-transitory computer-readable medium comprises program instructions for causing the data processing system to perform: following completion of the compilation process, sending a third instruction to cause the first circuitry to stop performing the first function with respect to data packets received at the first interface, wherein the third instruction comprises an instruction configured to cause the first of the plurality of components to be removed from the processing pipeline.

In some embodiments, the first instruction comprises a control message to be sent through the processing pipeline to activate the second of the plurality of components.

In some embodiments, the second instruction comprises a control message to be sent through the processing pipeline to activate the second of the plurality of components.

In some embodiments, the non-transitory computer-readable medium comprises program instructions for causing the data processing system to perform: following completion of the compilation process, sending a third instruction to cause the first circuitry to stop performing the function with respect to data packets received at the first interface, wherein the third instruction comprises a control message to be passed through the processing pipeline to deactivate the first of the plurality of components.

According to another aspect, there is provided a data processing system comprising at least one processor and at least one memory including computer program code, the at least one memory and the computer program code being configured to, with the at least one processor, cause the data processing system to: perform a compilation process to compile a function to be performed by second circuitry of a network interface device; prior to completion of the compilation process, instruct first circuitry of the network interface device to perform the function with respect to data packets received at a first interface of the network interface device; and, following completion of the second compilation process, instruct the second at least one processing unit to begin performing the function with respect to data packets received at the first interface.

According to another aspect, there is provided a method for implementation in a data processing system, the method comprising: performing a compilation process to compile a function to be performed by second circuitry of a network interface device; prior to completion of the compilation process, sending a first instruction to cause first circuitry of the network interface device to perform the function with respect to data packets received at a first interface of the network interface device; and, following completion of the compilation process, sending a second instruction to cause the second circuitry to begin performing the function with respect to data packets received at the first interface.

According to another aspect, there is provided a non-transitory computer-readable medium comprising program instructions for causing a data processing system to assign each of a plurality of processing units to perform, in a particular order, at least one operation associated with a respective one of a plurality of processing stages of a sequence of computer code instructions, wherein the plurality of processing stages provide a first function with respect to a first data packet received at a first interface of a network interface device, each of the plurality of processing units is configured to perform one of a plurality of types of processing, at least some of the plurality of processing units are configured to perform different types of processing, and, for each of the plurality of processing units, the assigning is performed in dependence upon determining that the processing unit is configured to perform a type of processing suitable for performing the respective at least one operation.

In some embodiments, each of the types of processing is defined by one of a plurality of templates.

In some embodiments, the types of processing include at least one of: accessing a data packet received at the network interface device; accessing a lookup table stored in a memory of the hardware module; performing logic operations on data loaded from the data packet; and performing logic operations on data loaded from the lookup table.
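The type-dependent assignment can be sketched as follows. This is a simplified illustration rather than the compiler described here: the greedy first-fit strategy, the `assign_stages` function, and the type labels (`packet_access`, `table_lookup`, `logic`) are assumptions introduced for the example.

```python
from typing import Dict, List, Tuple


def assign_stages(
    stages: List[Tuple[str, str]], units: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Assign each stage, in program order, to a free unit of a suitable type.

    stages: (stage_name, required_processing_type) pairs in program order.
    units:  unit_id -> processing type that the unit is configured to perform.
    Returns (stage_name, unit_id) pairs in the same order as the stages.
    """
    free = dict(units)  # copy so the caller's mapping is left untouched
    plan = []
    for stage_name, required_type in stages:
        # Only a unit whose type suits the stage's operation may take it.
        unit_id = next(
            (uid for uid, utype in free.items() if utype == required_type), None
        )
        if unit_id is None:
            raise ValueError(f"no free unit of type {required_type!r}")
        del free[unit_id]
        plan.append((stage_name, unit_id))
    return plan


units = {"u0": "packet_access", "u1": "table_lookup", "u2": "logic"}
stages = [
    ("load_dst_port", "packet_access"),
    ("match_rule", "table_lookup"),
    ("decide_action", "logic"),
]
plan = assign_stages(stages, units)
```

A real compiler would also have to respect timing and routing constraints; the sketch only shows the suitability check that gates each assignment.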

In some embodiments, two or more of the at least some of the plurality of processing units are configured to perform their associated at least one operation in accordance with a common clock signal of the hardware module.

In some embodiments, the assigning comprises assigning each of the two or more of the at least some of the plurality of processing units to perform its associated at least one operation within a predefined length of time defined by the clock signal.

In some embodiments, the assigning comprises assigning two or more of the at least some of the plurality of processing units to access the first data packet within a time period of the predefined length of time.

In some embodiments, the assigning comprises assigning each of the two or more of the at least some of the plurality of processing units to, in response to the end of a time period of the predefined length of time, send a result of its respective at least one operation to a next processing unit.
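The clocked hand-off between processing units can be modelled with a small software simulation. It is a sketch under stated assumptions: each unit completes its operation within one clock period, results move to the next unit at the end of each period, and the `simulate` function and the three arithmetic stages are hypothetical stand-ins for real operations.

```python
from typing import Callable, List, Optional, Tuple


def simulate(
    units: List[Callable[[int], int]], packets: List[int]
) -> Tuple[List[int], int]:
    """Run packets through the units in lock step; return (outputs, cycles)."""
    regs: List[Optional[int]] = [None] * len(units)  # output register per unit
    pending = list(packets)
    outputs: List[int] = []
    cycles = 0
    # Keep clocking while input remains or a result is still mid-pipeline.
    while pending or any(r is not None for r in regs[:-1]):
        nxt: List[Optional[int]] = [None] * len(units)
        # The first unit takes a new packet, if one is waiting.
        nxt[0] = units[0](pending.pop(0)) if pending else None
        # Every other unit consumes the previous unit's result from last cycle.
        for i in range(1, len(units)):
            if regs[i - 1] is not None:
                nxt[i] = units[i](regs[i - 1])
        regs = nxt  # end of the clock period: all results advance together
        cycles += 1
        if regs[-1] is not None:
            outputs.append(regs[-1])
    return outputs, cycles


stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
outputs, cycles = simulate(stages, [1, 2])
```

With three units and two packets the simulation finishes in four cycles, because the second packet enters the first unit while the first packet is already in the second unit.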

In some embodiments, the non-transitory computer-readable medium comprises program instructions for causing the data processing system to perform the following: assigning at least some of the plurality of stages to occupy a single clock cycle.

In some embodiments, the non-transitory computer-readable medium comprises program instructions for causing the data processing system to assign two or more of the plurality of processing units to execute their assigned at least one operation in parallel.

In some embodiments, the network interface device comprises a hardware module comprising the plurality of processing units.

In some embodiments, the non-transitory computer-readable medium comprises computer program instructions for causing the data processing system to perform the following: performing a compilation process comprising the assigning; prior to completion of the compilation process, sending a first instruction to cause circuitry of the network interface device to perform the first function with respect to data packets received at the first interface; and, following completion of the compilation process, sending a second instruction to cause the plurality of processing units to begin performing the first function with respect to data packets received at the first interface.

In some embodiments, for one or more of the at least some of the plurality of processing units, the assigned at least one operation comprises at least one of: loading at least one value of the first data packet from a memory of the network interface device; storing at least one value of the first data packet in the memory of the network interface device; and performing a lookup into a lookup table to determine an action to be carried out with respect to the first data packet.
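The three kinds of operation listed above (load, store, lookup) can be shown together in a short sketch. The four-byte header layout, the `ACTIONS` table, and the function names are hypothetical and chosen only to keep the example small.

```python
from typing import Dict

# Hypothetical lookup table: destination port -> action for the packet.
ACTIONS: Dict[int, str] = {23: "drop", 80: "deliver"}


def load_field(packet: bytes, offset: int, length: int) -> int:
    """Load a big-endian value of the given length from the packet buffer."""
    return int.from_bytes(packet[offset:offset + length], "big")


def process(packet: bytes, scratch: Dict[str, int]) -> str:
    """Load a value, store it, then look up the action to carry out.

    Assumed layout: bytes 0-1 hold the source port, bytes 2-3 the
    destination port.
    """
    dst_port = load_field(packet, 2, 2)       # load a value of the packet
    scratch["dst_port"] = dst_port            # store it for later stages
    return ACTIONS.get(dst_port, "deliver")   # lookup determines the action


packet = (1234).to_bytes(2, "big") + (23).to_bytes(2, "big")
scratch: Dict[str, int] = {}
action = process(packet, scratch)
```

In a hardware module each of these steps could be assigned to a different processing unit; the sketch runs them in sequence in one function only for readability.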

몇몇 실시형태에서, 비일시적 컴퓨터 판독 가능 매체는, 제1 데이터 패킷과 관련하여 제1 기능을 수행하기 위해, 데이터 프로세싱 시스템으로 하여금, 특정한 순서로 복수의 프로세싱 유닛 사이에서 제1 데이터 패킷을 라우팅하도록 네트워크 인터페이스 디바이스의 라우팅 하드웨어를 구성하기 위한 명령어를 발행하게 하기 위한 컴퓨터 프로그램 명령어를 포함한다.In some embodiments, the non-transitory computer-readable medium is configured to cause the data processing system to route the first data packet between the plurality of processing units in a particular order to perform a first function in connection with the first data packet. and computer program instructions for causing the computer to issue instructions for configuring routing hardware of the network interface device.

몇몇 실시형태에서, 복수의 프로세싱 유닛에 의해 제공되는 제1 기능은, 제1 인터페이스에서 수신되는 데이터 패킷을 프로세싱하기 위한 프로세싱 파이프라인의 컴포넌트로서 제공된다.In some embodiments, the first functionality provided by the plurality of processing units is provided as a component of a processing pipeline for processing a data packet received at the first interface.

In some embodiments, the non-transitory computer-readable medium comprises computer program instructions for causing the plurality of processing units to begin performing the first function with respect to data packets received at the first interface by causing the data processing system to issue an instruction to cause the component to be inserted into the processing pipeline.

In some embodiments, the non-transitory computer-readable medium comprises computer program instructions for causing the plurality of processing units to begin performing the first function with respect to data packets received at the first interface by causing the data processing system to issue an instruction to cause the component to be activated in the processing pipeline.

In some embodiments, the data processing system comprises the network interface device.

In some embodiments, the data processing system comprises: the network interface device; and a host device, wherein the network interface device is configured to interface the host device with a network.

According to another aspect, there is provided a data processing system comprising at least one processor and at least one memory including computer program code, the at least one memory and the computer program code being configured, with the at least one processor, to cause the data processing system to assign each of a plurality of processing units to perform, in a particular order, at least one operation associated with one of a plurality of processing stages of a sequence of computer code instructions, wherein the plurality of processing stages provide a first function with respect to a first data packet received at a first interface of a network interface device, wherein each of the plurality of processing units is configured to perform one of a plurality of types of processing, wherein at least some of the plurality of processing units are configured to perform different types of processing, and wherein, for each of the plurality of processing units, the assigning is performed in dependence upon determining that the processing unit is configured to perform a type of processing suitable for performing the respective at least one operation.

According to another aspect, there is provided a method comprising assigning each of a plurality of processing units to perform, in a particular order, at least one operation associated with one of a plurality of processing stages of a sequence of computer code instructions, wherein the plurality of processing stages provide a first function with respect to a first data packet received at a first interface of a network interface device, wherein each of the plurality of processing units is configured to perform one of a plurality of types of processing, wherein at least some of the plurality of processing units are configured to perform different types of processing, and wherein, for each of the plurality of processing units, the assigning is performed in dependence upon determining that the processing unit is configured to perform a type of processing suitable for performing the respective at least one operation.
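
As an illustration only, the type-dependent assignment described above might be sketched as follows. The `ProcessingUnit` class, the type names ("load", "store", "lookup") and the first-fit strategy are assumptions made for this sketch; the application does not prescribe a particular allocation algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessingUnit:
    name: str
    kind: str  # type of processing the unit performs, e.g. "load", "store", "lookup"


def assign_stages(stages, units):
    """Assign each stage, in order, to a free unit whose type suits the operation.

    stages: ordered list of operation types, one per processing stage.
    units:  available processing units.
    Returns a list of (operation type, unit name) pairs in pipeline order.
    """
    free = list(units)
    assignment = []
    for op in stages:
        # Only a unit configured for the right type of processing may take the stage.
        unit = next((u for u in free if u.kind == op), None)
        if unit is None:
            raise ValueError(f"no free processing unit of type {op!r}")
        free.remove(unit)
        assignment.append((op, unit.name))
    return assignment


units = [
    ProcessingUnit("u0", "lookup"),
    ProcessingUnit("u1", "load"),
    ProcessingUnit("u2", "store"),
]
plan = assign_stages(["load", "lookup", "store"], units)
```

Here `plan` lists the units in the order in which a packet would visit them; a stage with no suitable free unit causes the assignment to fail rather than fall back to an unsuitable unit.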

The processing units of the hardware module have been described as executing their types of operation in a single step. However, the skilled person will recognize that this feature is a preferred feature only and is not essential or indispensable to the functioning of the invention.

According to an aspect, there is provided a method comprising: receiving, at a compiler, a bit file description of a circuit and a program, said bit file description comprising a description of routing of a part of the circuit; and compiling said program using said bit file description to output a bit file for said program.

The method may comprise using said bit file to configure at least a part of said part of said circuit to perform a function associated with said program.

The bit file description may comprise information about routing between a plurality of processing units of said part of the circuit.

The bit file description may comprise, for at least one of said plurality of processing units, routing information indicating at least one of: to which one or more other processing units data can be output; and from which one or more other processing units data can be received.

The bit file description may comprise routing information indicating one or more routes between two or more respective processing units.

The bit file description may comprise information indicating only routes that are usable by the compiler when compiling a program to provide a bit file for the program.
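
One way to picture the routing information described above is as a directed graph over processing units that a compiler consults when connecting stages. The sketch below is an assumption for illustration: the unit names, the `ROUTES` table and the breadth-first search are hypothetical and do not reflect the actual format of a bit file description.

```python
from collections import deque

# Hypothetical routing information: for each unit, the units it can output data to.
ROUTES = {
    "parse": {"lookup"},
    "lookup": {"store", "emit"},
    "store": {"emit"},
    "emit": set(),
}


def find_route(routes, src, dst):
    """Return a shortest list of units leading from src to dst, or None if no route exists."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        # Sorted for a deterministic result when several routes have equal length.
        for nxt in sorted(routes.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


route = find_route(ROUTES, "parse", "emit")
no_route = find_route(ROUTES, "emit", "parse")
```

Because only usable routes appear in the table, a compiler working from it cannot select a connection that the circuit does not provide; `no_route` is `None` because the table contains no path back from "emit".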

The bit file may comprise, for a respective processing unit, information indicating at least one of: from which one or more of said one or more other processing units in the bit file description for the respective processing unit an input is to be provided; and to which one or more of said one or more other processing units in the bit file description for the respective processing unit an output is to be provided.

The part of the circuit may comprise at least a part of a configurable hardware module comprising a plurality of processing units, each processing unit being associated with a predefined type of operation executable in a single step, at least some of said plurality of processing units being associated with different predefined types of operation, and said bit file description comprising information about routing between at least some of the plurality of processing units. The method may comprise using said bit file to cause the hardware to interconnect at least some of said plurality of processing units so as to provide a first data processing pipeline for processing one or more of said plurality of data packets to perform a first function with respect to said one or more of said plurality of data packets.

The bit file description may be of at least a portion of an FPGA.

The bit file description may be of a portion of an FPGA that is dynamically programmable.

The program may comprise one of an eBPF program and a P4 program.

The compiler and the FPGA may be provided in a network interface device.

According to another aspect, there is provided an apparatus comprising at least one processor and at least one memory including computer code for one or more programs, the at least one memory and the computer code being configured, with the at least one processor, to cause the apparatus at least to: receive a bit file description and a program, said bit file description comprising a description of routing of a part of a circuit; and compile said program using said bit file description to output a bit file for said program.

The at least one memory and the computer code may be configured, with the at least one processor, to cause the apparatus to use said bit file to configure at least a part of said part of said circuit to perform a function associated with said program.

The bit file description may comprise routing information indicating one or more routes between two or more respective processing units.

The part of the circuit may comprise at least a part of a configurable hardware module comprising a plurality of processing units, each processing unit being associated with a predefined type of operation executable in a single step, at least some of said plurality of processing units being associated with different predefined types of operation, and said bit file description comprising information about routing between at least some of the plurality of processing units. The at least one memory and the computer code may be configured, with the at least one processor, to cause the apparatus to use said bit file to cause the hardware to interconnect at least some of said plurality of processing units so as to provide a first data processing pipeline for processing one or more of said plurality of data packets to perform a first function with respect to said one or more of said plurality of data packets.

According to another aspect, there is provided a network interface device comprising: a first interface configured to receive a plurality of data packets; a configurable hardware module comprising a plurality of processing units, each processing unit being associated with a predefined type of operation executable in a single step; and a compiler configured to receive a bit file description and a program, said bit file description comprising a description of routing of at least a part of said configurable hardware module, and to compile said program using said bit file description to output a bit file for said program, wherein said hardware module is configurable using said bit file to perform a first function associated with said program.

The network interface device may be for interfacing a host device to a network.

At least some of said plurality of processing units may be associated with different predefined types of operation.

The hardware module may be configurable to interconnect at least some of said plurality of processing units so as to provide a first data processing pipeline for processing one or more of said plurality of data packets to perform a first function with respect to said one or more of said plurality of data packets.

In some embodiments, the first function comprises a filtering function. In some embodiments, the function comprises at least one of a tunneling, encapsulation, and routing function. In some embodiments, the first function comprises an extended Berkeley packet filter function.

In some embodiments, the first function comprises a distributed denial of service scrubbing operation.

In some embodiments, each of two or more of the at least some of the plurality of processing units is configured to perform its associated predefined type of operation within a predefined length of time defined by a clock signal.

In some embodiments, a first of the at least some of the plurality of processing units is configured to stall during access of a value of state by a second of the plurality of processing units.

In some embodiments, the plurality of components comprises a second of the plurality of components configured to provide the first function in circuitry different from the hardware module, wherein the network interface device comprises at least one controller configured to cause data packets passing through the processing pipeline to be processed by one of: the first of the plurality of components and the second of the plurality of components.

In some embodiments, the network interface device comprises at least one controller, wherein additional circuitry is configured to perform the first function with respect to data packets during a compilation process for the first function to be performed in the hardware module, and wherein the at least one controller is configured to, in response to completion of the compilation process, control the hardware module to begin performing the first function with respect to data packets.

According to another aspect, there is provided a computer-implemented method comprising: determining routing information for at least a part of a configurable hardware module comprising a plurality of processing units, each processing unit being associated with a predefined type of operation executable in a single step, at least some of said plurality of processing units being associated with different predefined types of operation, said routing information providing information about at least the available routes between the plurality of processing units.

The configurable hardware module may comprise a substantially static part and a substantially dynamic part, said determining comprising determining routing information for said substantially dynamic part.

Determining routing information for said substantially dynamic part may comprise determining routing in said substantially dynamic part that is used by one or more of the processing units in said substantially static part.

The determining may comprise analyzing a bit file description of at least a part of said configurable hardware module to determine said routing information.

According to another aspect, there is provided a non-transitory computer-readable medium comprising program instructions for determining routing information for at least a part of a configurable hardware module comprising a plurality of processing units, each processing unit being associated with a predefined type of operation executable in a single step, at least some of said plurality of processing units being associated with different predefined types of operation, said routing information providing information about at least the available routes between the plurality of processing units.

A computer program comprising program code means adapted to perform the method(s) may also be provided. The computer program may be stored on and/or otherwise embodied by means of a carrier medium.

In the above, many different embodiments have been described. It should be appreciated that further embodiments may be provided by the combination of any two or more of the embodiments described above.

Various other aspects and further embodiments are also described in the following detailed description and in the appended claims.

Some embodiments will now be described, by way of example only, with reference to the accompanying drawings, in which:
FIG. 1 shows a schematic view of a data processing system coupled to a network;
FIG. 2 shows a schematic view of a data processing system comprising a filtering operation application configured to run in user mode on a host computing device;
FIG. 3 shows a schematic view of a data processing system comprising a filtering operation configured to run in kernel mode on a host computing device;
FIG. 4 shows a schematic view of a network interface device comprising a plurality of CPUs for performing a function with respect to data packets;
FIG. 5 shows a schematic view of a network interface device comprising a field programmable gate array running an application for performing a function with respect to data packets;
FIG. 6 shows a schematic view of a network interface device comprising a hardware module for performing a function with respect to data packets;
FIG. 7 shows a schematic view of a network interface device comprising a field programmable gate array and at least one processing unit for performing a function with respect to data packets;
FIG. 8 illustrates a method implemented in a network interface device according to some embodiments;
FIG. 9 illustrates a method implemented in a network interface device according to some embodiments;
FIG. 10 illustrates an example of processing a data packet by a series of programs;
FIG. 11 illustrates an example of processing a data packet by a plurality of processing units;
FIG. 12 illustrates an example of processing a data packet by a plurality of processing units;
FIG. 13 illustrates an example of a pipeline of processing stages for processing a data packet;
FIG. 14 illustrates an example of a slice architecture having a plurality of pluggable components;
FIG. 15 illustrates an example representation of the arrangement and order of processing of a plurality of processing units;
FIG. 16 illustrates an example method of compiling a function;
FIG. 17 illustrates an example of a stateful processing unit;
FIG. 18 illustrates an example of a stateless processing unit;
FIG. 19 shows a method of some embodiments;
FIGS. 20a and 20b illustrate routing between slices in an FPGA; and
FIG. 21 schematically illustrates partitions on an FPGA.

The following description is presented to enable any person skilled in the art to make and use the invention, and is provided in the context of a particular application. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art.

The general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present invention. Thus, the present invention is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.

When data is to be transferred between two data processing systems over a data channel, such as a network, each of the data processing systems has a suitable network interface to allow it to communicate across the channel. Often the network is based on Ethernet technology. Data processing systems that are to communicate over a network are equipped with network interfaces that are capable of supporting the physical and logical requirements of the network protocol. The physical hardware component of a network interface is referred to as a network interface device or network interface card (NIC).

Most computer systems include an operating system (OS) through which user-level applications communicate with the network. A portion of the operating system, known as the kernel, includes a protocol stack for translating commands and data between the applications and a device driver specific to the network interface device. The device driver may directly control the network interface device. By providing these functions in the operating system kernel, the complexities of and differences among network interface devices can be hidden from the user-level applications. The network hardware and other system resources (such as memory) may be safely shared by many applications, and the system can be secured against faulty or malicious applications.

A typical data processing system 100 for carrying out transmission across a network is shown in FIG. 1. The data processing system 100 comprises a host computing device 101 coupled to a network interface device 102 that is arranged to interface the host to a network 103. The host computing device 101 includes an operating system 104 supporting one or more user-level applications 105. The host computing device 101 may also include a network protocol stack (not shown). For example, the protocol stack may be a component of the application, a library with which the application is linked, or be provided by the operating system. In some embodiments, more than one protocol stack may be provided.

The network protocol stack may be a Transmission Control Protocol (TCP) stack. The application 105 can send and receive TCP/IP messages by opening a socket and writing data to and reading data from the socket, and the operating system 104 causes the messages to be transported across the network. For example, the application can invoke a system call (syscall) for transmission of data through the socket and then via the operating system 104 to the network 103. This interface for transmitting messages may be known as a message passing interface.
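
The socket-based send and receive pattern described above can be illustrated with a short sketch using Python's standard `socket` module. As an assumption made to keep the example self-contained, a connected local socket pair from `socket.socketpair()` stands in for a TCP connection across a network; the `sendall()` and `recv()` calls are the same ones an application would use on a TCP socket, and each results in a system call into the operating system.

```python
import socket

# A connected pair of local stream sockets stands in for the two ends of a
# TCP connection (an assumption made so the sketch runs without a network).
sender, receiver = socket.socketpair()
try:
    sender.sendall(b"hello")       # write data to the socket (a system call)
    data = receiver.recv(1024)     # read data from the socket (a system call)
finally:
    sender.close()
    receiver.close()
```

With a real TCP connection the application would instead create the socket with `socket.socket(socket.AF_INET, socket.SOCK_STREAM)` and call `connect()` before sending.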

Instead of implementing the stack in the host 101, some systems offload the protocol stack to the network interface device 102. For example, in the case that the stack is a TCP stack, the network interface device 102 may comprise a TCP Offload Engine (TOE) for performing the TCP protocol processing. By performing the protocol processing in the network interface device 102 instead of in the host computing device 101, the demand on the processor of the host system 101 may be reduced. Data to be transmitted over the network may be sent by an application 105 via a TOE-enabled virtual interface driver, bypassing the kernel TCP/IP stack partially or entirely. Data sent along this fast path therefore need only be formatted to meet the requirements of the TOE driver.

The host computing device 101 may comprise one or more processors and one or more memories. In some embodiments, the host computing device 101 and the network interface device 102 may communicate via a bus, for example a Peripheral Component Interconnect Express (PCIe) bus.

During operation of the data processing system, data to be transmitted onto the network may be transferred from the host computing device 101 to the network interface device 102 for transmission. In one example, data packets may be transferred from the host to the network interface device directly by the host processor. The host may provide data to one or more buffers 106 located on the network interface device 102. The network interface device 102 may then prepare the data packets and transmit them over the network 103.

Alternatively, the data may be written to a buffer 107 in the host system 101. The data may then be retrieved from the buffer 107 by the network interface device and transmitted over the network 103.

In both of these cases, data is temporarily stored in one or more buffers prior to transmission over the network. Data sent over the network may be returned to the host (in a lookback).

When data packets are sent and received over the network 103, there are many processing tasks that can be expressed as operations on a data packet, either a data packet to be transmitted over the network or a data packet received from the network. For example, a filtering process may be carried out on received data packets so as to protect the host system 101 from distributed denial of service (DDOS). Such a filtering process may be carried out by simple packet examination or by an extended Berkeley packet filter (eBPF). As another example, encapsulation and forwarding may be carried out for data packets to be transmitted over the network 103. These processes may consume many CPU cycles and be burdensome for a conventional OS architecture.
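
As a rough illustration of what simple packet examination might look like, the sketch below inspects the IPv4 source address of a raw Ethernet frame and rejects frames from a blocked range. The function name `accept`, the blocklist and the policy of passing non-IPv4 frames are assumptions for this sketch; the application does not specify the filtering rules, and a real DDOS filter would be considerably more involved.

```python
import ipaddress
import struct

# Hypothetical blocklist; 203.0.113.0/24 is a range reserved for documentation.
BLOCKED = ipaddress.ip_network("203.0.113.0/24")

ETHERTYPE_IPV4 = 0x0800
ETH_HEADER_LEN = 14
MIN_IPV4_HEADER_LEN = 20


def accept(frame: bytes) -> bool:
    """Return True if the frame should be passed on, False if it should be dropped."""
    if len(frame) < ETH_HEADER_LEN + MIN_IPV4_HEADER_LEN:
        return False  # too short to hold an Ethernet header plus a minimal IPv4 header
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:
        return True   # not IPv4: outside the scope of this simple filter
    # The IPv4 source address sits 12 bytes into the IP header.
    src = ipaddress.ip_address(frame[ETH_HEADER_LEN + 12:ETH_HEADER_LEN + 16])
    return src not in BLOCKED
```

Even a check this small has to run for every received packet, which gives a sense of why such operations can consume many CPU cycles when performed on the host.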

Reference is made to FIG. 2, which illustrates one way in which a filtering operation or other packet processing operation may be implemented in a host system 220. The processes performed by the host system 220 are shown as being performed either in user space or in kernel space. A receive path for delivering data packets received from the network at the network interface device 210 to a terminating application 250 is present in kernel space. This receive path comprises a driver 235, a protocol stack 240, and a socket 245. The filtering operation 230 is implemented in user space. Incoming packets provided by the network interface device 210 to the host system 220 bypass the kernel (where protocol processing takes place) and are provided directly to the filtering operation 230.

The filtering operation 230 may be provided with a virtual interface for exchanging data packets with other elements in the host system 220. This virtual interface may be an ether fabric virtual interface (EFVI), a data plane development kit (DPDK) interface, or any other suitable interface. The filtering operation 230 may perform DDOS scrubbing and/or other forms of filtering. A DDOS scrubbing process may execute on all packets that are readily recognized as DDOS candidates, for example sample packets, copies of packets, and packets that have not yet been classified. Packets not delivered to the filtering operation 230 may be passed directly from the network interface to the driver 235. The operation 230 may provide an extended Berkeley packet filter (eBPF) for performing the filtering. If a received packet passes the filtering provided by the operation 230, the operation 230 is configured to re-inject the packet into the receive path in the kernel for processing received packets. Specifically, the packet is provided to the driver 235 or to the stack 240. The packet is then protocol processed by the protocol stack 240 and passed to the socket 245 associated with the terminating application 250. The terminating application 250 issues a recv() call to retrieve the data packet from the buffer of the associated socket.

There are, however, several issues with this approach. Firstly, the filtering operation 230 runs on the host CPU. In order to run the filtering 230, the host CPU must process the data packets at the rate at which they are received from the network. Where the rate at which data is sent to and received from the network is high, this can constitute a large drain on the processing resources of the host CPU. A high data flow rate to the filtering operation 230 may also result in heavy consumption of other limited resources, such as I/O bandwidth and internal memory/cache bandwidth.

In order to re-inject data packets into the kernel, it is necessary to provide the filtering operation 230 with a privileged API for performing the re-injection. The re-injection process may be cumbersome, since it requires attention to packet ordering. In order to perform re-injection, the operation 230 may in many cases require a dedicated CPU core.

The steps of providing data to the operation and re-injecting it require the data to be copied into and out of memory. This copying is a resource burden on the system.

Similar problems may occur when providing types of operations other than filtering on data to be sent or received over the network.

Some operations (such as DPDK-type operations) may require the processed packets to be forwarded back to the network.

Reference is made to FIG. 3, which illustrates another approach. Like elements are referred to with like reference numerals. In this example, an additional layer, known as the express data path (XDP) 310, is inserted into the transmit and receive paths in the kernel. Extensions to XDP 310 allow insertion into the transmit path. XDP helpers allow packets to be transmitted (as a result of a receive operation). XDP 310 is inserted at the driver level of the operating system and allows programs to be executed at this level so as to perform operations on data packets received from the network before those data packets are protocol processed by the stack 240. XDP 310 also allows programs to be executed at this level so as to perform operations on data packets to be sent over the network. eBPF programs and other programs can therefore operate in the transmit and receive paths.

As illustrated in FIG. 3, a filtering operation 320 may be inserted from user space into XDP to form a program 330 that is part of XDP 310. The operation 320 is inserted using the XDP control plane so as to be executed on the data receive path, providing a program 330 that performs filtering operations (e.g. DDOS scrubbing) on packets on the receive path. Such a program 330 may be an eBPF program.

The program 330 is shown inserted into the kernel between the driver 235 and the protocol stack 240. However, in other examples, the program 330 may be inserted at other points in the receive path in the kernel. The program 330 may be part of a separate control path that receives data packets. The program 330 may be provided by an application by providing an extension to an application programming interface (API) of the socket 245 for that application.

The program 330 may additionally or alternatively perform one or more operations on data being sent along the transmit path. XDP 310 then invokes the transmit function of the driver 235 to send the data over the network via the network interface device 210. In this case, the program 330 may provide a load balancing or routing operation with respect to data packets to be sent over the network. The program 330 may provide a segment re-encapsulation and forwarding operation with respect to data packets to be sent over the network.

The program 330 may be used for firewalling and virtual switching, or for other operations that do not require protocol termination or application processing.

One advantage of using XDP 310 in this way is that the program 330 can directly access the memory buffers handled by the driver, without intermediate copies.

In order to insert the program 330 for operation in the kernel in this way, it is necessary to ensure that the program 330 is safe. If an unsafe program is inserted into the kernel, this presents certain risks, such as: infinite loops, which could crash the kernel; buffer overflows; uninitialized variables; compiler errors; and performance issues caused by large programs.

To ensure that the program 330 is safe prior to insertion into XDP 310 in this way, a verifier may run on the host system 220 to verify the safety of the program 330. The verifier may be configured to ensure that no loops exist. Backward jump operations may be permitted provided that they do not cause loops. The verifier may be configured to ensure that the program 330 has no more than a predefined number (e.g. 4000) of instructions. The verifier may check the validity of register usage by traversing the data paths of the program 330. If there are too many possible paths, the program 330 will be rejected as unsafe to run in kernel mode. For example, if there are more than 1000 branches, the program 330 may be rejected.
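
The checks described above can be illustrated with a toy model. The following is a minimal sketch only, assuming a hypothetical instruction format of `(opcode, jump_target)` tuples. It is not the verifier of any real kernel, and it models only the instruction limit, the branch limit, and loop detection (not the register-usage check). A backward jump is accepted as long as the control-flow graph stays acyclic.

```python
from typing import List, Optional, Tuple

MAX_INSNS = 4000     # example limit taken from the description
MAX_BRANCHES = 1000  # example limit taken from the description

# Hypothetical instruction model: "jmp" always jumps, "jcc" either jumps or
# falls through, "exit" ends the program, anything else falls through.
Insn = Tuple[str, Optional[int]]


def successors(prog: List[Insn], pc: int) -> List[int]:
    op, target = prog[pc]
    if op == "exit":
        return []
    if op == "jmp":
        return [target]
    if op == "jcc":
        return [pc + 1, target]
    return [pc + 1]


def verify(prog: List[Insn]) -> Tuple[bool, str]:
    if not prog:
        return False, "empty program"
    if len(prog) > MAX_INSNS:
        return False, "too many instructions"
    if sum(1 for op, _ in prog if op == "jcc") > MAX_BRANCHES:
        return False, "too many branches"

    # Iterative depth-first search over the control-flow graph. An edge to an
    # instruction that is still on the DFS stack (GREY) closes a cycle.
    WHITE, GREY, BLACK = 0, 1, 2
    colour = [WHITE] * len(prog)
    colour[0] = GREY
    stack = [(0, iter(successors(prog, 0)))]
    done = object()
    while stack:
        pc, succ = stack[-1]
        nxt = next(succ, done)
        if nxt is done:
            colour[pc] = BLACK
            stack.pop()
            continue
        if not 0 <= nxt < len(prog):
            return False, "control flow leaves the program"
        if colour[nxt] == GREY:
            return False, "loop detected"
        if colour[nxt] == WHITE:
            colour[nxt] = GREY
            stack.append((nxt, iter(successors(prog, nxt))))
    return True, "ok"
```

In this sketch, `[("jmp", 2), ("exit", None), ("jmp", 1)]` is accepted even though its last instruction jumps backward, whereas `[("mov", None), ("jmp", 0)]` is rejected because it loops.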

It will be appreciated by the skilled person that XDP is one example of a mechanism by which a safe program 330 may be installed in the kernel, and that there are other ways in which this may be accomplished.

The approach discussed above with respect to FIG. 3 may be as efficient as the approach discussed above with respect to FIG. 2 if, for example, the operation can be expressed in a safe (or sandboxed) language required for executing code in the kernel. The eBPF language can be executed efficiently on x86 processors, and just-in-time (JIT) compilation techniques enable eBPF programs to be compiled to native machine code. The language is designed to be safe; for example, state is limited to map constructs, which are shared data structures (such as a hash table). Only limited looping is allowed; instead, one eBPF program is allowed to tail-call another. The state space is constrained.

However, in some implementations, this approach may still impose a large drain on the resources of the host system 220 (e.g. I/O bandwidth, internal memory/cache bandwidth, and the host CPU). The operations on the data packets are still performed by the host CPU, which is required to perform those operations at the rate at which data is being sent or received.

Another proposal is to perform the operations discussed above in the network interface device instead of in the host system. Doing so may free up the CPU cycles used by the host CPU when executing the operations, in addition to the I/O bandwidth and the memory and cache bandwidth consumed. Moving the execution of the processing operations from the host to the hardware of the network interface device may, however, present some challenges.

One proposal for implementing processing in network hardware is to provide, in the network interface device, a network processing unit (NPU) comprising a plurality of CPUs that are specialized for packet processing and/or manipulation operations.

Reference is made to FIG. 4, which illustrates an example of a network interface device 400 comprising an array 410 of central processing units (CPUs), e.g. CPU 420. The CPUs are configured to perform functions such as filtering data packets sent to and received from the network. Each CPU of the array 410 of CPUs may be an NPU. Although not shown in FIG. 4, the CPUs may additionally or alternatively be configured to perform operations, such as load balancing, on data packets received from the host for transmission over the network. These CPUs are specialized for such packet processing/manipulation operations and execute an instruction set that is optimized for them.

The network interface device 400 additionally comprises memory (not shown) that is shared among, and accessible to, the array 410 of CPUs.

The network interface device 400 comprises a network medium access control (MAC) layer 430 for interfacing the network interface device 400 with the network. The MAC layer 430 is configured to receive data packets from the network and to send data packets over the network.

Operations on packets received at the network interface device 400 are parallelized over the CPUs. As shown, when a data flow is received at the MAC layer 430, it is passed to a spreading function 440, which is configured to extract data packets from the flow and distribute them over a plurality of CPUs in the NPU 410 so that the CPUs can perform processing, e.g. filtering, of these data packets. The spreading function 440 may parse the received data packets so as to identify the data flow to which they belong. The spreading function 440 generates, for each packet, an indication of the position of that packet in the data flow to which it belongs. The indication may be, for example, a tag. The spreading function 440 adds the respective indication to the associated metadata of each packet. The associated metadata for each data packet may be appended to the data packet. The associated metadata may be passed to the spreading function 440 as side-band control information. The indication is added in dependence upon the flow to which the data packet belongs, such that the order of the data packets of any particular flow can be reconstructed.
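
The tagging performed by a spreading function can be sketched in software as follows. This is an illustrative model only: the flow key (a 4-tuple), the round-robin choice of CPU, and the names `Spreader` and `dispatch` are assumptions made for the example and are not taken from the description.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical flow key: (source IP, destination IP, source port, destination port).
FlowKey = Tuple[str, str, int, int]


class Spreader:
    """Toy spreading function: tags each packet with its position in its flow
    and hands packets out to CPUs in round-robin order (an assumed policy)."""

    def __init__(self, num_cpus: int) -> None:
        self.num_cpus = num_cpus
        self.next_seq: Dict[FlowKey, int] = defaultdict(int)
        self.next_cpu = 0

    def dispatch(self, flow: FlowKey, payload: bytes) -> Tuple[int, dict]:
        # The per-flow sequence number is the "indication" (tag) that lets
        # the order of the flow be reconstructed after parallel processing.
        seq = self.next_seq[flow]
        self.next_seq[flow] += 1

        cpu = self.next_cpu
        self.next_cpu = (self.next_cpu + 1) % self.num_cpus

        # The tag travels as metadata alongside the packet payload.
        packet = {"meta": {"flow": flow, "seq": seq}, "payload": payload}
        return cpu, packet
```

Because sequence numbers are kept per flow, packets of different flows may be interleaved across the CPUs without losing the ordering information of any single flow.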

After processing by the plurality of CPUs 410, the data packets are passed to a re-order function 450, which re-orders the packets of each data flow into their correct order before passing them to the host interface layer 460. The re-order function 450 may re-order the data packets within a flow by comparing the indications (e.g. tags) in the data packets of the flow so as to reconstruct the order of the data packets. The re-ordered data packets then pass through the host interface 460 and are delivered to the host system 220.
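
A re-order function of this kind can be modeled as a per-flow buffer that releases packets only when the next expected tag has arrived. The sketch below is an assumption-laden illustration (the class name `Reorderer`, the use of integer sequence tags starting at 0, and the heap-based buffer are all choices made for the example).

```python
import heapq
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


class Reorderer:
    """Toy re-order function: releases the packets of each flow strictly in
    tag order, buffering any packet that arrives ahead of its turn."""

    def __init__(self) -> None:
        self.expected: Dict[Hashable, int] = defaultdict(int)
        self.pending: Dict[Hashable, List[Tuple[int, bytes]]] = defaultdict(list)

    def push(self, flow: Hashable, seq: int, payload: bytes) -> List[bytes]:
        heap = self.pending[flow]
        heapq.heappush(heap, (seq, payload))

        # Drain every packet that is now in order for this flow.
        released = []
        while heap and heap[0][0] == self.expected[flow]:
            released.append(heapq.heappop(heap)[1])
            self.expected[flow] += 1
        return released
```

In this model a packet that finishes processing early is held in `pending` until all of its predecessors in the same flow have been released, so the buffer grows whenever one CPU lags behind the others.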

Although FIG. 4 illustrates the array 410 of CPUs operating only on data packets received from the network, similar principles (including spreading and re-ordering) may be applied to data packets received from the host for transmission over the network, with the array 410 of CPUs performing functions (e.g. load balancing) on these data packets received from the host.

The program executed by the CPUs may be a compiled or transcoded version of the program that would execute on the host CPU in the example described above with respect to FIG. 3. In other words, the instruction set that would execute on the host CPU to perform the operations is translated for execution on each CPU of the array of specialized CPUs in the network interface device 400.

To achieve parallelization over the CPUs, multiple instances of the program are compiled and executed in parallel on multiple CPUs. Each instance of the program may be responsible for processing a different set of the data packets received at the network interface device. However, each individual data packet is processed by a single CPU when providing the function of the program with respect to that data packet. The overall effect of the parallel execution may be the same as the execution of a single program (e.g. program 330) on the host CPU.

One of the specialized CPUs may process data packets at a rate on the order of 50 million packets per second. This speed of operation may be lower than that of the host CPU. Parallelization may therefore be used to achieve the same performance as would be achieved by executing an equivalent program on the host CPU. To perform the parallelization, the data packets are spread over the CPUs and then re-ordered after processing by the CPUs. The requirement to process the data packets of each flow in order, together with the re-ordering step 450, may introduce a bottleneck, increase memory resource overheads, and limit the available throughput of the device. This requirement and the re-ordering step 450 may also increase the jitter of the device, since the processing throughput may fluctuate depending on the content of the network traffic and the degree to which parallelism can be applied.

One advantage of the use of such specialized CPUs may be a short compile time. For example, it may be possible to compile a filtering application to run on such a CPU in less than one second.

There may be problems with the use of an array of CPUs when this approach is scaled to higher link speeds. Host network interfaces may be required to reach terabit/s speeds in the near future. When scaling such an array 410 of CPUs to these higher speeds, the amount of power required may become a problem.
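
As a rough, back-of-the-envelope illustration of this scaling, the number of specialized CPUs needed to keep up with a link can be estimated from the worst-case packet rate. The sketch below assumes standard Ethernet framing (64-byte minimum frame plus 20 bytes of preamble and inter-frame gap on the wire), the example figure of about 50 million packets per second per CPU given above, and ideal linear scaling with no spreading or re-ordering overhead. None of these assumptions is asserted by the description to hold for a real device.

```python
import math

MIN_FRAME_BYTES = 64      # minimum Ethernet frame, including FCS
WIRE_OVERHEAD_BYTES = 20  # preamble and start delimiter (8) + inter-frame gap (12)


def max_packets_per_second(link_bits_per_second: float) -> float:
    """Worst-case packet rate: back-to-back minimum-size frames."""
    bits_per_packet = (MIN_FRAME_BYTES + WIRE_OVERHEAD_BYTES) * 8
    return link_bits_per_second / bits_per_packet


def cpus_needed(link_bits_per_second: float, per_cpu_pps: float = 50e6) -> int:
    """CPUs required under ideal linear scaling (an optimistic assumption)."""
    return math.ceil(max_packets_per_second(link_bits_per_second) / per_cpu_pps)
```

Under these assumptions a 100 Gb/s link carries up to roughly 148.8 million minimum-size packets per second, and a 1 Tb/s link ten times that, so the size of the array (and with it the power budget) grows in proportion to the link speed.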

Another proposal is to include a field programmable gate array (FPGA) in the network interface device and to use the FPGA to perform operations on data packets received from the network.

Reference is made to FIG. 5, which illustrates an example of the use, in a network interface device 500, of an FPGA 510 having an FPGA application 515 for performing operations on data packets received at the network interface device 500. Elements that are the same as those in FIG. 4 are referred to with like reference numerals.

Although FIG. 5 illustrates the FPGA application 515 operating only on data packets received from the network, such an FPGA application 515 may also be used to perform functions (e.g. load balancing and/or a firewall function) on data packets received from the host, either for transmission over the network or for transmission back to the host or to another network interface on the system.

The FPGA application 515 may be provided by compiling a program written in a common system-level language, such as C, C++, or Scala, to run on the FPGA 510.

The FPGA 510 may have network interface functionality and FPGA functionality. The FPGA functionality may provide an FPGA application 515, which may be programmed into the FPGA 510 according to the needs of the user of the network interface device. The FPGA application 515 may, for example, provide filtering of messages on the receive path from the network 230 to the host. The FPGA application 515 may provide a firewall.

The FPGA 510 may be programmable to provide the FPGA application 515. Some of the network interface device functionality may be implemented as "hard" logic within the FPGA 510. For example, the hard logic may be application specific integrated circuit (ASIC) gates. The FPGA application 515 may be implemented as "soft" logic. The soft logic may be provided by programming the FPGA look-up tables (LUTs). The hard logic may be capable of being clocked at a higher rate than the soft logic.

The network interface device 500 comprises a host interface 505 configured to send data to and receive data from the host. The network interface device 500 comprises a network medium access control (MAC) interface 520 configured to send data to and receive data from the network.

When a data packet is received from the network at the MAC interface 520, the data packet is passed to the FPGA application 515, which is configured to perform a function, such as filtering, with respect to the data packet. The data packet (if it passes any filtering) is then passed to the host interface 505, from where it is passed to the host. Alternatively, the FPGA application 515 may determine to drop or retransmit the data packet.

One issue with this approach of using an FPGA to perform functions with respect to data packets is the relatively long compile time required. An FPGA is composed of many logic elements (e.g. logic cells) that individually represent primitive logic operations, such as AND, OR, NOT, and so on. These logic elements are arranged in a matrix with a programmable interconnect. To provide a function, these logic cells may need to operate together to implement the circuit definition and to meet synchronous clock timing constraints. Placing each logic cell and routing between the cells may be an algorithmically difficult challenge. When compiling onto an FPGA with a low level of utilization, the compile time may be less than ten minutes. However, as the FPGA device becomes more heavily utilized by various applications, the place-and-route challenge may increase, so that the time to compile a given function onto the FPGA increases. As such, adding further logic to an FPGA for which most of the routing resources have already been consumed may require hours of compile time.

One approach is to design the hardware using specific processing primitives, such as parse, match, and action primitives. These may be used to construct a processing pipeline in which every packet undergoes each of three processes. Firstly, the packet is parsed to construct a metadata representation of the protocol headers. Secondly, the packet is flexibly matched against rules held in tables. Finally, when a match is found, the packet is actioned in dependence upon the entry from the table selected in the match operation.
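
The three stages can be modeled in a few lines of software. The following is a simplified sketch under stated assumptions: only the Ethernet header is parsed, the match key is the EtherType alone, the table maps a key to an action name, and unmatched packets fall back to a `default` action. The function names and the table layout are hypothetical and chosen for illustration.

```python
import struct
from typing import Dict, Optional


def parse(frame: bytes) -> dict:
    """Parse stage: build a metadata representation of the Ethernet header."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return {"dst": dst, "src": src, "ethertype": ethertype}


def match(meta: dict, table: Dict[int, str]) -> Optional[str]:
    """Match stage: look the parsed key up in a rule table."""
    return table.get(meta["ethertype"])


def action(entry: Optional[str], frame: bytes, default: str = "drop") -> Optional[bytes]:
    """Action stage: act on the packet according to the selected table entry."""
    chosen = entry if entry is not None else default
    return frame if chosen == "forward" else None


def pipeline(frame: bytes, table: Dict[int, str]) -> Optional[bytes]:
    """Every packet passes through parse, match, and action in that order."""
    return action(match(parse(frame), table), frame)
```

For instance, with the table `{0x0800: "forward"}`, an IPv4 frame passes through `pipeline` unchanged, while an ARP frame (EtherType `0x0806`) finds no match and is dropped by the assumed default.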

To implement functions using the parse/match/action model, the P4 programming language (or a similar language) may be used. The P4 programming language is target independent, meaning that a program written in P4 can be compiled to run on different types of hardware, such as CPUs, FPGAs, ASICs, NPUs, and so on. Each different type of target is provided with its own compiler, which maps the P4 source code into the appropriate target switch model.

P4 may be used to provide a programming model that allows a high-level program to express packet processing operations for a packet processing pipeline. This approach works well for operations that naturally express themselves in a declarative style. In the P4 language, the programmer expresses the parsing, matching, and action stages as operations to be performed on the received data packets. These operations are gathered together for dedicated hardware to perform efficiently. However, this declarative style may not be well suited to expressing programs of an imperative nature, such as eBPF programs.

In a network interface device, a sequence of eBPF programs may be required to execute serially. In this case, a chain of eBPF programs is generated, with one calling another. Each program is able to modify state, and the output is as if the entire chain of programs had been executed serially. It may be difficult for a compiler to gather together all of the parsing, matching, and action steps. Furthermore, even where a chain of eBPF programs is already installed, it may be necessary to install, remove, or modify the chain, which may present further challenges.

To provide an example of such programs requiring iterative execution, reference is made to FIG. 10, which illustrates an example of a sequence of programs e₁, e₂, e₃ configured to process a data packet. Each of the programs may, for example, be an eBPF program. Each of the programs is configured to parse the received data packet, to perform a lookup into table 1010 to determine an action in a matching entry of table 1010, and then to perform the action with respect to the data packet. The action may include modifying the packet. Each of the eBPF programs may also perform actions in dependence upon local and shared state. The data packet P₀ is initially processed by eBPF program e₁ before being passed, as modified, to the next program e₂ in the pipeline. The output of the sequence of programs is the output of the final program in the pipeline, i.e. e₃.
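The serial chaining of FIG. 10 may be pictured with the following minimal Python sketch. It is an illustration only, not the implementation of any embodiment: the helper names `make_program`, `run_chain` and `set_first_byte`, the one-byte key, the per-program tables and the hit counters kept in shared state are all assumptions made for the example.

```python
from typing import Callable, Dict, List

Packet = bytearray
State = Dict[str, int]
Program = Callable[[Packet, State], Packet]


def set_first_byte(value: int) -> Callable[[Packet], None]:
    """Hypothetical action: overwrite the first byte of the packet."""
    def action(packet: Packet) -> None:
        packet[0] = value
    return action


def make_program(name: str, table: Dict[int, Callable[[Packet], None]]) -> Program:
    """Build a hypothetical parse/match/action program."""
    def program(packet: Packet, shared: State) -> Packet:
        key = packet[0]                          # parse: extract a one-byte key
        action = table.get(key)                  # match: look the key up in the table
        if action is not None:
            action(packet)                       # action: may modify the packet
        shared[name] = shared.get(name, 0) + 1   # update shared state
        return packet
    return program


def run_chain(programs: List[Program], packet: Packet, shared: State) -> Packet:
    """Run the programs serially; each one sees the packet as modified so far."""
    for program in programs:
        packet = program(packet, shared)
    return packet


e1 = make_program("e1", {0: set_first_byte(1)})
e2 = make_program("e2", {1: set_first_byte(2)})
e3 = make_program("e3", {2: set_first_byte(3)})

shared: State = {}
result = run_chain([e1, e2, e3], bytearray([0, 9]), shared)
```

In this sketch the output of the chain is simply the packet returned by the last program, and inserting or removing a program amounts to editing the list passed to `run_chain`.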

Combining the effects of each of n such programs into a single P4 program may be complex for a compiler. Additionally, certain programming models (e.g. XDP) may require programs to be dynamically inserted and removed quickly, at any point in the sequence of programs, in response to changing circumstances.

According to some embodiments of the application, a network interface device comprising a plurality of processing units is provided. Each processing unit is configured to perform at least one predefined operation in hardware. Each processing unit includes a memory storing its own local state. Each processing unit includes digital circuitry that modifies this state. The digital circuitry may be an application specific integrated circuit. Each processing unit is configured to run a program comprising configurable parameters so as to perform the respective plurality of operations. Each processing unit may be an atom. An atom is defined by the specific programming and routing of a predefined template. This defines its specific operational behavior and its logical place in the flow provided by the connected plurality of processing units. Where the term 'atom' is used in this specification, it may be understood to refer to a data processing unit that is configured to execute its operations in a single step. In other words, an atom executes its operations as an atomic operation.

An atom may be regarded as a collection of hardware structures that can be configured to repeatedly perform one of a range of computations, taking one or more inputs and producing one or more outputs.

An atom is provided by hardware. An atom may be configured by a compiler. An atom may be configured to perform computations.

During compilation, at least some of the plurality of processing units are arranged to perform operations such that a function is performed, by those processing units, with respect to data packets received at the network interface device. Each of the at least some of the plurality of processing units is configured to perform its respective at least one predefined operation so as to perform the function with respect to a data packet. In other words, the operations that the connected processing units are configured to perform are performed with respect to a received data packet. The operations are performed sequentially by the at least some of the plurality of processing units. Collectively, the performance of each of the plurality of operations provides a function, for example filtering, with respect to the received packet.

By arranging each of the atoms to execute its respective at least one predefined operation so as to perform the function, the compilation time may be reduced compared with the FPGA application example described above with respect to FIG. 5. Moreover, by performing the function using processing units that are specifically dedicated to performing particular operations in hardware, the speed at which the function can be performed may be improved relative to using CPUs executing software in the network interface device to perform the function for each data packet, as discussed above with respect to FIG. 4.

Reference is made to FIG. 6, which illustrates an example of a network interface device 600 according to embodiments of the present application. The network interface device includes a hardware module 610 configured to perform the processing of data packets received at an interface of the network interface device 600. Although FIG. 6 illustrates the hardware module 610 performing a function (e.g. filtering) for data packets on the receive path, the hardware module 610 may also be used to perform a function (e.g. load balancing or a firewall) for data packets on the transmit path that are received from the host.

The network interface device 600 includes a host interface 620 for sending data packets to and receiving data packets from the host, and a network MAC interface 630 for sending data packets to and receiving data packets from the network.

The network interface device 600 includes a hardware module 610 comprising a plurality of processing units 640a, 640b, 640c, 640d. Each of the processing units may be an atomic processing unit. The term atom is used in this description to refer to such processing units. Each of the processing units is configured to perform at least one operation in hardware. Each of the processing units includes digital circuitry 645 configured to perform the at least one operation. The digital circuitry 645 may be an application specific integrated circuit. Each of the processing units additionally includes a memory 650 storing state information. The digital circuitry 645 updates the state information when executing the respective plurality of operations. In addition to the local memory, each of the processing units has access to a shared memory 660, which may also store state information accessible to each of the plurality of processing units.

The state information in the shared memory 660 and/or the state information in the memory 650 of a processing unit may include at least one of: metadata passed between processing units, temporary variables, the contents of the data packet, and the contents of one or more shared maps.

Together, the plurality of processing units are capable of providing a function to be performed with respect to data packets received at the network interface device 600. The compiler outputs instructions for configuring the hardware module 610 to perform a function with respect to incoming data packets by arranging at least some of the plurality of processing units to perform their respective at least one predefined operation with respect to each incoming data packet. This may be achieved by chaining (i.e. connecting) together at least some of the processing units 640a, 640b, 640c, 640d such that each of the connected processing units performs its respective at least one operation with respect to each incoming data packet. Each of the processing units performs its respective at least one operation in a particular order so as to perform the function. The order may be such that two or more of the processing units execute in parallel with each other, i.e. at the same time. For example, one processing unit may read from a data packet during a time period (defined by a periodic signal, e.g. a clock signal, of the hardware module 610) in which a second processing unit also reads from a different location in the same data packet.

In some embodiments, the data packet passes through each stage represented by the processing units in sequence. In this case, each processing unit completes its processing before passing the data packet to the next processing unit, which then performs its own processing.

In the example shown in FIG. 6, processing units 640a, 640b, and 640d are connected together at compile time, such that each of them performs its respective at least one operation so as to perform a function, e.g. filtering, with respect to a received data packet. The processing units 640a, 640b, 640d form a pipeline for processing the data packet. The data packet may move along this pipeline in stages, each of which has the same time period. The time period may be defined according to a periodic signal or beat. The time period may be defined by a clock signal. Several periods of the clock may define one time period for each stage of the pipeline. The data packet moves along one stage in the pipeline at the end of each occurrence of the repeating time period. The time period may be a fixed interval. Alternatively, each time period for a stage in the pipeline may take a variable amount of time. A signal indicating that the data packet may move to the next stage in the pipeline may be generated when the previous processing stage has completed its operation, which may take a variable amount of time. A stall may be introduced at any stage in the pipeline by delaying the signal for some predetermined amount of time.
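The fixed-period behavior described above, in which a data packet advances by one stage at the end of each time period, can be sketched in Python as follows. This is a simplified software model under stated assumptions, not a description of the hardware: the class `ClockedPipeline`, the helper `make_stage` and the dictionary representation of a packet are hypothetical, and stalls and variable-length periods are not modeled.

```python
from typing import Callable, List, Optional

Stage = Callable[[dict], dict]


class ClockedPipeline:
    """Software model of a pipeline advanced by a periodic signal."""

    def __init__(self, stages: List[Stage]) -> None:
        self.stages = stages
        # slots[i] holds the packet waiting to be processed by stage i
        self.slots: List[Optional[dict]] = [None] * len(stages)

    def tick(self, incoming: Optional[dict] = None) -> Optional[dict]:
        """Advance one time period and return the packet leaving the pipeline, if any."""
        # Every occupied stage performs its operation during this period.
        processed = [
            stage(packet) if packet is not None else None
            for stage, packet in zip(self.stages, self.slots)
        ]
        outgoing = processed[-1]
        # At the end of the period each packet moves one stage along.
        self.slots = [incoming] + processed[:-1]
        return outgoing


def make_stage(name: str) -> Stage:
    """Hypothetical stage that records its name in the packet's trace."""
    def stage(packet: dict) -> dict:
        return {**packet, "trace": packet["trace"] + [name]}
    return stage


pipeline = ClockedPipeline([make_stage("640a"), make_stage("640b"), make_stage("640d")])
outputs = [pipeline.tick({"id": 0, "trace": []})]
outputs += [pipeline.tick() for _ in range(3)]
```

In this model a packet offered on one period is latched into the first stage and leaves the three-stage pipeline three periods later, having been processed by 640a, 640b and 640d in that order; 640c is simply not part of the list.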

Each of the processing units 640a, 640b, 640d may be configured to access the shared memory 660 as part of its respective at least one operation. Each of the processing units 640a, 640b, 640d may be configured to pass metadata to one another as part of its respective at least one operation. Each of the processing units 640a, 640b, 640d may be configured to access the data packet received from the network as part of its respective at least one operation.

In this example, processing unit 640c is not used to perform processing of received data packets to provide the function, and is instead omitted from the pipeline.

A data packet received at the network MAC layer 630 may be passed to the hardware module 610 for processing. Although not shown in FIG. 6, the processing performed by the hardware module 610 may be part of a larger processing pipeline that provides additional functions with respect to the data packet beyond the function provided by the hardware module 610. This is illustrated with respect to FIG. 14 and is described in more detail below.

The first processing unit 640a is configured to perform a first at least one operation with respect to the data packet. This first at least one operation may include at least one of: reading from the data packet, reading and writing shared state in the memory 660, and/or performing a lookup into a table to determine an action. The first processing unit 640a is then configured to produce a result from its at least one operation. The result may be in the form of metadata. The result may include a modification to the data packet. The result may include a modification to the shared state in the memory 660. The second processing unit 640b is configured to perform its at least one operation with respect to the data packet in dependence upon the result of the operation executed by the first processing unit 640a. The second processing unit 640b produces a result from its at least one operation and passes that result to the third processing unit 640d, which is configured to perform its at least one operation with respect to the data packet. The first processing unit 640a, the second processing unit 640b and the third processing unit 640d are, together, configured to provide a function with respect to the data packet. The data packet may then be passed to the host interface 620, from where it is delivered to the host system.

It may therefore be seen that the connected processing units form a pipeline for processing data packets received at the network interface device. This pipeline may provide the processing of an eBPF program. The pipeline may provide the processing of a plurality of eBPF programs. The pipeline may provide the processing of a plurality of modules that execute in sequence.

Connecting the processing units together in the hardware module 610 may be performed by programming the routing functions of a pre-synthesized interconnect fabric of the hardware module 610. This interconnect fabric provides the connections between the various processing units of the hardware module 610. The interconnect fabric is programmed according to the topologies supported by the fabric. Possible example topologies are discussed below with reference to FIG. 15.

The hardware module 610 supports at least one bus interface. The at least one bus interface receives data packets at the hardware module 610 (e.g. from the host or the network). The at least one bus interface outputs data packets from the hardware module 610 (e.g. to the host or the network). The at least one bus interface receives control messages at the hardware module 610. The control messages may be for configuring the hardware module 610.

The example shown in FIG. 6 has the advantage of a reduced compile time relative to the FPGA application 515 shown in FIG. 5. For example, the hardware module 610 of FIG. 6 may require less than 10 seconds to compile a filtering function. The example shown in FIG. 6 has the advantage of improved processing speed relative to the example of the array of CPUs shown in FIG. 4.

An application may be compiled for execution in such a hardware module 610 by mapping a generic program (or multiple programs) onto a pre-synthesized data path. The compiler builds the data path by connecting an arbitrary number of processing stage instances, where each instance is built from one of the pre-synthesized processing stage atoms.

Each of the atoms is built from a circuit. Each circuit may be defined using RTL (register transfer language) or a high-level language. Each circuit is synthesized using a compiler or tool chain. An atom may be synthesized into hard logic and so be available as a hard (ASIC) resource in the hardware module of the network interface device. An atom may be synthesized into soft logic. A soft-logic atom may be provided with constraints that allocate and maintain the place-and-route information of the synthesized logic on the physical device. An atom may be designed with configurable parameters that specify the behavior of the atom. Each parameter may be a variable, or even a sequence of operations (a micro-program), which may specify at least one operation to be performed by the processing unit during a clock cycle of the processing pipeline. The logic implementing an atom may be clocked synchronously or asynchronously.

The processing pipeline of atoms may itself be configured to operate according to a periodic signal. In this case, each item of metadata and each data packet moves one stage along the pipeline in response to each occurrence of the signal. The processing pipeline may alternatively operate in an asynchronous manner. In this case, back pressure at higher levels of the pipeline causes each downstream stage to start processing only when data from an upstream stage has been presented to it.

When compiling a function to be executed by a plurality of such atoms, a sequence of computer code instructions is separated into a plurality of operations, each of which is mapped to a single atom. Each operation may represent a single line of disassembled instructions in the computer code instructions. Each operation is assigned to one of the atoms to be executed by that atom. There may be one atom per expression in the computer code instructions. Each atom is associated with a type of operation, and is selected to execute at least one operation in the computer code instructions on the basis of its associated type of operation. For example, an atom may be preconfigured to perform load operations from a data packet. Such an atom is therefore assigned to execute an instruction in the computer code that represents a load operation from the data packet.

One atom may be selected per line in the computer code instructions. Accordingly, when a function is implemented in a hardware module containing such atoms, there may be 100 such atoms, each performing its respective operation so as to perform the function with respect to the data packet.

Each atom may be constructed according to one of a set of processing stage templates, which determines its associated type of operation. The compilation process is configured to generate instructions for controlling each atom to perform a particular at least one operation on the basis of its associated type. For example, if an atom is preconfigured to perform packet access operations, the compilation process may assign to that atom an operation to load certain information (e.g. the source ID of the packet) from the header of the packet. The compilation process is configured to send the instructions to the hardware module, where the atoms are configured to perform the operations assigned to them by the compilation process.
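The type-based assignment of operations to atoms described above may be sketched as follows. The sketch is a hedged illustration only: the `Atom` dataclass, the function `assign_operations`, the type names `"packet"`, `"logic"` and `"map"`, and the first-fit selection policy are assumptions introduced for the example and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Atom:
    atom_id: int
    op_type: str          # type fixed by the atom's processing stage template
    operation: str = ""   # operation assigned to the atom at compile time


def assign_operations(operations: List[Tuple[str, str]], atoms: List[Atom]) -> List[Atom]:
    """Assign each (op_type, description) pair to a free atom of the matching type.

    Returns the configured atoms in pipeline order. First-fit selection is an
    assumption of this sketch.
    """
    free = list(atoms)
    pipeline: List[Atom] = []
    for op_type, description in operations:
        atom = next((a for a in free if a.op_type == op_type), None)
        if atom is None:
            raise ValueError(f"no free atom of type {op_type!r}")
        free.remove(atom)
        atom.operation = description
        pipeline.append(atom)
    return pipeline


atoms = [Atom(0, "packet"), Atom(1, "logic"), Atom(2, "map"), Atom(3, "logic"), Atom(4, "packet")]
operations = [
    ("packet", "load source ID from packet header"),
    ("logic", "check whether the packet is a valid IP packet"),
    ("map", "look up source ID in table"),
]
configured = assign_operations(operations, atoms)
```

Atoms that are not selected, here atoms 3 and 4, are left unconfigured, in the same way that processing unit 640c is omitted from the pipeline of FIG. 6.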

The processing stage templates that specify the behavior of an atom are: logic stage templates (e.g. providing operations over registers, scratch pad memory, and stack, as well as branches), packet access stage templates (e.g. providing packet data loads and/or packet data stores), and map access stage templates (e.g. map lookup algorithms, map table sizes).

A packet access stage may include at least one of: reading a sequence of bytes from the data packet; replacing one sequence of bytes in the data packet with a different sequence of bytes; inserting bytes into the data packet; and deleting bytes from the data packet.
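The four packet access operations listed above can be modeled on an immutable byte string as in the following sketch. The function names and the offset/length parameterization are assumptions chosen for illustration; the specification does not define such an interface.

```python
def read_bytes(packet: bytes, offset: int, length: int) -> bytes:
    """Read a sequence of bytes from the packet."""
    return packet[offset:offset + length]


def replace_bytes(packet: bytes, offset: int, old_length: int, new: bytes) -> bytes:
    """Replace old_length bytes at offset with a different sequence of bytes."""
    return packet[:offset] + new + packet[offset + old_length:]


def insert_bytes(packet: bytes, offset: int, new: bytes) -> bytes:
    """Insert bytes into the packet at offset."""
    return packet[:offset] + new + packet[offset:]


def delete_bytes(packet: bytes, offset: int, length: int) -> bytes:
    """Delete length bytes from the packet starting at offset."""
    return packet[:offset] + packet[offset + length:]


packet = b"\x01\x02\x03\x04"
read_result = read_bytes(packet, 1, 2)
replaced = replace_bytes(packet, 1, 2, b"\xaa")
inserted = insert_bytes(packet, 2, b"\xff")
deleted = delete_bytes(packet, 0, 1)
```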

A map access stage can be used to access different types of map (e.g. lookup tables), including direct indexed arrays and associative arrays. A map access stage may include at least one of: reading a value from a location; writing a value to a location; and replacing the value at a location in the map with a different value. A map access stage may include a compare operation in which a value is read from a location in the map and compared with a different value. If the value read from the location is less than the different value, a first action (e.g. doing nothing, exchanging the value at the location for the different value, or adding the values together) may be performed. Otherwise, a second action (e.g. doing nothing, exchanging the values, or adding the values) may be performed. In either case, the value read from the location may be provided to the next processing stage.
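The compare operation described above may be sketched for a direct indexed array as follows. The function name `compare_and_act` and the action labels `"nothing"`, `"exchange"` and `"add"` are hypothetical names for the options listed in the text.

```python
from typing import List


def compare_and_act(table: List[int], location: int, operand: int,
                    if_less: str, otherwise: str) -> int:
    """Read table[location], compare it with operand and apply the selected action.

    The value originally read is returned so that it can be provided to the
    next processing stage, whichever action was taken.
    """
    current = table[location]
    action = if_less if current < operand else otherwise
    if action == "exchange":
        table[location] = operand             # swap in the different value
    elif action == "add":
        table[location] = current + operand   # add the two values together
    elif action != "nothing":
        raise ValueError(f"unknown action {action!r}")
    return current


table = [5, 10]
first = compare_and_act(table, 0, 7, if_less="exchange", otherwise="nothing")
second = compare_and_act(table, 1, 7, if_less="exchange", otherwise="add")
```

In the first call the stored value 5 is less than 7, so the first action exchanges it; in the second call 10 is not less than 7, so the second action adds the values.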

Each map access stage may be implemented in a stateful processing unit. Reference is made to FIG. 17, which illustrates an example of circuitry 1700 that may be included in an atom configured to perform the processing of a map access stage. The circuitry 1700 may include a hash function 1710 configured to hash an input value that is used as an input to a lookup table. The circuitry 1700 includes a memory 1720 configured to store state associated with the operation of the atom. The circuitry 1700 includes an arithmetic logic unit 1730 configured to perform operations.

A logic stage may perform computations on the values provided by a previous stage. A processing unit configured to implement a logic stage may be a stateless processing unit. Each stateless processing unit can perform simple arithmetic operations. Each processing unit may, for example, perform 8-bit operations.

Each logic stage may be implemented in a stateless processing unit. Reference is made to FIG. 18, which illustrates an example of circuitry 1800 that may be included in an atom configured to perform the processing of a logic stage. The circuitry 1800 includes an array of arithmetic logic units (ALUs) and multiplexers. The ALUs and multiplexers are arranged in layers, with the outputs of one layer of processing by the ALUs being used by the multiplexers to provide the inputs to the next layer of ALUs.
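The layered arrangement of ALUs and multiplexers may be modeled in software along the following lines. This is an illustrative sketch rather than a description of circuitry 1800: the set of ALU operations, the `(op, select_a, select_b)` encoding of a layer and the function `run_logic_stage` are assumptions, with each select index playing the role of a multiplexer choosing one output of the previous layer.

```python
import operator
from typing import Callable, Dict, List, Tuple

# Hypothetical ALU operation set; the text only says "simple arithmetic operations".
ALU_OPS: Dict[str, Callable[[int, int], int]] = {
    "add": operator.add,
    "and": operator.and_,
    "xor": operator.xor,
}

# One ALU is described by (operation, index of input A, index of input B).
Layer = List[Tuple[str, int, int]]


def run_logic_stage(inputs: List[int], layers: List[Layer], width: int = 8) -> List[int]:
    """Evaluate the layers in order on width-bit values (8 bits by default)."""
    mask = (1 << width) - 1
    values = list(inputs)
    for layer in layers:
        # The select indices act as multiplexers over the previous layer's outputs.
        values = [ALU_OPS[op](values[a], values[b]) & mask for op, a, b in layer]
    return values


layers = [
    [("add", 0, 1), ("and", 0, 2)],   # first layer: two ALUs
    [("xor", 0, 1)],                  # second layer: one ALU
]
result = run_logic_stage([200, 100, 0x0F], layers)
```

Because the model keeps no state between calls, the same inputs always produce the same outputs, which matches the stateless character of a logic stage.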

The pipeline of stages implemented in the hardware module may include a first packet access stage (pkt0), followed by a first logic stage (logic0), followed by a first map access stage (map0), followed by a second logic stage (logic1), followed by a second packet access stage (pkt1), and so on. It may therefore take the following form:

pkt0 -> logic0 -> map0 -> logic1 -> pkt1

In some examples, stage pkt0 extracts the required information from the packet. Stage pkt0 passes this information to stage logic0. Stage logic0 determines whether the packet is a valid IP packet. In some cases, logic0 forms a map request and sends the map request to map0, which performs the map operation. Stage map0 may perform an update to a lookup table. Stage logic1 then collects the result of the map operation and decides, on that basis, whether or not to drop the packet.

In some cases, the map request is disabled, to cover the case in which a map operation should not be performed for this packet. Where no map operation is performed, logic0 indicates to logic1 whether or not the packet should be dropped, depending on whether or not the packet is a valid IP packet. In some examples, the lookup table contains 256 entries, where each entry is an 8-bit value.
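The stages of this example can be sketched in Python as shown below. Only the 256-entry table of 8-bit values and the division of work between the stages come from the text; the packet layout (version nibble in byte 0, lookup key in byte 1), the saturating per-key counter, the threshold `DROP_THRESHOLD` and the verdict strings are assumptions made so that the example is concrete. Stage pkt1 is left out because the text does not say what it does.

```python
from typing import List, Optional

TABLE_SIZE = 256     # the lookup table holds 256 entries
MAX_VALUE = 0xFF     # each entry is an 8-bit value
DROP_THRESHOLD = 2   # hypothetical policy: drop once a key has been seen more than twice


def pkt0(packet: bytes) -> dict:
    """Packet access stage: extract the fields needed by later stages."""
    return {"version": packet[0] >> 4, "key": packet[1]}


def logic0(fields: dict) -> dict:
    """Logic stage: validity check; the map request is disabled for invalid packets."""
    valid = fields["version"] == 4
    return {"valid": valid, "map_request": fields["key"] if valid else None}


def map0(meta: dict, table: List[int]) -> dict:
    """Map access stage: update the lookup table unless the request is disabled."""
    key: Optional[int] = meta["map_request"]
    if key is None:
        return {**meta, "count": None}
    table[key] = min(table[key] + 1, MAX_VALUE)   # saturating 8-bit counter
    return {**meta, "count": table[key]}


def logic1(meta: dict) -> str:
    """Logic stage: collect the map result and decide whether to drop the packet."""
    if not meta["valid"]:
        return "drop"
    return "drop" if meta["count"] > DROP_THRESHOLD else "pass"


def process(packet: bytes, table: List[int]) -> str:
    return logic1(map0(logic0(pkt0(packet)), table))


table = [0] * TABLE_SIZE
verdicts = [process(bytes([0x45, 7]), table) for _ in range(3)]
invalid_verdict = process(bytes([0x60, 7]), table)
```

With these assumed values the third packet carrying key 7 is dropped, while the packet that fails the validity check is dropped without any map operation being performed.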

The example described comprises only five stages. As noted, however, more may be used. Moreover, the operations need not all be executed sequentially; some operations relating to the same data packet may be executed concurrently by different processing units.

The hardware module 610 shown in FIG. 6 illustrates a single pipeline of minimum units for performing a function with respect to data packets. However, the hardware module 610 may comprise a plurality of pipelines for processing data packets, each of which may perform a different function with respect to the data packets. The hardware module 610 is configurable to interconnect a first set of its minimum units to form a first data processing pipeline, and to interconnect a second set of its minimum units to form a second data processing pipeline.

To compile a function for implementation in a hardware module comprising a plurality of processing units, a series of steps may be carried out, starting from a sequence of computer code. A compiler, which may run on a processor of the host device or of the network interface device, has access to the decomposed sequence of computer code.

First, the compiler is configured to split the sequence of computer code instructions into separate stages. Each stage may comprise operations according to one of the processing stage templates described above. For example, one stage may provide a read from the data packet, another may provide an update of map data, and another may make a pass/drop decision. The compiler assigns each of the plurality of operations expressed by the code to one of the plurality of stages.

Second, the compiler is configured to assign each of the processing stages determined from the code to be performed by a different processing unit. This means that the at least one operation of each processing stage is executed by a different processing unit. The output of the compiler can then be used to cause the processing units to perform the operations of each stage in a particular order so as to perform the function.

The output of the compiler comprises generated instructions that are used to cause the processing units of the hardware module to execute the operations associated with each processing stage.
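
The two compiler steps (splitting the code into stages, then giving each stage its own processing unit) might look like the sketch below. It is not the compiler of the application: grouping consecutive operations of the same kind into one stage is an assumed heuristic, and the operation names, unit names and the helpers `split_into_stages` and `assign_units` are hypothetical.

```python
from itertools import groupby

def split_into_stages(ops):
    """Step 1: group consecutive operations of the same kind into one stage."""
    return [(kind, [name for _, name in group])
            for kind, group in groupby(ops, key=lambda op: op[0])]

def assign_units(stages, free_units):
    """Step 2: give every stage its own processing unit of the matching kind."""
    pools = {kind: list(units) for kind, units in free_units.items()}
    plan = []
    for kind, names in stages:
        if not pools.get(kind):
            # Units of this type are exhausted, so the function cannot be compiled.
            raise RuntimeError(f"no free {kind} unit left; compilation fails")
        plan.append((pools[kind].pop(0), names))
    return plan

ops = [("pkt", "read_header"), ("logic", "check_ip"), ("logic", "build_key"),
       ("map", "update_counter"), ("logic", "pass_or_drop")]
stages = split_into_stages(ops)
plan = assign_units(stages, {"pkt": ["pkt0"],
                             "logic": ["logic0", "logic1"],
                             "map": ["map0"]})
```

Here `plan` lists the units in execution order (pkt0, logic0, map0, logic1), each paired with the operations it must carry out; this ordered list plays the role of the generated instructions in the sketch.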

The output of the compiler may also be used to generate, in the hardware module, logic that responds to control messages for configuring the hardware module 610. Such control messages are described in more detail below with reference to FIG. 14.

The compilation process for compiling a function to be executed on the network interface device 600 may be performed in response to a determination that the process for providing that function is safe for execution in the kernel of the host device. The determination of the safety of the program may be carried out by a suitable verifier, as described above with reference to FIG. 3. Once the process has been determined to be safe for execution in the kernel, it may be compiled for execution in the network interface device.

Reference is made to FIG. 15, which illustrates a representation of at least some of the plurality of processing units performing their respective at least one operation so as to perform a function with respect to a data packet. Such a representation may be generated by the compiler and used to configure the hardware module to perform the function. The representation indicates the order in which the operations may be executed and how some of the processing units perform their operations in parallel.

The representation 1500 takes the form of a table having rows and columns. Some of the entries of the table represent minimum units, for example minimum unit 1510a, configured to perform their respective operations. The row to which a processing unit belongs indicates the timing of the operation performed by that processing unit with respect to a particular data packet. Each row may correspond to a single time period represented by one or more cycles of a clock signal. Processing units belonging to the same row perform their operations in parallel.

The inputs to a logic stage are provided in row 0, and the computation flows forward towards later rows. By default, a minimum unit receives the results of the processing by the minimum unit in the same column as itself but in the previous row. For example, minimum unit 1510b receives the results of the processing by minimum unit 1510a and performs its own processing in dependence on those results.

When local routing resources are used, a minimum unit may also access the outputs of minimum units in the previous row whose column number differs from its own by no more than two. For example, minimum unit 1510d may receive the results of the processing performed by minimum unit 1510c.

When global routing resources are used, a minimum unit may also access the outputs of minimum units in either of the previous two rows and in any column. For example, minimum unit 1510f may receive the results of the processing performed by minimum unit 1510e.

These constraints on routing between minimum units are given by way of example, and other constraints may apply. Applying more restrictive constraints may make routing of information between minimum units easier. Applying less restrictive constraints may make scheduling easier. If the available minimum units of a given type (for example map, logic or packet access) are exhausted, or if routing between minimum units cannot be achieved, compilation of the function onto the hardware module will fail.
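
The example routing constraints can be expressed as a small predicate over (row, column) positions in the table of FIG. 15. This is a sketch under assumptions: the function name `can_route` and its flags are hypothetical, and "the previous two rows" is read as one or two rows back.

```python
def can_route(src, dst, use_local=True, use_global=True):
    """Return True if the unit at dst=(row, col) may read the output of src."""
    (src_row, src_col), (dst_row, dst_col) = src, dst
    rows_back = dst_row - src_row
    if rows_back == 1 and src_col == dst_col:
        return True  # default: same column, previous row
    if use_local and rows_back == 1 and abs(src_col - dst_col) <= 2:
        return True  # local routing: previous row, columns differ by at most two
    if use_global and rows_back in (1, 2):
        return True  # global routing: either of the previous two rows, any column
    return False
```

A scheduler could call such a predicate for every producer/consumer pair and report a compilation failure as soon as one required connection cannot be routed.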

The particular constraints are determined by the topology supported by the interconnect fabric of the hardware module. The interconnect fabric is programmed to cause the minimum units of the hardware module to execute their operations in a particular order and to provide data to one another within the constraints. FIG. 15 shows one particular example of how the interconnect fabric may be so programmed.

Place and route algorithms are used during synthesis of the FPGA application 515 onto the FPGA (as illustrated in FIG. 5). In this case, however, the solution space is constrained, and so the algorithm has a short, bounded execution time.

There is a trade-off between processing speed or efficiency and compile time. According to embodiments of the application, it may be desirable to initially compile and run a program on at least one processing unit (which may be a minimum unit or a CPU, as described above with reference to FIG. 6) for providing a function with respect to received data packets. That at least one processing unit may then perform the function with respect to data packets received during a first time period. During operation of the network interface device, a second at least one processing unit (which may be an FPGA application or a template-type processing unit, as described above with reference to FIG. 6) may be configured to perform the function with respect to data packets. The function can then be migrated from the first at least one processing unit to the second at least one processing unit, so that the second at least one processing unit performs the function for data packets subsequently received at the network interface device. Thus, the slower compile time of the second at least one processing unit does not prevent the network interface device from performing the function with respect to data packets before the function has been compiled for the second at least one processing unit, because the function can be compiled more quickly for the first at least one processing unit, which can be used to perform the function in the meantime. Since the second at least one processing unit typically has a faster processing time, moving to it once compilation is complete allows faster processing of the data packets received at the network interface device.

According to embodiments of the application, the compilation process may be configured to run on at least one processor of the data processing system, the at least one processor being configured to send instructions to the first at least one processing unit and the second at least one processing unit to perform the at least one function with respect to data packets at the appropriate time. The at least one processor may comprise a host CPU. The at least one processor may comprise a control processor on the network interface device. The at least one processor may comprise a combination of one or more processors on the host system and one or more processors on the network interface device.

Accordingly, the at least one processor is configured to perform a first compilation process to compile the function to be performed by the first at least one processing unit of the network interface device. The at least one processor is also configured to perform a second compilation process to compile the function to be performed by the second at least one processing unit of the network interface device. Prior to completion of the second compilation process, the at least one processor instructs the first at least one processing unit to perform the function with respect to data packets received from the network. Subsequently, following completion of the second compilation process, the at least one processor instructs the second at least one processing unit to begin performing the function with respect to data packets received from the network.

Performing these steps enables the network interface device to perform the function using the first at least one processing unit (which may have a shorter compile time but slower and/or less efficient processing) while waiting for the second compilation process to complete. Once the second compilation process is complete, the network interface device may then perform the function using the second at least one processing unit (which may have a longer compile time but faster and/or more efficient processing), in addition to or instead of the first at least one processing unit.
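
The hand-over between the two compiled targets can be sketched in software with a background thread standing in for the second compilation process. Everything here is illustrative: the `Dispatcher` class, `compile_fast`, `second_compilation` and the toy function `drop_odd` are hypothetical names, and an event is used in place of a real, long-running compilation so that the example is deterministic.

```python
import threading

class Dispatcher:
    """Routes each packet to whichever implementation is currently active."""
    def __init__(self, first_impl):
        self._active = first_impl
        self._lock = threading.Lock()

    def switch_to(self, impl):
        with self._lock:
            self._active = impl

    def handle(self, packet):
        with self._lock:
            impl = self._active
        return impl(packet)

def drop_odd(packet):
    # Toy packet function used for both targets.
    return "drop" if packet % 2 else "pass"

def compile_fast(function):
    # Stand-in for the first compilation process (short compile time).
    return lambda packet: ("first", function(packet))

dispatcher = Dispatcher(compile_fast(drop_odd))
release, done = threading.Event(), threading.Event()

def second_compilation():
    release.wait()  # models the longer compile time of the second target
    dispatcher.switch_to(lambda packet: ("second", drop_odd(packet)))
    done.set()

worker = threading.Thread(target=second_compilation)
worker.start()
before = dispatcher.handle(3)  # served by the first target while compiling
release.set()
done.wait()
worker.join()
after = dispatcher.handle(3)   # served by the second target after migration
```

The packet function gives the same result before and after the switch; only the target that executes it changes, which is the point of the migration described above.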

Reference is made to FIG. 7, which illustrates an example network interface device 700 according to embodiments of the application. Elements that are the same as those shown in earlier figures are indicated by the same reference numerals.

The network interface device comprises a first at least one processing unit 710. The first at least one processing unit 710 may comprise the hardware module 610 shown in FIG. 6, which comprises a plurality of processing units. The first at least one processing unit 710 may comprise one or more CPUs, as shown in FIG. 4.

The function is compiled to run on the first at least one processing unit 710 such that, during a first time period, the function is performed by the first at least one processing unit 710 with respect to data packets received from the network. The first at least one processing unit 710 is instructed by the at least one processor to perform the function with respect to data packets received from the network prior to completion of the second compilation process for the second at least one processing unit.

The network interface device comprises a second at least one processing unit 720. The second at least one processing unit 720 may comprise an FPGA having an FPGA application (as illustrated in FIG. 5), or may comprise the hardware module 610 shown in FIG. 6, which comprises a plurality of processing units.

During the first time period, the second compilation process is carried out to compile the function for execution on the second at least one processing unit. That is, the network interface device is configured to compile the FPGA application 515 on the fly.

Following the first time period (that is, following completion of the second compilation process), the second at least one processing unit 720 is configured to begin performing the function with respect to data packets received from the network.

Following the first time period, the first at least one processing unit 710 may cease performing the function with respect to data packets received from the network. In some embodiments, the first at least one processing unit 710 may only partly cease performing the function with respect to data packets. For example, if the first at least one processing unit comprises a plurality of CPUs, then following the first time period one or more of the CPUs may cease processing data packets received from the network while the remaining CPUs continue to perform the processing.

The first at least one processing unit 710 may be configured to perform the function with respect to data packets of a first data flow. When the second compilation process is complete, the second at least one processing unit 720 may begin performing the function with respect to data packets of the first data flow, and the first at least one processing unit may cease performing the function with respect to data packets of the first data flow.

Different combinations are possible for the first at least one processing unit and the second at least one processing unit. For example, in some embodiments, the first at least one processing unit 710 comprises a plurality of CPUs (as illustrated in FIG. 4), while the second at least one processing unit 720 comprises a hardware module having a plurality of processing units (as illustrated in FIG. 6). In some embodiments, the first at least one processing unit 710 comprises a plurality of CPUs (as illustrated in FIG. 4), while the second at least one processing unit 720 comprises an FPGA (as illustrated in FIG. 5). In some embodiments, the first at least one processing unit 710 comprises a hardware module having a plurality of processing units (as illustrated in FIG. 6), while the second at least one processing unit 720 comprises an FPGA (as illustrated in FIG. 5).

Reference is made to FIG. 11, which illustrates how a plurality of connected processing units 640a, 640b, 640d may perform their respective at least one operation with respect to a data packet. Each of the processing units is configured to perform its respective at least one operation with respect to a received data packet.

The at least one operation of each processing unit may represent a logic stage in a function (for example, a function of an eBPF program). The at least one operation of each processing unit may be expressible by an instruction executed by the processing unit. The instruction may determine the behaviour of the minimum unit.

FIG. 11 illustrates how a packet P₀ progresses along the processing stages implemented by the respective processing units.

Each processing unit performs its processing with respect to the packet in a particular order specified by the compiler. The order may be such that some of the processing units are configured to perform their processing in parallel. The processing may comprise accessing at least part of the packet held in a memory. Additionally or alternatively, the processing may comprise performing a lookup in a lookup table to determine an action to be performed on the packet. Additionally or alternatively, the processing may comprise modifying state 1110.

The processing units exchange metadata M₀, M₁, M₂, M₃ with one another. The first processing unit 640a is configured to perform its respective at least one predefined operation and to generate metadata M₁ in response. The first processing unit 640a is configured to pass the metadata M₁ to the second processing unit 640b.

At least some of the processing units perform their respective at least one operation in dependence on at least one of: the content of the data packet, their own stored state, global shared state, and metadata associated with the data packet (for example, M₀, M₁, M₂, M₃). Some of the processing units may be stateless.

Each of the processing units may perform its associated type of operation on the data packet P₀ during at least one clock cycle. In some embodiments, each of the processing units may perform its associated type of operation during a single clock cycle. Each of the processing units may be individually clocked to perform its operation. This clocking may be in addition to the clocking of the processing pipeline of processing units.

Looking at the operation of the second processing unit 640b in more detail, the second processing unit 640b is configured to be connected to the first processing unit 640a, which is configured to perform a first at least one predefined operation with respect to a first data packet. The second processing unit 640b is configured to receive, from the first processing unit 640a, the results of the first at least one predefined operation, and to perform a second at least one predefined operation in dependence on those results. The second processing unit 640b is configured to be connected to a third processing unit 640d, which is configured to perform a third at least one predefined operation with respect to the first data packet. The second processing unit 640b is configured to send the results of the second at least one predefined operation to the third processing unit 640d for processing in the third at least one predefined operation.

The processing units may operate similarly to provide the function with respect to each of a plurality of data packets.

Embodiments of the application are such that multiple packets may be pipelined simultaneously where the function permits.

Reference is made to FIG. 12, which illustrates the pipelining of data packets. As shown, different packets may be processed simultaneously by different processing units. At a first time t₀, the first processing unit 640a is executing its respective at least one operation with respect to a third data packet P₂, the second processing unit 640b is executing its respective at least one operation with respect to a second data packet P₁, and the third processing unit 640d is executing its respective at least one operation with respect to a first data packet P₀.

After the respective at least one operation has been executed by each of the processing units, each of the packets moves one stage along the sequence. For example, at a subsequent second time t₁, the first processing unit 640a is executing its respective at least one operation with respect to a fourth data packet P₃, the second processing unit 640b is executing its respective at least one operation with respect to the third data packet P₂, and the third processing unit 640d is executing its respective at least one operation with respect to the second data packet P₁.
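
The packet-to-unit assignment at each time slot follows a simple index rule, sketched below. The helper names `packet_at` and `snapshot` are hypothetical, and the sketch assumes lock-step movement of exactly one stage per time slot.

```python
UNITS = ["640a", "640b", "640d"]  # pipeline order, first stage first

def packet_at(unit_index, t):
    """Index of the packet handled by a unit at time slot t.

    Indexing follows FIG. 12: at t0 the first unit already holds P2, so
    packet P0 entered the pipeline two slots earlier.
    """
    return t + (len(UNITS) - 1) - unit_index

def snapshot(t):
    return {unit: f"P{packet_at(i, t)}" for i, unit in enumerate(UNITS)}

at_t0 = snapshot(0)  # {'640a': 'P2', '640b': 'P1', '640d': 'P0'}
at_t1 = snapshot(1)  # {'640a': 'P3', '640b': 'P2', '640d': 'P1'}
```

Each packet index held by a unit increases by one per time slot, so one packet leaves the last stage per slot once the pipeline is full.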

It should be appreciated that, in some embodiments, there may be a plurality of packets present in a given stage.

In some embodiments, packets may move from one stage to the next, but not necessarily in lock step.

Provided there are no pipeline hazards, such a pipeline operating at a fixed clock may have a constant bandwidth. This may reduce jitter in the system.

To avoid hazards when executing instructions (such as conflicts when accessing shared state), each of the processing units may be configured, where required, to execute a no-operation (no-op) instruction, that is, to stall.

In some embodiments, an operation (such as simple arithmetic, an increment, an addition or subtraction of a constant value, a shift, or an addition or subtraction of a value from the data packet or from the metadata) requires one clock cycle to be executed by a processing unit. This can mean that a value in shared state required by one processing unit has not yet been updated by another processing unit. An out-of-date value in the shared state 1110 may therefore be read by the processing unit that requires it. Hazards may thus arise when reading and writing values in shared state. Operations on intermediate values, on the other hand, may be passed along as metadata without hazards arising.

An example of a hazard in reading and writing shared state 1110 that may be avoided can be given in the context of an increment operation. Such an increment operation may be an operation to increment a packet counter in the shared state 1110. In one implementation of the increment operation, during a first time slot of the pipeline, the second processing unit 640b is configured to read the value of the counter from the shared state 1110 and to provide the output of this read operation (for example, as metadata M₂) to the third processing unit 640d. The third processing unit 640d is configured to receive the value of the counter from the second processing unit 640b. During a second time slot, the third processing unit 640d increments this value and writes the newly incremented value to the shared state 1110.

A problem may arise when performing such an increment operation: if, during the second time slot, the second processing unit 640b attempts to access the counter stored in the shared state 1110, it may read the previous value of the counter, before the counter value in the shared state 1110 has been updated by the third processing unit 640d.

To address this problem, the second processing unit 640b may be stalled during the second time slot (through execution by the second processing unit 640b of a no-operation instruction, or pipeline bubble). A stall may be understood as a delay in the execution of the next instruction. This delay may be implemented by executing a "no-operation" instruction in place of the next instruction. The second processing unit 640b then reads the counter value from the shared state 1110 during the subsequent third time slot. By the third time slot, the counter in the shared state 1110 has been updated, so the second processing unit 640b is guaranteed to read the updated value.
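The read/increment/write hazard and the stall that removes it can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the function name `count_packets`, the one-packet-per-slot arrival model, and the rule "stall whenever a write is outstanding" are illustrative choices, not taken from the described embodiments.

```python
def count_packets(num_packets, stall):
    """Simulate a reader stage and an increment-and-write stage sharing one counter.

    Returns (final counter value, number of time slots used).
    """
    shared_counter = 0   # models the packet counter held in shared state
    in_flight = None     # counter value read in the previous slot, carried as metadata
    started = 0
    slots = 0
    while started < num_packets or in_flight is not None:
        # Writer stage: increment the value that was read one slot earlier.
        pending_write = in_flight + 1 if in_flight is not None else None
        # Reader stage: optionally stall (a bubble) while a write is outstanding.
        may_read = started < num_packets and not (stall and in_flight is not None)
        if may_read:
            in_flight = shared_counter  # sees the state before this slot's write lands
            started += 1
        else:
            in_flight = None
        if pending_write is not None:
            shared_counter = pending_write
        slots += 1
    return shared_counter, slots
```

Without the stall, the second read sees a stale value and one increment is lost; with the stall, every packet is counted at the cost of extra time slots.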

In some embodiments, each atom is configured to read from state, update the state, and write the updated state within a single pipeline time slot. In this case, the stalling of processing units described above may not be needed. However, stalling processing units may reduce the cost of the memory interface that is required.

In some embodiments, to avoid hazards, a processing unit in the pipeline may wait until other processing units in the pipeline have completed their processing before performing its own operation.

As mentioned, the compiler builds the data path by linking an arbitrary number of processing stage instances, where each instance is built from one of a predefined number (three in the example given) of pre-synthesized processing stage templates. The processing stage templates are a logic stage template (providing, for example, arithmetic operations over registers, scratch pad memory, and metadata), a packet access stage template (providing, for example, packet data loads and/or packet data stores), and a map access stage template (parameterized by, for example, the map lookup algorithm and the map table size).

Each processing stage instance may be implemented by a single one of the processing units. That is, each processing stage comprises the respective at least one operation executed by a processing unit.

FIG. 13 illustrates an example of how processing stages may be connected together in a pipeline 1300 to process a received data packet. As shown in FIG. 13, a first data packet is received and stored in a FIFO 1305. One or more calling arguments are received at a first logic stage 1310. The calling arguments may include a program selector that identifies the function to be executed for the received data packet. The calling arguments may include an indication of the packet length of the received data packet. The first logic stage 1310 is configured to process the calling arguments and to provide an output to a first packet access stage 1315.

The first packet access stage 1315 loads data from the first packet at a network tap 1320. The first packet access stage 1315 may also write data to the first packet in dependence on the output of the first logic stage 1310. The first packet access stage 1315 may write data to the front of the first data packet. The first packet access stage 1315 may overwrite data in the data packet.

The loaded data and any other metadata and/or arguments are then provided to a second logic stage 1325, which performs processing with respect to the first data packet and provides output arguments to a first map access stage 1330. The first map access stage 1330 uses the output from the second logic stage 1325 to perform a lookup in a lookup table to determine an action to be performed with respect to the first data packet. The output is then passed to a third logic stage 1335, which processes it and passes the result to a second packet access stage 1340.

The second packet access stage 1340 may read data from the first data packet and/or write data to the first data packet in dependence on the output of the third logic stage 1335. The result of the second packet access stage 1340 is then passed to a fourth logic stage 1345, which is configured to perform processing with respect to the input it receives.

The pipeline may include a plurality of packet access stages, logic stages, and map access stages. A final logic stage 1350 may be configured to output return arguments. The return arguments may include a pointer identifying the start of the data packet. The return arguments may include an indication of an action to be performed with respect to the data packet. The indication of the action may indicate whether or not the packet is to be dropped. The indication of the action may indicate whether or not the packet is to be forwarded to the host system. The network interface device may include at least one processing unit configured to drop the respective data packet in response to an indication that the packet is to be dropped.
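The chaining of logic, packet access, and map access stages described for FIG. 13 can be sketched in software as follows. This is a hedged illustration only: the names `Context`, `logic_stage`, `packet_load_stage`, `map_access_stage`, and `build_pipeline`, the dictionary used to carry arguments, and the one-byte lookup key are all assumptions made for the example, and the bypass FIFOs and network tap are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    packet: bytearray                         # the data packet being processed
    args: dict = field(default_factory=dict)  # arguments/metadata passed between stages


def logic_stage(fn):
    """Logic stage: computes new arguments from the existing ones."""
    def stage(ctx):
        ctx.args.update(fn(ctx.args))
        return ctx
    return stage


def packet_load_stage(offset, length, name):
    """Packet access stage: loads bytes from the packet into a named argument."""
    def stage(ctx):
        ctx.args[name] = bytes(ctx.packet[offset:offset + length])
        return ctx
    return stage


def map_access_stage(table, key_name, out_name, default):
    """Map access stage: looks up an argument in a table to pick an action."""
    def stage(ctx):
        ctx.args[out_name] = table.get(ctx.args[key_name], default)
        return ctx
    return stage


def build_pipeline(*stages):
    """Link stage instances in order, much as a compiler might link templates."""
    def run(packet, **calling_args):
        ctx = Context(bytearray(packet), dict(calling_args))
        for stage in stages:
            ctx = stage(ctx)
        return ctx.args
    return run


pipeline = build_pipeline(
    logic_stage(lambda a: {"valid": a["packet_len"] > 0}),  # process calling arguments
    packet_load_stage(0, 1, "raw"),                         # load the first byte
    logic_stage(lambda a: {"key": a["raw"][0]}),            # derive the lookup key
    map_access_stage({0x01: "forward", 0x02: "drop"}, "key", "action", "drop"),
    logic_stage(lambda a: {"ret": {"action": a["action"], "start": 0}}),  # return arguments
)
```

Calling `pipeline(b"\x01abc", packet_len=4)["ret"]` yields the return arguments, here an action indication and a start-of-packet offset.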

The pipeline 1300 may additionally include one or more bypass FIFOs 1355a, 1355b, 1355c. The bypass FIFOs may be used to carry processing data, for example data from the first data packet, around the map access stages and/or packet access stages. In some embodiments, the map access stages and/or packet access stages do not need data from the first data packet in order to perform their respective at least one operation. The map access stages and/or packet access stages may perform their respective at least one operation in dependence on the input arguments.

Reference is made to FIG. 8, which illustrates a method 800 performed by a network interface device 600, 700 according to embodiments of the present application.

In S810, a hardware module of the network interface device is arranged to perform a function. The hardware module comprises a plurality of processing units, each configured to perform one type of operation in hardware with respect to a data packet. S810 comprises arranging at least some of the plurality of processing units to perform their respective predefined type of operation in a particular order so as to provide the function with respect to each received data packet. Arranging the hardware module in this way comprises connecting the at least some of the plurality of processing units such that a received data packet undergoes processing by each of the operations of those processing units. The connecting may be achieved by configuring routing hardware of the hardware module to route data packets and associated metadata between the processing units.

In S820, a first data packet is received from the network at a first interface of the network interface device.

In S830, the first data packet is processed by each of the at least some processing units that were connected during the compilation process of S810. Each of the at least some processing units performs, with respect to the at least one data packet, the type of operation that it is preconfigured to perform. The function is thereby performed with respect to the first data packet.

In S840, the processed first data packet is sent onward to its destination. This may comprise sending the data packet to the host. This may comprise sending the data packet over the network.

Reference is made to FIG. 9, which illustrates a method 900 that may be performed in a network interface device 700 according to embodiments of the present application.

In S910, a first at least one processing unit (i.e., first circuitry) of the network interface device is configured to receive and process data packets received over the network. This processing comprises performing a function with respect to the data packets. The processing is performed during a first time period.

In S920, a second compilation process is performed during the first time period to compile the function for execution on a second at least one processing unit (i.e., second circuitry).

In S930, it is determined whether or not the second compilation process has completed. If it has not, the method returns to S910 and S920, where the first at least one processing unit continues to process data packets received from the network and the second compilation process continues.

In S940, in response to determining that the second compilation has completed, the first at least one processing unit ceases to perform the function with respect to received data packets. In some embodiments, the first at least one processing unit may cease to perform the function only with respect to certain data flows. The second at least one processing unit may then perform the function instead (in S950) with respect to those data flows.

In S950, once the second compilation process has completed, the second at least one processing unit is configured to begin performing the function with respect to data packets received from the network.
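The control flow of S910 to S950 can be sketched as a loop that keeps using the first circuitry until the second compilation finishes. This is a deliberately simplified model: the function name `handle_packets` and the idea of measuring compilation time in "packet arrivals" are assumptions for illustration, and the per-flow switching mentioned for S940 is not modeled.

```python
def handle_packets(packets, compile_slots):
    """Return which circuitry handles each packet.

    compile_slots is the (hypothetical) number of packet arrivals that elapse
    before the second compilation process completes.
    """
    handled_by = []
    remaining = compile_slots
    for pkt in packets:
        if remaining > 0:                        # S930: compilation not yet complete
            handled_by.append((pkt, "first"))    # S910: first circuitry keeps processing
            remaining -= 1                       # S920: compilation makes progress
        else:
            handled_by.append((pkt, "second"))   # S940/S950: second circuitry takes over
    return handled_by
```

For example, with four packets and a compilation that spans two arrivals, the first two packets are handled by the first circuitry and the rest by the second.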

Reference is made to FIG. 16, which illustrates a method 1600 according to embodiments of the present application. The method 1600 may be performed in a network interface device or in a host device.

In S1610, a compilation process is performed to compile a function to be performed by a first at least one processing unit.

In S1620, a compilation process is performed to compile the function to be performed by a second at least one processing unit. This process comprises assigning each of a plurality of processing units of the second at least one processing unit to perform at least one operation associated with one stage of a plurality of stages for processing a data packet so as to provide the first function. Each of the plurality of processing units is configured to perform one type of processing, and the assignment is made in dependence on a determination that the processing unit is configured to perform a type of processing suitable for performing the respective at least one operation. In other words, the processing units are selected according to their templates.

In S1630, prior to completion of the compilation process of S1620, an instruction is sent to cause the first at least one processing unit to perform the function. This instruction may be sent before the compilation process of S1620 begins.

In S1640, following completion of the compilation process of S1620, an instruction is sent to the second circuitry to cause the second circuitry to perform the function with respect to data packets. This instruction may include the compiled instructions generated in S1620.
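The template-based assignment performed in S1620 can be illustrated with a simple first-fit allocator that gives each stage a free processing unit of the matching type. The function name `assign_stages`, the tuple representation of stages and units, and the first-fit policy are assumptions for this sketch; an actual compiler may use a different placement strategy.

```python
def assign_stages(stages, units):
    """Assign each stage to a free processing unit of the required type.

    stages: list of (stage_name, kind); units: list of (unit_id, kind).
    Returns a dict mapping stage_name to unit_id.
    """
    free = list(units)
    assignment = {}
    for stage_name, kind in stages:
        for i, (unit_id, unit_kind) in enumerate(free):
            if unit_kind == kind:          # unit's template suits this stage
                assignment[stage_name] = unit_id
                del free[i]                # each unit implements a single stage
                break
        else:
            raise ValueError(f"no free {kind} unit for stage {stage_name}")
    return assignment
```

A stage whose required type has no free unit raises an error, which a caller could treat as a failed compilation.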

A function according to embodiments of the present application may be provided as a pluggable component of a processing slice in a network interface device. Reference is made to FIG. 14, which illustrates an example of how a slice 1425 may be used in the network interface device 600. The slice 1425 may be referred to as a processing pipeline.

The network interface device 600 includes a transmit queue 1405 for receiving and storing data packets from the host that are to be processed by the slice 1425 and then transmitted over the network. The network interface device 600 includes a receive queue 1410 for storing data packets received from the network 1410 that are to be processed by the slice 1425 and then delivered to the host. The network interface device 600 includes a receive queue 1415 for storing data packets received from the network that have been processed by the slice 1425 and are for delivery to the host. The network interface device 600 includes a transmit queue for storing data packets received from the host that have been processed by the slice 1425 and are for delivery to the network.

The slice 1425 of the network interface device 600 includes a plurality of processing functions for processing data packets on the receive path and the transmit path. The slice 1425 may include a protocol stack configured to perform protocol processing of data packets on the receive path and the transmit path. In some embodiments, there may be a plurality of slices in the network interface device 600. At least one of the plurality of slices may be configured to process receive data packets received from the network. At least one of the plurality of slices may be configured to process transmit data packets for transmission over the network. A slice may be implemented by hardware processing apparatus, such as at least one FPGA and/or at least one ASIC.

Accelerator components 1430a, 1430b, 1430c, 1430d may be inserted at different stages of the slice, as shown. Each accelerator component provides a function with respect to data packets passing through the slice. The accelerator components may be inserted or removed on the fly, that is, during operation of the network interface device. The accelerator components are therefore pluggable components. The accelerator components are logic regions allocated to the slice 1425. Each of them supports a streaming packet interface that allows packets passing through the slice to be streamed into and out of the component.

For example, one type of accelerator component may be configured to provide encryption of data packets on the receive or transmit path. Another type of accelerator component may be configured to provide decryption of data packets on the receive or transmit path.

The function discussed above that is provided by executing operations performed by a plurality of connected processing units (as discussed above with reference to FIG. 6) may be provided by an accelerator component. Similarly, the function provided by an array of network processing CPUs (as discussed above with reference to FIG. 4) and/or by an FPGA application (as discussed above with reference to FIG. 5) may be provided by an accelerator component.

As described, during operation of the network interface device, processing performed by a first at least one processing unit (such as a plurality of connected processing units) may be migrated to a second at least one processing unit. To implement this migration, the component of the slice 1425 that provides processing by the first at least one processing unit may be replaced by a component that provides processing by the second at least one processing unit.

The network interface device may include a control processor configured to insert components into and remove components from the slice 1425. During the first time period discussed above, a component for performing the function by the first at least one processing unit may be present in the slice 1425. The control processor may be configured, subsequent to the first time period, to remove from the slice 1425 the pluggable component that provides the function by the first at least one processing unit, and to insert into the slice 1425 a pluggable component that provides the function by the second at least one processing unit.

In addition to, or instead of, inserting components into and removing components from the slice, the control processor may load programs into the components and may issue control-plane commands to control the flow of frames into the components. In this case, a component may be made to operate, or not to operate, without being inserted into or removed from the pipeline.

In some embodiments, control-plane or configuration information is carried over the data path rather than requiring a separate control bus. In some embodiments, a request to update the configuration of a data path component is encoded as a message carried over the same bus as network packets. The data path may therefore carry two types of packet: network packets and control packets.

Control packets are formed by the control processor and injected into the slice 1425 using the same mechanism that is used to send or receive data packets using the slice 1425. This mechanism may be a transmit queue or a receive queue. Control packets may be distinguished from network packets in any suitable way. In some embodiments, the different types of packet may be distinguished by a bit or bits in a metadata word.

In some embodiments, a control packet includes, in the metadata word, a routing field that determines the path the control packet takes through the slice 1425. A control packet may carry a sequence of control commands. Each control command may target one or more components of the slice 1425. Each data path component is identified by a component ID field. Each control command encodes a request for the respective identified component. The request may be to make a change to the configuration of that component. The request may control whether or not the component is activated, that is, whether or not the component performs its function with respect to data packets passing through the slice.
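The idea of carrying control commands on the same path as network packets can be sketched as follows. Everything concrete here is an assumption made for illustration: the `CONTROL_BIT` position in the metadata word, the `(component_id, enable)` command encoding, the `Component` class, and the upper-casing "function" stand in for details the description leaves open. The routing field is not modeled.

```python
from dataclasses import dataclass

CONTROL_BIT = 0x1  # hypothetical metadata bit marking a control packet


@dataclass
class Component:
    component_id: int
    enabled: bool = False

    def handle(self, meta, payload):
        if meta & CONTROL_BIT:
            # Control packet: payload is a sequence of (component_id, enable) commands.
            for target_id, enable in payload:
                if target_id == self.component_id:
                    self.enabled = enable
            return payload  # pass the control packet on to later components
        # Network packet: apply this component's function only when activated.
        return payload.upper() if self.enabled else payload


def send(slice_components, meta, payload):
    """Stream one packet (network or control) through every component in order."""
    for component in slice_components:
        payload = component.handle(meta, payload)
    return payload
```

In this sketch, `send(components, CONTROL_BIT, [(2, True)])` activates only the component whose ID is 2, after which network packets passing through the slice are transformed by that component.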

Accordingly, in some embodiments, the control processor of the network interface device 600 is configured to send a message to cause one of the components of the slice to begin performing a function with respect to data packets received at the network interface device. This message is a control-plane message that is sent through the pluggable components and causes an atomic switch-over of frames to the component that is to perform the function. That component then executes on all received data packets passing through the slice until the slice is switched out. The control processor is configured to send a message to cause another of the components of the slice to cease performing a function with respect to data packets received at the network interface device 600.

To switch components into and out of the data slice 1425, sockets may be present at various points in the ingress and egress data paths. The control processor may connect additional logic into and out of the slice 1425. This additional logic may take the form of FIFOs placed between components.

The control processor may send control-plane messages through the slice 1425 to configure components of the slice 1425. The configuration may determine the function performed by a component of the slice 1425. For example, a control message sent through the slice 1425 may cause the hardware module to be configured to perform a function with respect to data packets. Such a control message may cause atoms of the hardware module to be interconnected into a pipeline of the hardware module so as to provide a certain function. Such a control message may cause individual atoms of the hardware module to be configured so as to select the operation to be performed by each individually selected atom. Because each atom is preconfigured to perform one type of operation, the selection of an operation for each atom is made in dependence on the type of operation that the atom is preconfigured to perform.

Some further embodiments will now be described with reference to FIGS. 19 to 21. In these embodiments, a packet processing program or feedforward pipeline is run on an FPGA. A method for causing subunits of the FPGA to implement the packet processing program or feedforward pipeline will be described. The packet processing program or feedforward pipeline may be an eBPF program, a P4 program, or any other suitable program.

This FPGA may be provided in a network interface device. In some embodiments, the packet processing program is deployed or run only after the network interface device has been installed with respect to its host.

The packet processing program or feedforward pipeline may implement a logic flow without loops.

In some embodiments, the program may be written in a lower-privileged or unprivileged domain, for example at user level. The program may be run in a privileged or higher-privileged domain, such as the kernel. The hardware that runs the program may require that there be no loops.

In the following embodiments, reference is made to the example of an eBPF program. It should be appreciated, however, that other embodiments may be used with any other suitable program.

It should be appreciated that one or more of the following embodiments may be used in conjunction with one or more of the previous embodiments.

Some embodiments may be provided in the context of an FPGA, an ASIC, or any other suitable hardware device. Some embodiments use subunits of an FPGA, an ASIC, or the like. The following examples are described with reference to an FPGA. It should be appreciated that a similar process may be performed using an ASIC or any other suitable hardware device.

The subunits may be atoms. Some examples of atoms have been described previously. It should be appreciated that any of those previously described examples of atoms may, alternatively or additionally, be used as subunits. Alternatively or additionally, these subunits may be referred to as "slices" or configurable logic blocks.

Each of these subunits may be configured to perform a single instruction or a plurality of related instructions. In the latter case, the related instructions may provide a single output (which may be defined by one or more bits).

A subunit may be regarded as a computational unit. Subunits may be arranged in a pipeline in which packets are processed in order. In some embodiments, subunits may be dynamically assigned to execute respective instructions (or groups of instructions) of a program.

In some embodiments, a subunit may be all or part of a unit used to define, for example, a block of the FPGA. In some FPGAs, a block of the FPGA is referred to as a slice. In some embodiments, a subunit or minimum unit is regarded as equivalent to a slice.

By mapping each minimum unit or subunit to a respective block or slice of the FPGA, improved resource utilization may be achieved compared with an approach that maps RTL minimum units to FPGA resources. The latter approach may result in an RTL minimum unit requiring a relatively large number of individual blocks or slices of the FPGA.

In some embodiments, compilation may be to the minimum unit level. This may have the advantage that processing is pipelined. Packets may be processed in order. The compilation process may be performed relatively quickly.

In some embodiments, an arithmetic operation may require one slice per byte. A logic operation may require half a slice per byte. A shift operation may require a collection of slices, depending on the width of the shift. A compare operation may require one slice per byte. A select operation may require half a slice per byte.
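The per-byte figures above can be turned into a rough resource estimate. The following is a minimal sketch only: the table name, function name and operation-class labels are hypothetical, and shift operations are left out because the text states only that their cost depends on the shift width.

```python
from fractions import Fraction

# Slices per byte for each operation class, as quoted in the paragraph above.
# Shift is deliberately omitted: no per-byte figure is given for it.
SLICES_PER_BYTE = {
    "arithmetic": Fraction(1),
    "logic": Fraction(1, 2),
    "compare": Fraction(1),
    "select": Fraction(1, 2),
}


def slices_needed(op_class: str, width_bits: int) -> Fraction:
    """Estimate how many slices one operation of the given width occupies."""
    if width_bits % 8 != 0:
        raise ValueError("width must be a whole number of bytes")
    return SLICES_PER_BYTE[op_class] * (width_bits // 8)
```

Under these assumptions, `slices_needed("arithmetic", 32)` evaluates to 4 and `slices_needed("logic", 32)` to 2.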

Placement and routing are performed as part of the compilation process. Placement is the allocation of a particular physical subunit to perform a particular instruction or instructions. Routing ensures that the output or outputs of a particular subunit are routed to the correct destination, which may be, for example, another subunit or subunits.

Placement and routing may use a process in which operations are assigned to particular subunits, starting at one end of the pipeline. In some embodiments, the most important operations may be placed before less important operations. In some embodiments, a route may be assigned at the same time as the corresponding operation is placed. In some embodiments, a route may be selected from a limited set of pre-computed routes. This is described in more detail later.

In some embodiments, if a route cannot be assigned, the operation is held back for later.

In some embodiments, the pre-computed routes may be byte-wide routes. However, this is only an example, and in other embodiments routes of a different width may be defined. In some embodiments, routes of a plurality of different sizes may be provided.

In some embodiments, routing may be restricted to routing between nearby subunits.

In some embodiments, the subunits may be physically arranged in a regular structure on the FPGA.

In some embodiments, rules may be made about how subunits may communicate, in order to facilitate routing. For example, a subunit may provide its output only to a subunit beside it, above it, or below it.

Alternatively or additionally, a limit may be placed on how far away the next subunit may be for routing purposes. For example, a subunit may output data only to an adjacent subunit or to a subunit within a defined distance (for example, with only one intervening subunit).
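The two example rules above can be expressed as simple predicates on grid coordinates. This is an illustrative sketch: the function names are hypothetical, subunits are assumed to sit on a rectangular grid addressed as (column, row), and the distance limit is assumed to be measured along a single row or column.

```python
def is_beside_above_or_below(src, dst):
    """Rule 1: the destination is directly beside, above, or below the source."""
    dx = abs(src[0] - dst[0])
    dy = abs(src[1] - dst[1])
    return dx + dy == 1


def within_distance(src, dst, max_intervening=1):
    """Rule 2: at most `max_intervening` subunits lie between source and destination.

    Assumption: only straight-line hops along one row or one column count.
    """
    dx = abs(src[0] - dst[0])
    dy = abs(src[1] - dst[1])
    if dx != 0 and dy != 0:
        return False
    steps = dx + dy
    return 1 <= steps <= max_intervening + 1
```

A placer could apply such a predicate to prune candidate destinations before consulting the pre-computed routes, which keeps the set of routes to consider small.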

Reference is made to FIG. 19, which shows a method of some embodiments.

In some embodiments, the FPGA may have one or more "static" regions and one or more "dynamic" regions. The static regions provide a standard configuration, and the dynamic regions may provide functionality according to the requirements of the end user. The static part may be defined, for example, before the end user receives the network interface device, for example before the network interface device is installed with respect to a host. For example, the static regions may be configured to cause the network interface device to provide certain functionality. The static regions will be provided with pre-computed routes between minimum units. As will be discussed in more detail later, there may be routing between one or more static regions that passes through one or more dynamic regions. The dynamic regions may be configured by the end user according to their requirements once the network interface device is deployed with respect to the host. The dynamic regions may be configured to perform different functions for the end user over time.

In step S1, a first compilation process is performed to provide a first bit file, referred to as the main bit file 50, and a tool checkpoint 52. In some embodiments, this is the bit file for at least part of the static regions. When downloaded to the FPGA, the bit file causes the FPGA to function as specified by the program from which the bit file was compiled. In some embodiments, the program used in the first compilation process may be any one or more programs, or may be a test program designed specifically to support the determination of routing within a part of the FPGA. In some embodiments, a series of simple programs may alternatively or additionally be used.

The program may be modified, or may have a reconfigurable partition that can be used by the compiler. The program may be modified so as to make the compiler's job easier, by moving nets out of the reconfigurable partition.

Step S1 may be performed in a design tool. By way of example only, the Vivado tool may be used with Xilinx FPGAs. The checkpoint file may be provided by the design tool. The checkpoint file represents a snapshot of the design at the point at which the bit file is generated. The checkpoint file may include one or more of a synthesized netlist, design constraints, placement information and routing information.

In step S2, the bit file is analyzed, taking the checkpoint file into account, to provide a bit file description 54. The analysis may comprise one or more of: detecting resources, generating routes, checking timing, generating one or more partial bit files, and generating the bit file description.

The analysis may be configured to extract routing information from the bit file. The analysis may be configured to determine on which wires or routes signals have been propagated.

The analysis phase may be performed at least partly in the synthesis or design tool. In some embodiments, the scripting tool of Vivado may be used. The scripting tool may be TCL (tool command language). TCL can be used to add to or modify the capabilities of Vivado. The functions of Vivado may be invoked and controlled by TCL scripts.

The bit file description 54 defines how a given part of the FPGA can be used. For example, the bit file description will indicate which minimum units can be routed to which other minimum units, and one or more routes that make it possible to route between those minimum units. For example, for each minimum unit, the bit file description will indicate where the input to that minimum unit may come from and where the output of that minimum unit may be routed to, together with one or more routes for the output data. The bit file description is independent of any program.

The bit file description may include one or more of: route information, an indication of which pairs of routes conflict, and a description of how to generate a bit file from the required configuration of the minimum units.

The bit file description may provide the set of routes available between a set of minimum units, before any particular instruction has been assigned to a given minimum unit.

The bit file description may be for a part of the FPGA. The bit file description may be for a dynamic part of the FPGA. The bit file description will indicate which routes are available and/or which routes are not available. For example, the bit file may indicate which routes are available for the dynamic part of the FPGA, taking into account, for example, any routing across the dynamic part of the FPGA that is required by the static part or parts of the FPGA.

It should be appreciated that, in some embodiments, the bit file description may be obtained in any suitable manner. For example, the bit file description may be provided by the provider of the FPGA or ASIC.

In some embodiments, the bit file description may be provided by the design tool. In this embodiment, the analysis step may be omitted. The design tool may output the bit file description. The bit file description may be for the static part of the FPGA, including any required routing across the dynamic part of the FPGA.

It should be appreciated that any other suitable technique may be used to generate the bit file description. In the example described above, the tool used to design the FPGA is also used to provide the analysis that is used to generate the bit file.

It should be appreciated that, in other embodiments, a different tool may be used. The tool may, in some embodiments, be specific to a product or to a range of products. For example, an FPGA provider may provide an associated tool for managing that FPGA.

In other embodiments, a generic scripting tool may be used.

In some embodiments, a different tool or a different technique may be used to determine the partial bit files. For example, the main bit file may be analyzed to determine which features correspond to which features. This may require a plurality of partial bit files to be generated.

It should be appreciated that step S3 is performed when the network interface device is installed with respect to the host, and is executed on the physical FPGA device. Steps S1 and S2 may be performed as part of the design synthesis process that generates the bit file image implementing the network interface device. In some embodiments, step S1 and/or step S2 are used to characterize the behavior of the FPGA. Once the FPGA has been characterized, the bit file description is stored in memory for every physical network interface device that is to operate in the given defined manner.

In step S3, compilation is performed using the bit file description and the eBPF program. The output of the compilation is a partial bit file for the eBPF program. The compilation adds to the partial bit file the routes and the programming to be performed by individual ones of the slices.

It should be appreciated that the bit file description may be provided in the deployed system. The bit file description may be stored in memory. The bit file description may be stored on the FPGA, on the network interface device, or on the host device. In some embodiments, the bit file description is stored in flash memory or the like that is connected to the FPGA on the network interface device. The flash memory may also contain the main bit file.

The eBPF program may be stored together with the bit file description or separately from it. The eBPF program may be stored on the FPGA, on the network interface device, or on the host. In the case of eBPF, the program may be passed from a user-mode program to the kernel, both of which run on the host. The kernel passes the program to the device driver, which then passes it to the compiler, which runs on either the host or the network interface device. In some embodiments, the eBPF program may be stored on the network interface device so that it can be executed before the host OS has booted.

The compiler may be provided at any suitable location on the network interface device, on the FPGA, or on the host. By way of example only, the compiler may run on a CPU on the network interface device.

The compiler flow will now be described. The front end of the compiler takes in the eBPF program. The eBPF program may be written in any suitable language. For example, the eBPF program may be written in a C-type language. The compiler is configured, in its front end, to convert the program into an intermediate representation (IR). In some embodiments, the IR may be LLVM-IR or any other suitable IR.

In some embodiments, pointer analysis may be performed to generate packet/map access primitives.

It should be appreciated that, in some embodiments, optimization of the IR may be performed by the compiler. This may be optional in some embodiments.

The high-level synthesis (HLS) back end of the compiler is configured to split the program pipeline into stages, to generate packet access taps, and to emit C code. In some embodiments, the HLS part of the design tool and/or the design tool in use may be invoked to synthesize the output of the HLS phase.

The compiler back end for FPGA minimum units splits the pipeline into stages and generates packet access taps. If-conversion may be performed to convert control dependencies into data dependencies. The design is placed and routed. A partial bit file for the eBPF program is emitted.
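If-conversion can be illustrated in software terms. The sketch below is a generic example of the technique and is not taken from the compiler described here: the function names are hypothetical, and the bit-mask `select` stands in for the multiplexer that a select operation would provide in hardware.

```python
def branchy(cond, a, b):
    """Original form: which arm executes depends on control flow."""
    if cond:
        result = a + b
    else:
        result = a - b
    return result


def select(cond, if_true, if_false):
    """Multiplexer built from bitwise operations only (no branch)."""
    mask = -int(bool(cond))  # all ones when cond is true, zero otherwise
    return (if_true & mask) | (if_false & ~mask)


def if_converted(cond, a, b):
    """If-converted form: both arms are computed, then one is selected."""
    taken = a + b
    not_taken = a - b
    return select(cond, taken, not_taken)
```

After the conversion the condition is just another data input to the final stage, so the computation can be laid out as a fixed, loop-free pipeline.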

A routing problem may arise, as shown in FIG. 20a, in which there is a routing conflict. For example, slice A may communicate with slice C, and slice B may communicate with slice D. In the arrangement of FIG. 20a, a common routing part 60 has been allocated both to the communication between slice A and slice C and to the communication between slice B and slice D. In some embodiments, such routing conflicts may be avoided. In this regard, reference is made to FIG. 20b. As can be seen, a route 62 is provided between slice A and slice C that is separate from the route 64 between slice B and slice D.

In some embodiments, the bit file description may include a plurality of different routes for at least some pairs of subunits. The compilation process checks for routing conflicts such as that shown in FIG. 20a. In the event of a routing conflict, the compiler can resolve or avoid the conflict by selecting a suitable alternative one of the routes.
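One simple way to pick conflict-free routes from a set of alternatives is a greedy pass over the required connections. The sketch below is an assumption about how such a check could look, not the compiler's actual algorithm: each candidate route is modeled as a set of hypothetical routing-segment labels, and two routes conflict when they share a segment.

```python
def assign_routes(connections, candidates):
    """Greedily choose one candidate route per connection without sharing segments.

    connections: list of (source, destination) pairs, in placement order.
    candidates:  dict mapping each pair to a list of routes, where a route is
                 a frozenset of routing-segment labels (hypothetical names).
    Returns a dict mapping each pair to its chosen route, or None if every
    candidate conflicts (such an operation could be held back for later).
    """
    used = set()
    chosen = {}
    for conn in connections:
        chosen[conn] = None
        for route in candidates[conn]:
            if used.isdisjoint(route):
                chosen[conn] = route
                used |= route
                break
    return chosen


# Both connections prefer the shared segment "seg60"; the second one falls
# back to its alternative, mirroring the move from FIG. 20a to FIG. 20b.
example = assign_routes(
    [("A", "C"), ("B", "D")],
    {
        ("A", "C"): [frozenset({"seg60"}), frozenset({"seg62"})],
        ("B", "D"): [frozenset({"seg60"}), frozenset({"seg64"})],
    },
)
```

A greedy pass is order-dependent and may fail where a different ordering would succeed; it is shown only because it is quick, which suits a compile step that is meant to run relatively fast.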

FIG. 21 shows a partition 66 of the FPGA for executing an eBPF program. The partition interfaces with the static part of the FPGA via, for example, a series of input flip-flops 68 and a series of output flip-flops. In some embodiments, there may be routing 70 across the design, as discussed above.

The compiler may need to deal with routing that crosses the FPGA region being configured by the compiler. The compiler needs to generate a partial bit file that fits into the reconfigurable partition within the main bit file. If the main bit file is generated with a reconfigurable partition, the design tool will avoid using logic resources within the reconfigurable partition, so that those resources can be used by the partial bit file. However, the design tool may be unable to avoid using routing resources within the reconfigurable partition.

Consequently, the analysis tool needs to avoid using the routing resources that were used by the design tool in the main bit file. The analysis tool may need to ensure that its list of available routes in the bit file description does not include any route that uses a resource in use by the main bit file. Because an FPGA is highly regular, the available routes may be defined in terms of route templates that can be used in a large number of places within the FPGA. The routing resources used by the main bit file break that regularity, which means that the analysis tool must prevent those templates from being used in places where they would conflict with the main bit file. The analysis tool may need to generate new route templates that can be used in those places and/or prevent certain route templates from being used in particular locations.
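The idea of route templates that are valid everywhere except where the main bit file already uses a routing resource can be sketched as follows. This is a simplified model under stated assumptions: a template is a list of (dx, dy) offsets of the routing segments it occupies relative to an origin, and `blocked` is the set of segment coordinates used by the main bit file. All names are hypothetical.

```python
def available_routes(templates, origins, blocked):
    """List the (template name, origin) pairs that avoid every blocked segment.

    templates: dict mapping a template name to a list of (dx, dy) offsets.
    origins:   iterable of (x, y) positions at which to try each template.
    blocked:   set of (x, y) segments already used by the main bit file.
    """
    usable = []
    for origin in origins:
        for name, offsets in templates.items():
            segments = {(origin[0] + dx, origin[1] + dy) for dx, dy in offsets}
            if segments.isdisjoint(blocked):
                usable.append((name, origin))
    return usable
```

For instance, with a single template `{"east": [(0, 0), (1, 0)]}`, origins `(0, 0)` and `(5, 0)`, and the segment `(6, 0)` blocked, only the instance at `(0, 0)` remains available.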

Some examples will now be described of the functionality provided by the compiler in converting some example eBPF program fragments into instructions to be performed by minimum units.

Some embodiments may use any suitable synthesis tool to generate the bit file description. By way of example only, some embodiments may use the Bluespec tool, which is based on a mode that uses atomic transactions for hardware.

In a first example, the eBPF program fragment has two instructions:

Instruction 1: r1 += r2

Instruction 2: r1 += r3

The first instruction adds the number in register 1 (r1) to the number in register 2 (r2) and places the result in r1. The second instruction adds r1 to r3 and places the result in r1. Both instructions in this example use 64-bit registers, but use only the lowest 32 bits. The upper 32 bits of the result are filled with zeros.
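The intended arithmetic of this fragment can be modeled in a few lines. This is a behavioral sketch of the two instructions as described above, with hypothetical function names; it says nothing about how the hardware is arranged.

```python
MASK32 = 0xFFFF_FFFF


def add32(dst, src):
    """Add using only the lowest 32 bits; the upper 32 bits of the result are zero."""
    return (dst + src) & MASK32


def run_fragment(r1, r2, r3):
    """Model instruction 1 (r1 += r2) followed by instruction 2 (r1 += r3)."""
    r1 = add32(r1, r2)
    r1 = add32(r1, r3)
    return r1
```

Because each addition is masked to 32 bits, a carry out of bit 31 is discarded rather than propagated into the upper half of the 64-bit register.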

The compiler converts these into instructions to be performed by minimum units. A 32-bit add instruction requires 32 pairs of lookup tables (LUTs), a 32-bit carry chain and 32 flip-flops.

Each pair of lookup tables adds two bits to produce a 2-bit result. The carry chain is a structure that allows a bit to be carried from one digit column to the next during addition, and allows a bit to be borrowed from the next column during subtraction.

The 32 flip-flops are storage elements that accept a value on one clock cycle and reproduce it on the next clock cycle. They may be used to limit the amount of work done per clock cycle and to simplify timing analysis.

In some embodiments, the FPGA may comprise a number of slices. In some example slices, the carry chain propagates from the bottom of the slice (CIN) to the top of the slice (COUT), which is in turn connected to the CIN input of the next slice.

In an example where each slice has a 4-bit carry chain, eight slices are used to perform a 32-bit addition. In this embodiment, a minimum unit may be regarded as being provided by a pair of slices. This is because, in some embodiments, it may be convenient for a minimum unit to operate on 8-bit values.

In an example where each slice has an 8-bit carry chain, four slices are used to perform a 32-bit addition. In this embodiment, a minimum unit may be regarded as being provided by one slice.
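The relationship between carry-chain width and slice count can be checked with a small software model in which each slice adds one chunk of the operands and hands its carry-out (COUT) to the carry-in (CIN) of the next slice. This is an illustrative model only; the function names are hypothetical and the final carry-out is simply discarded.

```python
def slice_add(a_chunk, b_chunk, carry_in, bits):
    """One slice: add two chunks plus CIN, return (sum chunk, COUT)."""
    total = a_chunk + b_chunk + carry_in
    return total & ((1 << bits) - 1), total >> bits


def chained_add(a, b, width=32, bits_per_slice=8):
    """Add two `width`-bit values using a chain of slices.

    Returns (result, number of slices used).
    """
    n_slices = width // bits_per_slice
    chunk_mask = (1 << bits_per_slice) - 1
    result, carry = 0, 0
    for i in range(n_slices):
        shift = i * bits_per_slice
        chunk, carry = slice_add(
            (a >> shift) & chunk_mask, (b >> shift) & chunk_mask, carry, bits_per_slice
        )
        result |= chunk << shift
    return result, n_slices
```

With the defaults the model uses four slices for a 32-bit add, and with `bits_per_slice=4` it uses eight, matching the two cases described above.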

It should be appreciated that this is only an example and that, as discussed above, a minimum unit may be defined in any suitable way.

In this example, the case where the FPGA has slices that support 8-bit carry chains will now be used for the compilation of the first example eBPF program fragment.

There are three input values, each 32 bits wide, and one output value, 32 bits wide. There may be other, earlier instructions that produced those three input values. In the following, some arbitrary locations of the slices (minimum units) will be assumed.

The following numbering convention will be used. The slices (minimum units) are arranged in a regular array of rows and columns. XnYm denotes the position of a minimum unit in the array: Xn denotes the column and Ym denotes the row. X6Y0 indicates that the slice is in column 6 and row 0. It should be appreciated that any other suitable numbering scheme may be used in other embodiments.
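For tooling that manipulates these names, the XnYm convention maps directly onto a (column, row) pair. The helper names below are hypothetical and are given only to make the convention concrete.

```python
import re

_SLICE_NAME = re.compile(r"X(\d+)Y(\d+)")


def parse_slice(name):
    """Convert a name such as 'X6Y0' into a (column, row) tuple."""
    match = _SLICE_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a slice name: {name!r}")
    return int(match.group(1)), int(match.group(2))


def format_slice(column, row):
    """Convert a (column, row) pair back into the XnYm form."""
    return f"X{column}Y{row}"
```

For example, `parse_slice("X6Y0")` yields `(6, 0)`, that is, column 6 and row 0.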

Assume that the initial values were produced at the same time in the following locations:

r1: slices X6Y0, X6Y1, X6Y2 and X6Y3

r2: slices X6Y4, X6Y5, X6Y6 and X6Y7

r3: slices X6Y8, X6Y9, X6Y10 and X6Y11

The result of the first instruction needs to be computed by four adjacent slices in the same column, so that the carry chain connects up correctly. The compiler may choose to compute the result in slices X7Y0, X7Y1, X7Y2 and X7Y3. For this to work, the inputs need to be connected up. There will be a connection from X6Y0 to X7Y0, another from X6Y1 to X7Y1, another from X6Y2 to X7Y2, and another from X6Y3 to X7Y3. Corresponding connections from X6Y4-X6Y7 to X7Y0-X7Y3 are also needed.

These will be full-byte connections, meaning that each of the 8 input bits is connected to the corresponding output bit. For example:

The output of slice X6Y0 flip-flop 0 is connected to input 0 of slice X7Y0 LUT 0.

The output of slice X6Y0 flip-flop 1 is connected to input 0 of slice X7Y0 LUT 1.

And so on, up to:

The output of slice X6Y0 flip-flop 7 is connected to input 0 of slice X7Y0 LUT 7.
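The bit-by-bit expansion of a full-byte connection can be generated mechanically. In the sketch below the pin-name strings (`FF`, `LUT`, `IN`) are hypothetical labels chosen for readability; they are not the naming used by any particular FPGA tool.

```python
def full_byte_connection(src_slice, dst_slice, dst_input):
    """Expand one full-byte connection into its eight single-bit connections.

    Bit k of the byte goes from flip-flop k of the source slice to the given
    input of LUT k of the destination slice.
    """
    return [
        (f"{src_slice}.FF{bit}", f"{dst_slice}.LUT{bit}.IN{dst_input}")
        for bit in range(8)
    ]


# The connection spelled out in the text: X6Y0 -> X7Y0, input 0.
wires = full_byte_connection("X6Y0", "X7Y0", 0)
```

Treating the byte as the unit of routing means the compiler only has to choose one pre-computed route per byte rather than eight independent wires.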

During the first clock cycle, the r1 and r2 values in slices X6Y0-X6Y7 are passed to the inputs of slices X7Y0-X7Y3 and processed by the LUTs and carry chain, and the result is stored in the flip-flops of slices X7Y0-X7Y3, ready for use on the next cycle.

Moving on to instruction 2: the compiler needs to choose where to compute the result of instruction 2. It may choose slices X7Y4 to X7Y7. Again, there will be full-byte connections from the result of instruction 1 (X7Y0 to X7Y3) to the inputs for instruction 2 (X7Y4 to X7Y7).

The r3 value is also needed. If r1, r2 and r3 were produced in cycle 0, then r1 + r2 will be produced in cycle 1. The value of r3 needs to be delayed by one clock cycle so that it too is produced in cycle 1. The compiler may choose to use slices X7Y8 to X7Y11 to produce r3 in cycle 1. There then need to be connections from the original slices that produced r3 in cycle 0 (X6Y8 to X6Y11) to the new slices that produce the same value in cycle 1 (X7Y8 to X7Y11). Having done that, there now need to be connections from those new slices to the slices for instruction 2. Thus, the output of slice X7Y8 is connected to an input of slice X7Y4, and so on.
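The need for the extra r3 register stage becomes visible once a new set of inputs arrives on every clock cycle. The following cycle-level model is a sketch under that assumption (back-to-back inputs, one per cycle); the function name and the `delay_r3` switch are hypothetical and exist only to contrast the two cases.

```python
MASK32 = 0xFFFF_FFFF


def pipeline(inputs, delay_r3=True):
    """Simulate the two-stage adder pipeline, one (r1, r2, r3) tuple per cycle.

    sum_reg models the flip-flops of X7Y0-X7Y3 (r1 + r2).
    r3_reg  models the flip-flops of X7Y8-X7Y11 (r3 delayed by one cycle).
    Returns the list of values latched by the instruction-2 stage.
    """
    sum_reg = None
    r3_reg = None
    outputs = []
    for current in list(inputs) + [None]:  # one extra cycle to drain
        # Stage 2 reads what stage 1 latched on the previous cycle.
        if sum_reg is not None:
            if delay_r3:
                r3 = r3_reg
            else:
                # Without the delay, stage 2 would see this cycle's r3.
                r3 = current[2] if current is not None else 0
            outputs.append((sum_reg + r3) & MASK32)
        # Stage 1 latches the new sum and the delayed copy of r3.
        if current is not None:
            sum_reg = (current[0] + current[1]) & MASK32
            r3_reg = current[2]
        else:
            sum_reg = None
    return outputs
```

With the delay register in place, each output combines the r1, r2 and r3 of the same input; without it, r1 + r2 from one input would be added to the r3 of the following input.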

The FPGA bit file will then contain the following features:

- Whole-byte connection from X6Y0 to X7Y0 input 0 (initial r1 byte 0)

- Whole-byte connection from X6Y1 to X7Y1 input 0 (initial r1 byte 1)

- Whole-byte connection from X6Y2 to X7Y2 input 0 (initial r1 byte 2)

- Whole-byte connection from X6Y3 to X7Y3 input 0 (initial r1 byte 3)

- Whole-byte connection from X6Y4 to X7Y0 input 1 (initial r2 byte 0)

- Whole-byte connection from X6Y5 to X7Y1 input 1 (initial r2 byte 1)

- Whole-byte connection from X6Y6 to X7Y2 input 1 (initial r2 byte 2)

- Whole-byte connection from X6Y7 to X7Y3 input 1 (initial r2 byte 3)

- Whole-byte connection from X6Y8 to X7Y8 input 0 (initial r3 byte 0)

- Whole-byte connection from X6Y9 to X7Y9 input 0 (initial r3 byte 1)

- Whole-byte connection from X6Y10 to X7Y10 input 0 (initial r3 byte 2)

- Whole-byte connection from X6Y11 to X7Y11 input 0 (initial r3 byte 3)

- Slice X7Y0 configured to add input 0 to input 1 (instruction 1 byte 0)

- Slice X7Y1 configured to add input 0 to input 1 (instruction 1 byte 1)

- Slice X7Y2 configured to add input 0 to input 1 (instruction 1 byte 2)

- Slice X7Y3 configured to add input 0 to input 1 (instruction 1 byte 3)

- Slice X7Y8 configured to copy input 0 to output (r3 delay byte 0)

- Slice X7Y9 configured to copy input 0 to output (r3 delay byte 1)

- Slice X7Y10 configured to copy input 0 to output (r3 delay byte 2)

- Slice X7Y11 configured to copy input 0 to output (r3 delay byte 3)

- Whole-byte connection from X7Y0 to X7Y4 input 0 (instruction 1 byte 0)

- Whole-byte connection from X7Y1 to X7Y5 input 0 (instruction 1 byte 1)

- Whole-byte connection from X7Y2 to X7Y6 input 0 (instruction 1 byte 2)

- Whole-byte connection from X7Y3 to X7Y7 input 0 (instruction 1 byte 3)

- Whole-byte connection from X7Y8 to X7Y4 input 1 (r3 delay byte 0)

- Whole-byte connection from X7Y9 to X7Y5 input 1 (r3 delay byte 1)

- Whole-byte connection from X7Y10 to X7Y6 input 1 (r3 delay byte 2)

- Whole-byte connection from X7Y11 to X7Y7 input 1 (r3 delay byte 3)

- Slice X7Y4 configured to add input 0 to input 1 (instruction 2 byte 0)

- Slice X7Y5 configured to add input 0 to input 1 (instruction 2 byte 1)

- Slice X7Y6 configured to add input 0 to input 1 (instruction 2 byte 2)

- Slice X7Y7 configured to add input 0 to input 1 (instruction 2 byte 3)

The compiler does not need to generate the upper 32 bits of the result of instruction 2, because they are known to be zero. It can simply make a note of that fact and use zero whenever those bits are used.
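The behaviour of the slices in this first example may be modelled in software. The following is a minimal sketch only, assuming that both instructions are 32-bit additions that wrap on overflow and that each slice handles one byte with a carry passed to the next slice; the function names `byte_sliced_add` and `run_pipeline` are hypothetical and do not appear in the described embodiment.

```python
def byte_sliced_add(a: int, b: int, width_bytes: int = 4) -> int:
    """Add two values one byte per slice, passing the carry between slices."""
    result = 0
    carry = 0
    for i in range(width_bytes):
        byte_a = (a >> (8 * i)) & 0xFF      # slice input 0
        byte_b = (b >> (8 * i)) & 0xFF      # slice input 1
        total = byte_a + byte_b + carry
        result |= (total & 0xFF) << (8 * i)
        carry = total >> 8                  # carry chain into the next slice
    return result                           # carry out of the top byte is dropped


def run_pipeline(r1: int, r2: int, r3: int) -> int:
    # Cycle 1: instruction 1 computes r1 + r2 while r3 is copied (delayed).
    sum1 = byte_sliced_add(r1, r2)
    r3_delayed = r3 & 0xFFFFFFFF
    # Cycle 2: instruction 2 adds the delayed r3 to the instruction 1 result.
    return byte_sliced_add(sum1, r3_delayed)
```

The copy of r3 in the model corresponds to the delay slices: without it, instruction 2 would combine values from different cycles.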

A second example of the compilation of an eBPF fragment will now be described.

Instruction 1: r1 &= 0xff

Instruction 2: r2 &= 0xff

Instruction 3: if r1 < r2 goto L1

Instruction 4: r1 = r2

Label L1.

The first instruction performs a bitwise AND of r1 with the constant 0xff and places the result in r1. A given bit of the result is set to 1 if the corresponding bit was originally set to 1 in r1 and the corresponding bit is set to 1 in the constant; otherwise it is set to zero. The constant 0xff has bits 0 to 7 set and bits 8 to 63 clear, so the result is that bits 0 to 7 of r1 are unchanged and bits 8 to 63 are set to zero. This simplifies matters for the compiler, because the compiler understands that bits 8 to 63 are zero and do not need to be generated. The second instruction does the same for r2.

Instruction 3 checks whether r1 is less than r2 and, if so, jumps to label L1. This skips instruction 4. Instruction 4 simply copies the value from r2 to r1. This sequence of instructions finds the minimum of r1 byte 0 and r2 byte 0 and places the result in r1 byte 0.

The compiler may use a technique known as "if conversion" to convert the conditional jump into a select instruction:

Instruction 1: r1 &= 0xff

Instruction 2: r2 &= 0xff

Instruction 5: c1 = (r1 < r2)

Instruction 6: r1 = c1 ? r1 : r2

Instruction 5 compares r1 with r2, setting c1 to 1 if r1 is less than r2 and to zero otherwise. Instruction 6 is a select instruction that copies r1 to r1 if c1 is set (which has no effect) and otherwise copies r2 to r1. If c1 is equal to 1, then instruction 3 would have skipped instruction 4, meaning that r1 would keep its value from instruction 1. In this case, the select instruction likewise leaves r1 unchanged. If c1 is equal to zero, then instruction 3 would not have skipped instruction 4, so r2 would have been copied to r1 by instruction 4. Again, the select instruction copies r2 to r1, so the new sequence has the same effect as the old sequence.
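The equivalence of the two sequences can be checked with a short software model. The following is an illustrative sketch, not part of the described compiler; the function names `with_branch` and `with_select` are hypothetical, and Python integers stand in for the 64-bit registers.

```python
def with_branch(r1: int, r2: int) -> int:
    """Original sequence: conditional jump over the copy."""
    r1 &= 0xFF              # instruction 1
    r2 &= 0xFF              # instruction 2
    if not (r1 < r2):       # instruction 3 jumps to L1 when r1 < r2
        r1 = r2             # instruction 4, reached only when the jump is not taken
    return r1


def with_select(r1: int, r2: int) -> int:
    """If-converted sequence: compare, then select."""
    r1 &= 0xFF              # instruction 1
    r2 &= 0xFF              # instruction 2
    c1 = 1 if r1 < r2 else 0    # instruction 5
    r1 = r1 if c1 else r2       # instruction 6
    return r1
```

Both functions return the minimum of the low bytes of the two inputs, which is the property relied upon when the jump is replaced by the select.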

Instruction 6 is not a valid eBPF instruction. However, while the compiler is operating, the instructions are represented in LLVM-IR. Instruction 6 would be a valid instruction in LLVM-IR.

These instructions now need to be assigned to minimum units. Assume that input r1 is available in slices X0Y0 to X0Y7 and that r2 is available in slices X0Y8 to X0Y15. Instructions 1 and 2 cause the compiler to note that the upper 7 bytes of r1 and r2 are set to zero.

The compiler may then choose to compute the result of instruction 5 in slice X1Y0. A whole-byte connection is required from the output of slice X0Y0 to input 0 of slice X1Y0, and a whole-byte connection is required from the output of slice X0Y8 to input 1 of slice X1Y0. The way to compare the two values is to subtract one from the other and see whether the computation overflows by attempting to borrow from the next bit up. The result of this comparison is then stored in flip-flop 7 of slice X1Y0.
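The comparison by subtraction may be sketched as follows. This is a software illustration only, assuming 8-bit unsigned operands; the function name `less_than_by_borrow` is hypothetical.

```python
def less_than_by_borrow(a: int, b: int) -> int:
    """Return 1 if a < b for 8-bit unsigned a and b, using the borrow of a - b."""
    diff = (a & 0xFF) - (b & 0xFF)
    # Bit 8 of the two's complement difference is the borrow out of the byte:
    # it is 1 exactly when the subtraction had to borrow, that is, when a < b.
    return (diff >> 8) & 1
```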

As in the first example, r1 and r2 will need to be delayed by one cycle in order to present their values to instruction 6 at the correct time. The compiler may use slices X1Y1 and X1Y2 for r1 and r2 respectively.

The select instruction requires three inputs: c1, r1 and r2. Note that r1 and r2 are one byte wide, whereas c1 is only one bit wide. Suppose that the compiler chooses to compute the result of the select instruction in slice X2Y0. The selection is performed on a bit-by-bit basis, with each LUT in slice X2Y0 handling one bit:

If c1 is set, then bit 0 of the result is r1 bit 0, otherwise r2 bit 0

If c1 is set, then bit 1 of the result is r1 bit 1, otherwise r2 bit 1

... and so on, up to

If c1 is set, then bit 7 of the result is r1 bit 7, otherwise r2 bit 7.

Each LUT needs access only to the corresponding bit from r1 and the corresponding bit from r2, but every LUT needs access to c1. This means that c1 needs to be replicated across the bits of input 0 of the slice. The connections to the inputs of instruction 6 will therefore be:

Replicate bit 7 of the output of slice X1Y0 to input 0 of slice X2Y0.

Whole-byte connection from the output of slice X1Y1 to input 1 of slice X2Y0.

Whole-byte connection from the output of slice X1Y2 to input 2 of slice X2Y0.
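The effect of replicating c1 across input 0 and selecting bit by bit may be modelled as below. This is a sketch under the assumption that each of the eight LUTs sees one bit of each of the three slice inputs; the names `replicate_bit` and `select_byte` are hypothetical.

```python
def replicate_bit(bit: int, width: int = 8) -> int:
    """Copy a single bit onto every bit position of a slice input."""
    return (1 << width) - 1 if bit & 1 else 0


def select_byte(c1: int, r1: int, r2: int) -> int:
    """Model of slice X2Y0: LUT i picks r1 bit i or r2 bit i based on c1."""
    input0 = replicate_bit(c1)          # replicated condition
    result = 0
    for i in range(8):                  # one LUT per bit
        cond = (input0 >> i) & 1
        a = (r1 >> i) & 1               # input 1, bit i
        b = (r2 >> i) & 1               # input 2, bit i
        result |= (a if cond else b) << i
    return result
```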

Another issue that needs to be addressed relates to shift instructions. Consider the following example:

A 16-bit left shift by 5 bits needs to do the following:

Set output bit 0 to zero

Set output bit 1 to zero

Set output bit 2 to zero

Set output bit 3 to zero

Set output bit 4 to zero

Copy input bit 0 to output bit 5

Copy input bit 1 to output bit 6

...

Copy input bit 10 to output bit 15

Note that the input and output here are those of the connection. The input of the connection comes from the output of the first slice. The output of the connection goes to the input of the second slice.

It may not be possible to make this kind of connection within a slice; rather, it may be made by the interconnect between slices. The compiler can assume that the 16-bit input value was produced by two adjacent slices in the same column, because the compiler can ensure that the value is produced there.

As an example, assume that the input is produced by slices X0Y4 and X0Y5 and that the output goes to slices X1Y4 and X1Y5. In that case, the following connections are required:

Slice X1Y4 bit 0 is known to be zero and so is not needed

Slice X1Y4 bit 1 is known to be zero and so is not needed

Slice X1Y4 bit 2 is known to be zero and so is not needed

Slice X1Y4 bit 3 is known to be zero and so is not needed

Slice X1Y4 bit 4 is known to be zero and so is not needed

Slice X1Y4 bit 5 comes from slice X0Y4 bit 0

Slice X1Y4 bit 6 comes from slice X0Y4 bit 1

Slice X1Y4 bit 7 comes from slice X0Y4 bit 2

Slice X1Y5 bit 0 comes from slice X0Y4 bit 3

Slice X1Y5 bit 1 comes from slice X0Y4 bit 4

Slice X1Y5 bit 2 comes from slice X0Y4 bit 5

Slice X1Y5 bit 3 comes from slice X0Y4 bit 6

Slice X1Y5 bit 4 comes from slice X0Y4 bit 7

Slice X1Y5 bit 5 comes from slice X0Y5 bit 0

Slice X1Y5 bit 6 comes from slice X0Y5 bit 1

Slice X1Y5 bit 7 comes from slice X0Y5 bit 2
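The list of connections above follows mechanically from the shift amount. The following sketch generates it for a constant left shift; the function name `left_shift_routes` and the dictionary representation are hypothetical and are used only to illustrate the bit mapping.

```python
def left_shift_routes(shift, src_slices, dst_slices):
    """Map each destination (slice, bit) to its source (slice, bit).

    Slices are listed least significant byte first. A value of None marks
    a destination bit that is known to be zero and needs no connection.
    """
    routes = {}
    for out_bit in range(8 * len(dst_slices)):
        dst = (dst_slices[out_bit // 8], out_bit % 8)
        in_bit = out_bit - shift
        if in_bit < 0:
            routes[dst] = None
        else:
            routes[dst] = (src_slices[in_bit // 8], in_bit % 8)
    return routes


routes = left_shift_routes(5, ["X0Y4", "X0Y5"], ["X1Y4", "X1Y5"])
for (dst_slice, dst_bit), src in routes.items():
    print(dst_slice, "bit", dst_bit, "<-", src)
```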

The eight connections to the inputs of slice X1Y5 can be regarded as a shifted connection or shifted route. The same structure can be used for slice X1Y4, but with inputs from X0Y3 and X0Y4, because bits 5 to 7 match and the slice can ignore bits 0 to 4, so it does not matter what input is presented there.

It may be necessary to be able to shift by any amount between 1 bit and 7 bits. A connection that shifts by 0 bits or by 8 bits is exactly the same as a whole-byte connection, because in that case each bit is connected to the corresponding bit of another slice.

Shifting by a variable amount may be done in two or three stages, depending on the width of the value being shifted. The stages are as follows:

Stage 1: Shift by 0, 1, 2 or 3.

Stage 2: Shift by 0, 4, 8 or 12.

Stage 3: Shift by 0, 16, 32 or 48 (32-bit or 64-bit only).
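The three stages correspond to successive two-bit fields of the shift amount. The sketch below illustrates this for a left shift, which is an assumption made for the example (the stages themselves do not depend on the direction); the function name `staged_left_shift` is hypothetical.

```python
def staged_left_shift(value: int, amount: int, width: int = 64) -> int:
    """Shift left by a variable amount using three constant-shift stages."""
    mask = (1 << width) - 1
    stage1 = amount & 0x3                  # 0, 1, 2 or 3
    stage2 = ((amount >> 2) & 0x3) * 4     # 0, 4, 8 or 12
    stage3 = ((amount >> 4) & 0x3) * 16    # 0, 16, 32 or 48
    value = (value << stage1) & mask
    value = (value << stage2) & mask
    value = (value << stage3) & mask       # always 0 for amounts below 16
    return value
```

For an 8-bit or 16-bit value the shift amount is below 16, so the third stage always shifts by zero and can be omitted, which is consistent with that stage being needed only for 32-bit and 64-bit values.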

As another example, suppose there is an arithmetic right shift of a byte by a variable amount, where the value to be shifted is produced by slice X3Y2 and the shift amount is produced by slice X3Y3.

An arithmetic right shift requires connections of the "arithmetic right shift" type. This type of connection takes the outputs of one slice and connects them to the inputs of another slice, but shifts them right by a constant amount in the process, replicating the sign bit as needed.

For example, an "arithmetic right shift by 3" connection would be as follows:

Output bit 0 comes from input bit 3

Output bit 1 comes from input bit 4

Output bit 2 comes from input bit 5

Output bit 3 comes from input bit 6

Output bit 4 comes from input bit 7

Output bit 5 comes from input bit 7 (sign bit)

Output bit 6 comes from input bit 7 (sign bit)

Output bit 7 comes from input bit 7 (sign bit)
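A connection of this type can be described as a list giving, for each output bit, the input bit it is wired to. The following sketch builds such a list and applies it to a byte; the names `asr_connection` and `apply_connection` are hypothetical.

```python
def asr_connection(shift: int, width: int = 8) -> list:
    """For each output bit, return the input bit it comes from."""
    sign = width - 1
    # Bits that would come from beyond the top of the value use the sign bit.
    return [min(out_bit + shift, sign) for out_bit in range(width)]


def apply_connection(value: int, connection: list) -> int:
    """Route the bits of value through the connection."""
    result = 0
    for out_bit, in_bit in enumerate(connection):
        result |= ((value >> in_bit) & 1) << out_bit
    return result
```

For a shift of 3, `asr_connection` yields the eight entries listed above, with output bits 5 to 7 all wired to input bit 7.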

Stage 1 may be computed in slice X4Y2, in which case it would require the following connections:

Whole byte from slice X3Y2 to slice X4Y2 input 0

Arithmetic right shift by 1 from slice X3Y2 to slice X4Y2 input 1

Arithmetic right shift by 2 from slice X3Y2 to slice X4Y2 input 2

Arithmetic right shift by 3 from slice X3Y2 to slice X4Y2 input 3

Replicate slice X3Y3 bit 0 to slice X4Y2 input 4

Replicate slice X3Y3 bit 1 to slice X4Y2 input 5

Slice X4Y2 is then configured to select one of the first four inputs based on input 4 and input 5, as follows:

Input 4 is 0 and input 5 is 0: select input 0

Input 4 is 1 and input 5 is 0: select input 1

Input 4 is 0 and input 5 is 1: select input 2

Input 4 is 1 and input 5 is 1: select input 3

The shift amount may be copied from slice X3Y3 to slice X4Y3 to provide a delayed version.

Stage 2 may be computed in slice X5Y2, in which case it would require the following connections:

Whole byte from slice X4Y2 to slice X5Y2 input 0

Arithmetic right shift by 4 from slice X4Y2 to slice X5Y2 input 1

Replicate slice X4Y3 bit 2 to slice X5Y2 input 2

Slice X5Y2 would then be configured to select input 0 or input 1 based on input 2, as follows:

Input 2 is 0: select input 0

Input 2 is 1: select input 1

The output of slice X5Y2 will be the result of the variable arithmetic right shift operation.
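The two stages may be modelled together as follows. This is a software sketch only, assuming an 8-bit value and a shift amount of 0 to 7; the function names `asr_byte` and `variable_asr` are hypothetical, and the constant shifts stand in for the "arithmetic right shift" connections.

```python
def asr_byte(value: int, shift: int) -> int:
    """Arithmetic right shift of an 8-bit value by a constant amount."""
    signed = value - 0x100 if value & 0x80 else value
    return (signed >> shift) & 0xFF


def variable_asr(value: int, amount: int) -> int:
    # Stage 1 (slice X4Y2): inputs 0 to 3 carry the value shifted by 0 to 3.
    stage1_inputs = [asr_byte(value, n) for n in range(4)]
    input4 = amount & 1             # shift amount bit 0
    input5 = (amount >> 1) & 1      # shift amount bit 1
    stage1 = stage1_inputs[input4 + 2 * input5]
    # Stage 2 (slice X5Y2): input 0 is unshifted, input 1 is shifted by 4.
    input2 = (amount >> 2) & 1      # delayed shift amount bit 2
    return asr_byte(stage1, 4) if input2 else stage1
```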

The bit file for a given minimum unit may be as follows:

Identity information for the minimum unit

A list of the other minimum units from which the given minimum unit can receive input, and the available routes for that input.

A list of the other minimum units to which the given minimum unit can provide output, and the available routes for that output

It should be appreciated that, because an FPGA is a regular structure, there may be a common template that can be used for a plurality of minimum units, with modifications for individual ones of the minimum units as required.

As an example, the bit file description for slice X7Y1 may specify the following possible inputs and outputs:

Input from X6Y1 via route A or route B

Input from X6Y5 via route C or route D

Input from X7Y0 via route E or route F

Output to X8Y1 via route G or route H

Output to X7Y2 via route I or route J

Output to X7Y5 via route K or route L.

The compiler would use this bit file description to provide the following partial bit file for the inputs and outputs of slice X7Y1 for the first eBPF example described above:

Input from X6Y1 via route A

Input from X6Y5 via route C

As an example, the bit file description for slice XnYm may specify the following possible inputs and outputs:

Input from Xn-1Ym via route A or route B

Input from Xn-1Ym+4 via route C or route D

Input from XnYm-1 via route E or route F

Output to Xn+1Ym via route G or route H

Output to XnYm+1 via route I or route J

Output to XnYm+4 via route K or route L.

This bit file description may be modified, as described above, to remove one or more routes that are not available for use by the compiler. This may be because a route is used by another minimum unit or is used for routing across a partition.
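The relationship between the general XnYm description and the description for a particular slice may be sketched as a template that is instantiated with concrete coordinates. The data layout, the names `TEMPLATE` and `instantiate`, and the `unavailable` parameter are all hypothetical and are chosen only to illustrate the idea; the actual bit file format is not specified here.

```python
# Relative (dx, dy) offsets and candidate routes, taken from the XnYm example.
TEMPLATE = {
    "inputs": [((-1, 0), ("A", "B")), ((-1, 4), ("C", "D")), ((0, -1), ("E", "F"))],
    "outputs": [((1, 0), ("G", "H")), ((0, 1), ("I", "J")), ((0, 4), ("K", "L"))],
}


def instantiate(n: int, m: int, unavailable=frozenset()) -> dict:
    """Build the description for slice XnYm, dropping unavailable routes."""
    def expand(entries):
        expanded = {}
        for (dx, dy), routes in entries:
            usable = [route for route in routes if route not in unavailable]
            if usable:
                expanded[f"X{n + dx}Y{m + dy}"] = usable
        return expanded

    return {
        "slice": f"X{n}Y{m}",
        "inputs": expand(TEMPLATE["inputs"]),
        "outputs": expand(TEMPLATE["outputs"]),
    }


description = instantiate(7, 1)
```

Instantiating the template for n = 7 and m = 1 reproduces the inputs and outputs listed for slice X7Y1, and passing a set of route names as `unavailable` models the removal of routes that the compiler may not use.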

It should be appreciated that the compiler may be implemented by a computer program comprising computer-executable instructions which may be executed by one or more computer processors. The compiler may run on hardware such as at least one processor operating in conjunction with one or more memories.

It is noted that, while the above describes example embodiments, there are several variations and modifications which may be made to the disclosed solution without departing from the scope of the present invention.

Accordingly, the embodiments may vary within the scope of the appended claims. In general, some embodiments may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although embodiments are not limited thereto.

The embodiments may be implemented by computer software stored in a memory and executable by at least one data processor of the involved entities, or by hardware, or by a combination of software and hardware.

The software may be stored on physical media such as memory chips or memory blocks implemented within the processor, magnetic media such as hard disks or floppy disks, and optical media such as, for example, DVD and the data variants thereof, CD.

The memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory.

The data processors may be of any type suitable to the local technical environment and may include, as non-limiting examples, one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), gate level circuits and processors based on a multi-core processor architecture.

Various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. However, all such and similar modifications of the present teachings will still fall within the scope as defined in the appended claims.

Claims

A network interface device for interfacing a host device to a network, comprising:
a first interface configured to receive a plurality of data packets;
a plurality of processing units, each processing unit associated with a predefined type of operation executable in a single step, wherein at least some of the plurality of processing units are associated with a different predefined type of operation Configurable hardware modules
includes,
The hardware module is configured to: provide a first data processing pipeline for processing one or more of the plurality of data packets to perform a first function on the one or more of the plurality of data packets, the A network interface device, configurable to interconnect at least a portion.

The network interface device of claim 1 , wherein two or more of the at least some of the plurality of processing units are configured to perform an associated at least one predefined operation in parallel.

3. The method of claim 1 or claim 2, wherein two or more of the at least some of the plurality of processing units include:
to perform an operation of an associated predefined type within a predefined length of time defined by the clock signal; And
in response to the end of the predefined length of time, transmit the result of each at least one operation to a next processing unit;
configured, a network interface device.

4. The method of any one of claims 1 to 3, wherein each of the plurality of processing units comprises an application specific integrated circuit configured to perform at least one operation associated with the respective processing unit. , a network interface device.

5. The digital circuit according to any one of claims 1 to 4, wherein at least one of the plurality of processing units comprises a memory for storing digital circuitry and states related to processing performed by the digital circuitry, the digital circuitry comprising: , in communication with the memory to perform the predefined type of operation associated with the respective processing unit.

6. The hardware device of any one of claims 1 to 5, wherein at least two of the plurality of processing units comprise an accessible memory, the memory configured to store a state associated with a first data packet; and during performance of the first function by a module, at least two of the plurality of processing units are configured to access and modify the state.

7. The network of claim 6, wherein a first processing unit of the at least some of the plurality of processing units is configured to stall during access of the value of the state by a second processing unit of the plurality of processing units. interface device.

8. The method of any preceding claim, wherein one or more of the plurality of processing units are individually configurable to perform an operation specific to each pipeline based on an associated predefined type of operation. network interface device.

9. The method of any one of claims 1 to 8, wherein the hardware module is configured to receive an instruction, and in response to the instruction:
interconnecting at least a portion of the plurality of processing units to provide a data processing pipeline for processing one or more of the plurality of data packets;
causing one or more of the plurality of processing units to perform an associated predefined type of operation on one or more data packets;
adding one or more of the plurality of processing units to a data processing pipeline; and
removing one or more of the plurality of processing units from a data processing pipeline;
A network interface device configured to do at least one of:

10. The method according to any one of claims 1 to 9, wherein the predefined type of operation comprises:
loading at least one value of the first data packet from memory;
storing at least one value of the data packet in a memory; and
performing a lookup on a lookup table to determine the action to be performed on the data packet
A network interface device comprising at least one of:

11. The method of any one of claims 1 to 10, wherein at least one of the at least some of the plurality of processing units is configured to perform at least one result of an associated at least one predefined operation in the first processing pipeline. then forward to a processing unit, wherein the subsequent processing unit is configured to perform a next predefined action depending on the at least one result.

The network interface device according to any one of the preceding claims, wherein each of the different predefined types of operation is defined by a different template.

13. The method according to any one of claims 1 to 12, wherein the predefined type of operation comprises:
accessing data packets;
accessing a lookup table stored in a memory of the hardware module;
performing logical operations on data loaded from data packets; and
performing logical operations on data loaded from the lookup table;
A network interface device comprising at least one of:

14. The method of any one of claims 1 to 13, wherein the hardware module comprises routing hardware, and wherein the hardware module connects between the plurality of processing units in a particular order defined by the first data processing pipeline. configurable to interconnect at least a portion of the plurality of processing units to provide the first data processing pipeline by configuring the routing hardware to route data packets.

15. The second function according to any one of claims 1 to 14, wherein the hardware module provides a second data processing pipeline for processing one or more of the plurality of data packets to provide a second function different from the first function. A network interface device configurable to interconnect at least some of the plurality of processing units to perform

16. The method of any one of claims 1 to 15, wherein the hardware module, after interconnecting at least some of the plurality of processing units to provide the first data processing pipeline, configures a second data processing pipeline. A network interface device configurable to interconnect at least a portion of the plurality of processing units to provide

The network interface device of claim 1 , comprising additional circuitry separate from the hardware module and configured to perform the first function for one or more of the plurality of data packets. .

18. The network interface device of claim 17, wherein the additional circuitry comprises at least one of:
a field programmable gate array; and
a plurality of central processing units.

19. The network interface device according to claim 17 or 18, wherein the network interface device comprises at least one controller, wherein the additional circuitry is configured to perform the first function on data packets during a compilation process for the first function to be performed in the hardware module, and wherein the at least one controller is configured to, in response to completion of the compilation process, control the hardware module to start performing the first function on data packets.

20. The network interface device of claim 19, wherein the at least one controller is further configured to, in response to determining that the compilation process for the first function to be performed in the hardware module is complete, control the additional circuitry to stop performing the first function on data packets.

21. The network interface device according to claim 17 or 18, wherein the network interface device comprises at least one controller, wherein the hardware module is configured to perform the first function on data packets during a compilation process for the first function to be performed in the additional circuitry, and wherein the at least one controller is configured to determine that the compilation process for the first function to be performed in the additional circuitry is complete and, in response to the determination, control the additional circuitry to start performing the first function on data packets.

22. The network interface device of claim 21, wherein the at least one controller is further configured to, in response to the determination that the compilation process for the first function to be performed in the additional circuitry is complete, control the hardware module to stop performing the first function on data packets.

23. The network interface device according to any preceding claim, comprising at least one controller configured to perform a compilation process to provide the first function to be performed in the hardware module.

24. A data processing system comprising the network interface device of any one of claims 1 to 23 and a host device, wherein the data processing system comprises at least one controller configured to perform a compilation process to provide the first function to be performed in the hardware module.

25. The data processing system of claim 24, wherein the at least one controller is provided by one or more of:
the network interface device; and
the host device.

26. The data processing system of claim 24 or 25, wherein the compilation process is performed in response to a determination by the at least one controller that a computer program representing the first function is safe for execution in a kernel mode of the host device.

27. The data processing system of claim 24, 25, or 26, wherein the at least one controller is configured to perform the compilation process by assigning each of the at least some of the plurality of processing units to perform, in a particular order in the first data processing pipeline, at least one operation from a plurality of operations represented by a sequence of computer code instructions, the plurality of operations providing the first function for the one or more of the plurality of data packets.
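
The assignment step recited in claim 27 can be sketched as follows. This is a minimal illustration under stated assumptions, not the compiler described in the specification: the `Instruction` type, the operation-type labels, the unit identifiers, and the first-free-unit policy are all hypothetical. The sketch walks a sequence of instructions in program order and assigns each one to an unused processing unit whose predefined operation type matches, so that the resulting stage list preserves the order of the program.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Instruction:
    op_type: str  # hypothetical label, e.g. "load", "lookup", "logic"
    arg: str      # hypothetical operand description


def compile_pipeline(
    program: List[Instruction],
    units_by_type: Dict[str, List[str]],
) -> List[Tuple[str, Instruction]]:
    """Assign each instruction, in order, to a free unit of the matching type.

    Returns the pipeline as an ordered list of (unit_id, instruction) stages.
    Raises ValueError if no free unit of the required type remains.
    """
    # Copy the pools so the caller's inventory of units is not modified.
    free = {op_type: list(ids) for op_type, ids in units_by_type.items()}
    stages: List[Tuple[str, Instruction]] = []
    for ins in program:
        pool = free.get(ins.op_type)
        if not pool:
            raise ValueError(f"no free processing unit for {ins.op_type!r}")
        stages.append((pool.pop(0), ins))
    return stages


program = [
    Instruction("load", "dst_port"),
    Instruction("lookup", "port_table"),
    Instruction("logic", "hit == 1"),
]
units = {"load": ["u0"], "lookup": ["u1"], "logic": ["u2", "u3"]}
pipeline = compile_pipeline(program, units)
```

A real compiler would also have to respect constraints that this sketch ignores, such as routing limits between units and per-unit resource budgets.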

28. The data processing system of any one of claims 24 to 27, wherein the at least one controller is configured to:
send, prior to completion of the compilation process, a first instruction for causing additional circuitry of the network interface device to perform the first function on data packets; and
send, subsequent to the completion of the compilation process, a second instruction for causing the hardware module to start performing the first function on data packets.
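
The two-instruction handover of claim 28 can be illustrated with a small state model. It is a sketch only: the `Controller` class, its method names, and the string labels are hypothetical, and the final "stop" message reflects the optional behaviour of claim 20 rather than anything claim 28 itself requires.

```python
from typing import List, Optional, Tuple


class Controller:
    """Tracks which circuitry performs the first function (illustrative only)."""

    def __init__(self) -> None:
        self.active: Optional[str] = None
        self.sent: List[Tuple[str, str]] = []  # (target, command) messages sent

    def begin_compilation(self) -> None:
        # First instruction: the additional circuitry handles packets
        # while the hardware module's pipeline is still being compiled.
        self.sent.append(("additional_circuitry", "start"))
        self.active = "additional_circuitry"

    def compilation_complete(self) -> None:
        # Second instruction: the hardware module takes over.
        self.sent.append(("hardware_module", "start"))
        # Assumed extra step (compare claim 20): stop the additional circuitry.
        self.sent.append(("additional_circuitry", "stop"))
        self.active = "hardware_module"

    def target(self) -> Optional[str]:
        """Return the circuitry that should currently process packets."""
        return self.active
```

The point of the ordering is that packets always have somewhere to go: the slower path is started before compilation finishes, and the switch happens only once the compiled pipeline is ready.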

29. A method for implementation in a network interface device, the method comprising:
receiving, at a first interface, a plurality of data packets; and
configuring a hardware module to interconnect at least some of a plurality of processing units of the hardware module to provide a first data processing pipeline for processing one or more of the plurality of data packets to perform a first function on the one or more of the plurality of data packets,
wherein each processing unit is associated with a predefined type of operation executable in a single step,
and wherein at least some of the plurality of processing units are associated with different predefined types of operation.

30. A non-transitory computer-readable medium comprising program instructions for causing a network interface device to perform a method, the method comprising:
receiving, at a first interface, a plurality of data packets; and
configuring a hardware module to interconnect at least some of a plurality of processing units of the hardware module to provide a first data processing pipeline for processing one or more of the plurality of data packets to perform a first function on the one or more of the plurality of data packets,
wherein each processing unit is associated with a predefined type of operation executable in a single step,
and wherein at least some of the plurality of processing units are associated with different predefined types of operation.