KR101625562B1

KR101625562B1 - Block-based signal processing

Info

Publication number: KR101625562B1
Application number: KR1020130164508A
Authority: KR
Inventors: Raka Singh (라카 싱)
Original assignee: Analog Devices, Inc. (아나로그 디바이시즈 인코포레이티드)
Priority date: 2012-12-26
Filing date: 2013-12-26
Publication date: 2016-06-13
Also published as: US20140176571A1; DE102013114508B4; US9251554B2; DE102013114508A1; KR20140083917A

Abstract

Signal flows for data-processing applications may be implemented so that each processing node in a flow is enabled when its input buffer contains a sufficient amount of input data. In various embodiments, such signal flows can be defined graphically in a GUI tool that subsequently auto-generates code suitable for executing the signal flow.

Description

Block-based signal processing {BLOCK-BASED SIGNAL PROCESSING}

The present invention relates generally to signal-flow architectures for image and other data-processing applications and, in some embodiments, to a tool for generating program code that implements a signal flow based on a graphical representation.

Image-processing applications generally include a number of functional processing blocks, hereinafter referred to as "nodes", that are executed sequentially to transform raw image data into the final images presented to the user and/or to analyze the image data in order to extract information about the objects or conditions they capture. In such applications, the algorithm that governs the signal flow needed to connect the nodes (i.e., that manages the input, output, temporary data storage, and data transfer for the multiple functional blocks) typically forms the core of the application and, particularly when implemented on a digital signal processor (DSP) or in hardware, often consumes a significant portion of the processing power. Figure 1 shows an exemplary signal flow of an algorithm for foreground "blob" detection, which may be used, for example, to detect people, cars, or other objects in images. The first node 100 ('ABS DIFF') computes the pixel-by-pixel difference in image values (e.g., grayscale values) between an image and a background reference image. The difference thus computed is thresholded against a fixed or adaptive threshold at the second node 102 ('BINARY THRESHOLD') to produce a binary image. The binary image is post-processed further at the 'EROSION' and 'DILATION' nodes 104, 106 to erode noise pixels and enhance the binary image output. The final node 108 ('CONNECTED LABELLING') identifies connected pixels in the binary image and labels them as "blobs".
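The five-node flow of Figure 1 can be sketched in plain Python on small nested-list images. This is a minimal illustration under stated assumptions, not the implementation described here: the function names, the 3x3 neighbourhood, the treatment of out-of-frame pixels as 0, and the 4-connected flood fill are all choices made for the example. Each function operates on a whole frame, i.e., the sketch follows the conventional frame-based style.

```python
def abs_diff(image, background):
    """Node 100: pixel-by-pixel absolute difference of two equally sized images."""
    return [[abs(p - q) for p, q in zip(r1, r2)] for r1, r2 in zip(image, background)]


def binary_threshold(image, threshold):
    """Node 102: 1 where the difference exceeds a fixed threshold, else 0."""
    return [[1 if p > threshold else 0 for p in row] for row in image]


def _morph(image, combine):
    """3x3 min (erosion) or max (dilation); out-of-frame pixels count as 0."""
    h, w = len(image), len(image[0])

    def at(y, x):
        return image[y][x] if 0 <= y < h and 0 <= x < w else 0

    return [[combine(at(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1))
             for x in range(w)] for y in range(h)]


def erosion(image):
    """Node 104: removes isolated noise pixels."""
    return _morph(image, min)


def dilation(image):
    """Node 106: grows the surviving regions back."""
    return _morph(image, max)


def connected_labelling(image):
    """Node 108: 4-connected flood fill; returns (label map, number of blobs)."""
    h, w = len(image), len(image[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if image[y][x] and not labels[y][x]:
                count += 1
                labels[y][x] = count
                stack = [(y, x)]
                while stack:
                    cy, cx = stack.pop()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and image[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = count
                            stack.append((ny, nx))
    return labels, count


def detect_blobs(image, background, threshold):
    """Chain the five nodes in the order shown in Figure 1."""
    binary = binary_threshold(abs_diff(image, background), threshold)
    return connected_labelling(dilation(erosion(binary)))
```

With a 3x3 bright square and one stray bright pixel on a dark background, the erosion step removes the stray pixel and the flow reports a single blob.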

Developing suitable program code to implement the signal and data flow (whether written in a low-level DSP language or in a high-level language such as C or C++) is generally a daunting task for the algorithm or application programmer, and involves numerous stages of design optimization relating to memory allocation, direct memory access, control, and so on. It is therefore desirable to automate or semi-automate this task. Program tools are available that auto-generate code from a schematic representation of the signal flow created by an application developer in a graphical user interface (GUI). These tools typically support either a sample-based or a frame-based signal-flow architecture, in which the processing nodes operate on individual data samples or on entire frames, respectively. Sample-based tools are widely used, for example, in audio signal processing and motor control. They may, however, be unsuitable for many image-processing applications, which typically require higher sample-processing rates (because, for example, a single image already contains a large number of data samples, i.e., pixels) and which often include processing steps that operate on collections of samples rather than on individual samples. For example, an image-smoothing step may involve replacing each pixel with an average over a block of several pixels, and a one-dimensional Fourier transform inherently requires an entire row of the input image for each pixel of the output image. Other tools operate on entire image frames. In many circumstances, however, it is not necessary to process complete image frames. Furthermore, in practical image-processing applications implemented on DSPs or other special-purpose processors with limited local memory (rather than on a general-purpose computer), frame-based architectures require frequent accesses to external (off-chip) memory, which render the system inefficient.

Accordingly, there is a need for signal-flow architectures that facilitate efficient image processing on DSPs or other hardware subject to memory and bandwidth limitations, as well as for tools that aid application developers in implementing such signal flows.

The present invention relates to signal-flow architectures that facilitate block-based data (in particular, image) processing and overcome many of the drawbacks of sample-based and frame-based signal flows. Block-based signal processing generally serves to reduce the memory requirements associated with the individual processing steps and the frequency of external-memory accesses, resulting in increased overall efficiency compared with frame-based processing. A data "block", as used herein, is a collection of data samples that forms a subset of a more complete data set referred to as a "frame". For example, an image frame contains the entirety of the visual data samples acquired by a camera or other optical detector at a particular point in time, and generally forms a two-dimensional matrix or array of image pixels (although one-dimensional or three-dimensional visual data sets are also within the scope of the term "image frame"). An image block may consist, for example, of one or more rows or columns of the image frame, or of a sub-array spanning portions of multiple rows and/or columns around a particular pixel. Although image processing is one of the major applications of block-based signal processing and is used extensively throughout this specification for illustration purposes, it should be understood that the invention is not limited to image data, but applies generally to any type of data amenable to block-based processing (including, e.g., measurements or simulations of discretized physical fields, or multi-channel audio signals).

A signal flow in accordance herewith generally includes a plurality of nodes, each node corresponding to a functional unit, i.e., the performance of a particular processing function on an input data block. The nodes may generally be implemented in hardware (i.e., circuitry), software (i.e., sets of executable instructions), or a combination of both. In software implementations, each node corresponds, in some embodiments, to a separate functional block or set of instructions executed by a processor; in some embodiments, one or more of the nodes are each implemented by multiple blocks of instructions; and in some embodiments, two or more nodes are implemented together by a single block of instructions. Similarly, in hardware implementations, each node may correspond to a single, dedicated circuit; multiple nodes may be implemented by a multi-functional circuit; and/or multiple separate circuits may collectively implement a single node. Depending on context, "node" hereinafter refers either to the processing step or function itself, or to its implementation in hardware and/or software.

In block-based signal flows, the size of the block needed at a node to produce one unit of output (where a unit may be an individual sample or a block of data) may differ between the nodes. For example, in image-processing applications, one image-processing step may operate on one row of input to produce one row of output, whereas another step may require three rows of input for each row of output. The nodes generally have associated input buffers for storing the requisite amount of data. In some embodiments, each node is triggered as soon as sufficient data is available in its input buffer to produce one unit of output; overall local-memory requirements and wait times are minimized as a result. In embodiments in which a single processor or circuit executes multiple nodes, each node that has sufficient data in its input buffer is enabled for execution, and is executed as soon as the computational capacity of the processor or circuit permits. A buffer, as used herein, denotes any means for storing data, and may be implemented in any kind of storage medium, device, or structure, including, for example, one or more partitions or sections in random-access memory (RAM), hardware registers, flip-flops, latches, or any combination thereof. The buffer need not be a single, contiguous unit, but may comprise multiple portions at different memory locations. Further, it may store data directly, or indirectly via pointers to separate memory locations.
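The triggering rule described above (a node runs as soon as its input buffer holds enough data for one unit of output) can be sketched as follows. This is a minimal illustration of the directly triggered variant only; the `Node` class, its attribute names, and the use of Python lists as buffers are assumptions made for the example and do not appear in the specification.

```python
class Node:
    """Hypothetical processing node that fires once its input buffer is full."""

    def __init__(self, rows_needed, func):
        self.rows_needed = rows_needed   # node-specific block size, in rows
        self.func = func                 # maps a list of rows to one output row
        self.input_buffer = []
        self.downstream = None           # next node in the series, if any

    def receive(self, row):
        self.input_buffer.append(row)
        if len(self.input_buffer) == self.rows_needed:
            # Enough data for one unit of output: trigger the node.
            out = self.func(self.input_buffer)
            # The oldest row is not needed for the next (overlapping) block.
            self.input_buffer.pop(0)
            if self.downstream is not None:
                self.downstream.receive(out)


def run_example(rows):
    """One-row node feeding a three-row node feeding a collecting node."""
    collected = []
    first = Node(1, lambda block: [2 * v for v in block[0]])
    second = Node(3, lambda block: [sum(col) for col in zip(*block)])
    collector = Node(1, lambda block: collected.append(block[0]))
    first.downstream = second
    second.downstream = collector
    for row in rows:
        first.receive(row)
    return collected
```

In this sketch the one-row node fires on every input row, whereas the three-row node stays idle until its third row arrives and then fires once per additional row.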

In addition to block-based data-processing methods and hardware for implementing them, the present invention provides, in various embodiments, GUI-based tools that enable application developers to define signal flows graphically and that auto-generate suitable program code based on the graphical signal-flow representation. Such tools generally include a library of functional data-processing blocks (i.e., nodes), an editor for drawing a signal flow that incorporates nodes from the library, a compiler for generating code from the signal flow and, optionally, a simulator for executing and testing the code.

In a first aspect, the invention provides a method for processing data frames (such as, e.g., image frames) by a series of processing nodes, each node being configured to process a block of input data (each block comprising a plurality of samples and constituting a portion of a frame, e.g., multiple rows of an image frame) to produce one unit of output (e.g., one row of an image frame). The method includes receiving data at input buffers associated with the nodes, and causing execution of each node when its associated input buffer stores enough data to produce one unit of output. Execution of a node may be caused directly by triggering the node as soon as its input buffer holds sufficient data; this may be the case, for example, in embodiments in which the node has a dedicated circuit, processor, or other computational unit that merely awaits a trigger signal before it begins processing. Alternatively, execution of a node may be caused indirectly by changing the state of the node such that processing is enabled or permitted. In that case, the node is processed as soon as the processor implementing the series of nodes has spare capacity.

In a second aspect, the invention provides a method of controlling signal flow in a data-processing system that implements a series of processing nodes, each node being configured to produce one unit of output data (e.g., one row of data) from a block of input data comprising a node-specific integer multiple of one unit of input data. The method includes controlling the signal flow through the series of nodes by receiving data at input buffers associated with the nodes, and causing execution of each node (i.e., triggering or enabling the node) when the input buffer associated with it stores the respective node-specific integer multiple of one unit of data.

At each processing node, the data may be received from a preceding node and/or from a DMA source node. In some embodiments, the first processing node in the series reads data from a DMA source node, and the last processing node in the series writes data to a DMA sink node. In certain embodiments, a counter is maintained for each input buffer; the method then includes incrementing the counter for each unit of input data received from the preceding processing node or the DMA source node. In some embodiments, memory allocated to the buffer associated with one of the processing nodes is reused for a buffer associated with a node downstream of it. The processing nodes may be executed in parallel or sequentially.

In a third aspect, the invention provides a system for processing data frames by a series of processing nodes. Each node is configured to process a node-specific block of input data to produce one unit of output data, each block comprising a plurality of data samples and being a portion of a data frame. The system includes one or more processing blocks implementing the series of processing nodes, a plurality of input buffers associated with the nodes, and a logic switching mechanism that causes each node to be executed by the respective processing block when the input buffer associated with that node stores the node-specific block of input data. In some embodiments, the system includes a plurality of processing blocks, each corresponding to one of the processing nodes.

The processing block(s) may be implemented with processor-executable instructions stored in memory. Alternatively, the processing block(s) may be implemented in circuitry. In some embodiments, a single circuit is provided for executing the series of processing nodes sequentially; in some embodiments, a plurality of circuits is provided for executing in parallel those processing nodes whose execution has been caused or enabled by the logic switching mechanism. A "circuit", as used in this context, may be a processor core, a self-contained portion of a core, an arithmetic logic unit, or generally any other functional processing unit. The switching mechanism may include a plurality of registers that store, for each node, the number of input units associated with the node-specific block of that node, together with a counter for the number of input units currently stored in the buffer associated with that node. The registers may be hardware registers, or they may be stored in local memory associated with the processing block(s). In some embodiments, the system is a digital signal processor (DSP).

In a fourth aspect, the invention provides a system for generating program code for block-based signal processing from a graphical representation of a signal flow defined in a graphical user interface. The system includes a processor, memory storing instructions executable by the processor and, optionally, a display device (e.g., a computer screen) for displaying the graphical user interface. The instructions stored in the memory include (i) a library of functions implementing signal-processing nodes, each node being configured to produce one unit of output data from a block of input data having a node-specific size; (ii) instructions implementing an editor that allows a user to graphically define a signal flow comprising a plurality of nodes and the connections between them, and to associate one of the functions from the library with each node; and (iii) instructions implementing a compiler for generating program code from the graphically defined signal flow and the associated functions, the code causing execution of each node when the buffer associated with that node stores a block of input data of the respective node-specific size. The editor may further allow the user to graphically define direct memory access (DMA) for the signal flow, including, e.g., DMA sources, DMA sinks, and/or DMA scheduling paths, and the compiler may further generate program code implementing the graphically defined DMA. If the user does not define a scheduling path, the compiler may generate the DMA scheduling path automatically. The compiler may further generate program code that resolves data parallelism in the DMA paths and allocates ping-pong buffers at the source-node and sink-node buffers, as well as code that implements the buffers associated with the nodes at different levels of a memory hierarchy. The editor may allow the user to enter parameters into a DMA parameter window, and the compiler may generate DMA register entries based on those parameters.

The foregoing will be more readily understood from the following detailed description of the invention, in particular when taken in conjunction with the drawings.
Figure 1 is a signal-flow diagram for a conventional, exemplary image-processing application.
Figure 2A is a conceptual signal-flow diagram illustrating the buffer requirements associated with each node in one implementation of row-based signal processing.
Figure 2B is a conceptual signal-flow diagram illustrating the reduced buffer requirements associated with the nodes in a row-based signal-processing implementation that includes switches between the nodes, in accordance with one embodiment of the invention.
Figure 3 is a block diagram illustrating a system for row-based signal processing in accordance with various embodiments.
Figures 4A and 4B illustrate, respectively, a row-based signal-processing flow including switches between the nodes, and the registers that control the data flow and the operation of the switches, in accordance with various embodiments.
Figure 5 is a block diagram illustrating a GUI-based tool for graphically defining a signal flow and auto-generating code based thereon, in accordance with one embodiment.
Figures 6A-6H illustrate user-interface components of a GUI-based tool in accordance with various embodiments.
Figure 7 is a block diagram illustrating a computer system for implementing the GUI-based tool of Figure 5 in accordance with various embodiments.

Data-processing algorithms in accordance with various embodiments operate on blocks of data, rather than on individual data samples or entire frames. Such blocks may consist, for example, of one or more rows of a two-dimensional data array, or of one or more slices of a three-dimensional array. Row-based data processing is suitable, or even necessary, in many image-processing applications, e.g., to perform two-dimensional filtering such as convolution, or two-dimensional morphological filtering such as erosion and dilation. Figures 2A and 2B illustrate the signal flow for an exemplary row-based algorithm that includes four nodes 200, 202, 204, 206, or processing steps, each of which converts a certain number of input rows (indicated to the left of the node) into one row of output data. For successive output rows, the corresponding blocks of multiple input rows overlap, so that each frame of input data yields an output frame of the same size. For example, row n of the output frame may be generated from rows n-1, n, and n+1 of the input frame, so that the three-row input blocks for two adjacent output rows overlap by two rows. (The input frame may be padded, e.g., with zeros, to supply the input data required near the edges of the frame. Alternatively, if the input frame itself is not padded, the nodes may generate padded zeros to fill the buffer.)
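The overlap of successive input blocks, and the zero padding at the frame edges, can be made concrete with a short sketch. It assumes a hypothetical node that sums `taps` vertically adjacent rows column by column; the function name `filter_rows` and the choice of a plain sum as the row operation are illustrative only.

```python
def filter_rows(frame, taps=3):
    """Produce one output row per input row from `taps` adjacent rows.

    `taps` is assumed to be odd. Rows outside the frame are replaced by
    zero rows, so the output frame has as many rows as the input frame.
    """
    half = taps // 2
    zero_row = [0] * len(frame[0])
    padded = [zero_row] * half + frame + [zero_row] * half
    output = []
    for n in range(len(frame)):
        # For taps == 3 this block holds rows n-1, n and n+1 of the frame;
        # it shares taps - 1 rows with the block for output row n+1.
        block = padded[n:n + taps]
        output.append([sum(column) for column in zip(*block)])
    return output
```

For a four-row input frame the sketch returns four output rows, the first and last of which draw on one padded zero row each.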

In the signal flow shown, the first node 200, "node 0", receives input data via DMA from a DMA source node 208, and the last node 206, "node 3", writes output data via DMA to a DMA sink node 210. Each node has an associated buffer at its input for temporarily storing the output of the immediately preceding node. In one embodiment, shown in Figure 2A, the buffers are sized to store enough data to produce one row of output at node 3. Node 3 requires one row of input to produce one row of output data; accordingly, its buffer 212 is configured to store one row of data. Node 2 requires five rows of input for one row of output and, thus, needs a buffer 214 that stores five rows of data. Node 1 requires three rows of input to produce one row of output. To produce the five rows of input required by node 2, however, node 1 needs a total of seven rows of input data: rows 1-3 yield the first row of output, rows 2-4 the second, rows 3-5 the third, rows 4-6 the fourth, and rows 5-7 the fifth. The input buffer 216 at node 1 is therefore configured to store seven rows. Similarly, node 0 requires five rows of input to produce one row of output but, to supply the seven rows of data required by node 1, it needs a buffer 218 that stores a total of eleven rows of input (rows 1-5 for the first row of output, rows 2-6 for the second, and so on). In general, a node that produces one row of output from n rows of input, and that precedes a node requiring a total of m rows of input, needs a buffer storing n+m-1 rows. The required number of input rows thus cascades back up to the source node 208. For signal flows with a large number of nodes, the resulting buffer requirements may exceed the capacity of local memory and necessitate the very external-memory accesses that block-based processing is intended to eliminate.
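The n+m-1 rule can be applied node by node, starting at the sink, to reproduce the buffer sizes of Figure 2A. The following is a minimal sketch; the function name `cascaded_buffer_rows` and the list representation of the flow are assumptions made for illustration.

```python
def cascaded_buffer_rows(rows_per_output):
    """Buffer sizes (in rows) when buffers are sized for one final output row.

    rows_per_output[i] is the number of input rows node i needs to produce
    one output row, listed from the source side to the sink side.
    """
    sizes = [0] * len(rows_per_output)
    rows_to_deliver = 1  # the last node has to deliver one output row
    for i in reversed(range(len(rows_per_output))):
        # n rows give one output row; each further output row needs one more.
        sizes[i] = rows_per_output[i] + rows_to_deliver - 1
        rows_to_deliver = sizes[i]
    return sizes
```

For the per-node requirements of Figures 2A and 2B (5, 3, 5, and 1 rows), the sketch yields buffers of 11, 7, 5, and 1 rows, i.e., 24 rows in total, compared with 14 rows if each buffer only had to hold its own node's block.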

FIG. 2B shows a modified signal flow that alleviates the memory problem. Here, the data flow is governed by a control signal flow that triggers each node as soon as the node's input buffer contains enough data to produce one row of output. Conceptually, this is depicted by switches 220, each of which connects the output of the preceding node to the input of the next node and closes when sufficient data is available in the input buffer of the next node. Thus, for example, once node 1 has received three rows of input, it processes these rows to produce one row of output, which it stores in the input buffer of node 2. The input buffer of node 1 may then be overwritten. Specifically, the second and third rows in the input buffer may each be shifted up by one row (overwriting the first row), and the next row of input may be received (from node 0) and stored in the third row of node 1's input buffer. In this manner, the buffer size required for each node is reduced to the number of input rows the node needs to produce one row of output; in the example of FIG. 2B, nodes 0, 1, 2, and 3 may have buffers 222, 224, 226, 228 for five rows, three rows, five rows, and one row, respectively. As those skilled in the art will readily appreciate, the cascading effect suffered by the signal flow of FIG. 2A is eliminated, and the overall buffer requirement is approximately proportional to the number of nodes, or even smaller if buffers are shared between nodes (as described below). The buffers for the various nodes may generally be implemented at different levels of the memory hierarchy, based on buffer size and required access frequency. For example, the smallest buffers (for single rows of data) may be implemented in L1 memory, while larger buffers may be implemented in L2 or L3 memory or cache memory (associated with larger latencies).
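
The shift-up behavior of a node's input buffer may be sketched, purely as an illustration, with a small fixed-capacity row buffer. The class `RowBuffer` and its methods are hypothetical names introduced here; an actual implementation would more likely move pointers than copy rows.

```python
class RowBuffer:
    """Fixed-capacity input buffer that keeps only the most recent rows."""

    def __init__(self, capacity):
        self.capacity = capacity   # rows needed to produce one output row
        self.rows = []

    def push(self, row):
        # When full, shift the remaining rows up by one (dropping the oldest)
        # before storing the new row in the last position.
        if len(self.rows) == self.capacity:
            self.rows = self.rows[1:]
        self.rows.append(row)

    def ready(self):
        # Corresponds to the "switch" closing: enough data for one output row.
        return len(self.rows) == self.capacity


buf = RowBuffer(3)              # e.g., the three-row input buffer of node 1
for name in ["row1", "row2", "row3"]:
    buf.push(name)
print(buf.ready(), buf.rows)
buf.push("row4")                # row1 is overwritten, rows 2 and 3 shift up
print(buf.rows)
```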

The signal flow shown in FIG. 2B may be executed using a single processor. In this case, the processor is controlled to always execute, among the nodes that have sufficient data available in their input buffers, the node furthest downstream toward the DMA sink node 210. For example, when starting with a new data frame at the input, node 0 is executed three times to generate the minimum amount of data required by node 1. Then the switch between node 0 and node 1 closes, and node 1 executes to produce one row of output. Thereafter, the second and third rows in the input buffer of node 1 are shifted up by one row each, and node 0 runs again to generate the third row of data for node 1's input buffer. Next, node 1 executes again to generate the second row of data for the input buffer of node 2. This process is repeated three more times until five rows of input are available at node 2, at which point node 2 executes, followed by node 3. The processor then returns to node 0, and the entire loop repeats until the whole input data frame has been processed. During each loop, memory may be reused among the nodes, i.e., buffer space may be shared. For example, the output row of node 2 may be stored in memory previously allocated to the input buffer of node 1. If that buffer still stores the three already-processed rows, the first row, which is no longer needed, may be overwritten by the output of node 2. If, on the other hand, rows 2 and 3 of node 1's input buffer have already been copied to rows 1 and 2, the third row of that buffer may be used to store the output of node 2.
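
The single-processor execution order described above may be reproduced with the following sketch. It is a simplified model under stated assumptions, not the disclosed implementation: the function `run_order` is a hypothetical name, buffers are represented only by row counts, each node execution consumes one row (the shift-up) and delivers one row downstream, and the DMA source loads one row into node 0's buffer whenever no node is ready. Image borders and buffer sharing are not modeled.

```python
def run_order(need, frame_rows):
    """Return the sequence of node indices executed for one frame.

    need[i] is the number of input rows node i requires per output row.
    """
    counts = [0] * len(need)     # rows currently held in each input buffer
    remaining = frame_rows       # rows not yet loaded from the source
    order = []
    while True:
        ready = [i for i, n in enumerate(need) if counts[i] >= n]
        if ready:
            node = max(ready)            # node furthest toward the sink
            counts[node] -= 1            # oldest row is shifted out
            if node + 1 < len(need):
                counts[node + 1] += 1    # one output row goes downstream
            order.append(node)
        elif remaining > 0:
            counts[0] += 1               # source loads one more input row
            remaining -= 1
        else:
            break
    return order


print(run_order([5, 3, 5, 1], 20)[:18])
```

Under these assumptions, the first executions are node 0 three times, then node 1, then alternating nodes 0 and 1 until node 2 has five rows, followed by nodes 2 and 3 and a steady 0, 1, 2, 3 loop.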

In some embodiments, the various nodes of the signal flow are executed in parallel by multiple processors, or by a single processor running multiple threads concurrently (e.g., in a time-shared, interleaved manner). In this scenario, memory reuse among the buffers is not possible, but the total execution time can be reduced significantly because each node executes repeatedly as long as the switch at its input is closed, i.e., as long as sufficient data is available in its input buffer. In general, once the buffer between two nodes has been filled (e.g., the input buffer of node 1 has received three rows of input) and the switch between the nodes has closed, the buffer is filled from one end and drained from the other end at the same rate, and the switch remains closed until the entire frame has been processed. In other words, following the initial filling of the buffers, data movement across the nodes occurs in a pipelined manner.

The signal flow shown in FIG. 2B may be modified in several ways. In addition to including more (or fewer) processing nodes than shown, with different data input requirements, the signal flow may include and/or be connected to additional DMA source and/or sink nodes. In general, each node in the signal flow may read data from memory (i.e., be a source node) or write data to memory (i.e., be a sink node). Further, the nodes in the signal flow need not form a linear chain. In some applications, the signal flow includes two or more parallel nodes, i.e., nodes that process input data independently and whose collective outputs may be required by another node further downstream in the signal flow. Of course, the signal flow may, in principle, branch and recombine in arbitrarily complex ways. Moreover, in some embodiments, certain nodes may be optional. For example, in a typical image processing application, a node for reducing the resolution, and thus the size of the image frames, may or may not be executed depending on, e.g., settings specified by the user of the application. To implement such an optional node, the signal flow may include a bypass around the node, using, for example, "switches" at the input and output of the node and additional control signal lines that determine the switch settings based on the user selection (or some other condition, e.g., the numerical value of a metric derived from a previous processing step).

Although described above using the example of row-based processing, the use of "switches" to trigger the operation of nodes is generally applicable to any kind of block-based processing, regardless of the particular shape and size of the data blocks. The key is that each node in the signal flow is triggered to execute when a sufficient amount of data has been received in its input buffer to produce one unit of output, where the size of the unit depends on the particular application. Consider, for example, an image-smoothing step (i.e., node) that replaces the value of each pixel with the average value of a 3 × 3 block of pixels centered at the pixel at issue. This node has an output unit size of only one pixel; it executes when its input buffer holds a 3 × 3 block corresponding, for example, to a block centered at coordinates (n, m) of the image frame, and writes the computed output value to the input buffer of the next node in a manner that preserves the pixel coordinates (n, m). The next 3 × 3 block processed by this node may be shifted one column to the right (i.e., centered at (n, m+1) in the image frame), and the computed output may accordingly be stored in the input buffer of the next node in association with coordinates (n, m+1).
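
The smoothing node described above may be illustrated by the following sketch, which averages successive 3 × 3 blocks taken from three buffered rows and records each result under its column coordinate m. The function names are hypothetical, and the handling of border columns (here simply skipped) is an assumption not addressed in the text.

```python
def smooth_pixel(block):
    """Average of a 3 x 3 block, given as three lists of three values."""
    return sum(sum(row) for row in block) / 9.0


def smooth_row(rows3):
    """Produce one unit of output per 3 x 3 block, keyed by column index m.

    rows3 holds three consecutive image rows (n-1, n, n+1). Assumption:
    border columns, for which no full block exists, are skipped.
    """
    width = len(rows3[0])
    out = {}
    for m in range(1, width - 1):
        block = [row[m - 1:m + 2] for row in rows3]   # block centered at (n, m)
        out[m] = smooth_pixel(block)                  # coordinate m is preserved
    return out


rows = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12]]
print(smooth_row(rows))
```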

The size of the input data block for each node is generally an integer multiple of the output unit size of the immediately preceding node (or, if the node takes input from a group of preceding nodes, of the combined output unit size of that group), so that repeated execution of the preceding node(s) can produce the requisite amount of data for the input block. In various embodiments, the output unit size is the same for all nodes. For example, in the signal flow of FIG. 2B, the output unit of each node is one row of data, and the input data blocks required by the nodes all consist of integer numbers of rows. Using a single output unit for all nodes can simplify the programming of the signal flow and the DMA movement of data within the signal flow, without requiring state variables to be saved for every node, as is required by other block-based processing schemes such as 8 × 8 overlapping blocks across column boundaries. In 8 × 8 overlapping-block processing (in contrast to row-based processing, where a node would operate on a buffer of size 8 × M, M being the number of pixels in one row of the image), the overlap requires that, for the next 8 × 8 data block to be processed, the preceding columns from the previous 8 × 8 block be stored as state variables at the node.

FIG. 3 shows an exemplary DSP implementation of block-based signal processing flows such as the flow shown in FIG. 2B. DSP 300 implements each node of the signal flow with separate hardware (denoted "processing block"), such as a dedicated logic unit or processor core; in the illustrated embodiment, the processing blocks can process data in parallel in a pipelined manner. Although the illustrated embodiment shows only two processing blocks 302, 304, it should be understood that signal flows with any number of nodes may be implemented. Data moves between the processing blocks 302, 304 and is temporarily stored in input/output buffers 306, 308, 310. A register bank 312 stores control parameters that monitor the fill state of the buffers 306, 308, 310 and trigger the data flow between the buffers and the processing blocks 302, 304; that is, the register bank 312 controls the "switches" 220 between nodes. The input buffer 306 of the first processing block receives data from an internal or external data source 314, 316, such as memory, a camera providing image streams, or some other input device; a multiplexer 318 may facilitate selection among multiple such data sources. The output buffer 310 of the final processing node 304 sends data (optionally via another multiplexer 320) to an internal or external data sink 322, such as memory or a display device.

In video or image processing applications implemented on a DSP or in hardware (as opposed to software running on a general-purpose computer), image frames are generally too large to be stored locally and therefore reside in slower external memory (corresponding to data source 316 in FIG. 3). The image data is loaded into internal memory in rows or blocks. After these rows or data blocks have been processed by the series of processing blocks (e.g., blocks 302, 304), the generated output is likewise stored in slower external memory (corresponding to data sink 322, which may be the same memory as source 316). Data movement between external memory and the internal data buffers (e.g., buffers 306, 310), in either direction, is preferably accomplished via a DMA controller, which is an integral part of most DSPs and other special-purpose processors and may be implemented, e.g., in the multiplexers 318, 320. In this case, the external data source and sink 316, 322 are DMA-enabled. DMA movement has the additional advantage of parallelizing data movement to and from the processor with the data processing itself; that is, data movement occurs in the background.

Of course, the hardware embodiment of FIG. 3 is merely one example. As those skilled in the art will readily appreciate, signal flows in accordance with various embodiments of the present invention may be implemented in a variety of ways. For example, a DSP may use a single processor core to execute different sets of instructions, stored, e.g., in local instruction memory, that correspond to the different nodes. Further, the data buffers may share the same memory space and may be created and/or overwritten on the fly as needed. The registers for controlling the switches between nodes may be hardware registers or, alternatively, may be stored in local memory along with the buffers and/or instructions. As an alternative to a DSP, the signal flow may be executed on any other kind of special-purpose processor, including, e.g., microcontrollers, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or programmable gate arrays (PGAs). Moreover, signal flows in accordance herewith may be implemented in software executed on a general-purpose computer.

Returning to hardware embodiments (e.g., as shown in FIG. 3), FIGS. 4A and 4B illustrate the operation of the switches implemented by the registers 312 in more detail. In the illustrated example, the signal flow (shown in FIG. 4A) includes two nodes 400, 402. Each of the nodes 400, 402 is associated with one- or two-dimensional data buffers at its input and output. Specifically, as shown, node 0 receives input from two DMA source nodes S0 and S1, which feed into respective data buffers B0 and B1 (408, 410) that are connected to the node via switches 404, 406. The output of node 0 is fed into a two-dimensional buffer B2 (412), which is in turn connected to the input of node 1 via a switch 414. The output of node 1 goes into a data buffer B3 (416), which is connected to DMA sink S2 via a switch 418.

The system maintains four different register arrays 420, 422, 424, 426, which collectively form the register bank 312, to control the signal flow through the nodes 400, 402. Each array may include multiple, e.g., 32-bit, registers. The node-source address register array 420 contains a register for each input source of a node: in the illustrated example, two entries for the two input sources of node 0 and one entry for the input source of node 1. The entries of these registers are the addresses of the data buffers for the respective input sources, i.e., the addresses of buffers B0, B1, and B2. Once initialized, these register entries do not change during the entire signal processing. The node-destination address register array 422 contains a register entry for each output of a node: e.g., one entry for the output of node 0 and one entry for the output of node 1. The entries of these registers are the addresses of the data buffers for the respective outputs, e.g., of buffers B2 and B3. For one-dimensional buffers at the output of a node, the values in the corresponding registers do not change once initialized. For two-dimensional buffers at the output (e.g., buffer B2 at the output of node 0), the entry is initialized to the value B2 + 2nd line (i.e., the initial buffer address in the register is the memory address of the second row of buffer B2); after the first iteration, the entry changes to B2 + 3rd line and stays the same thereafter. (The output of the previous node, which serves as the input buffer for the current node, will always be written to the same memory address after the first iteration, so no further update is needed. This simplifies data movement for nodes with 2D buffers. Note that the number of iterations depends on the size of the buffer: for example, if the buffer is 3 × M, the number of iterations is 3/2 = 1, and if it is 5 × M, it is 5/2 = 2, using integer division.)

The switch-value register array 424 determines when the control switch for each node closes: it stores, in each register, the minimum number of rows (or, more generally, units) of input data required in the respective node's input buffer for the switch to close and execution of the node to be triggered. For example, the switch 404 connecting buffer B0 to node 0 requires buffer B0 to store one row of data; this value is stored in the register. The switch 414 connects buffer B2 to node 1; the associated buffer requirement of three rows is stored in the appropriate register. Finally, in the node-count register array 426, each register entry is associated with the control switch for an input source of a node or for a DMA sink to which a node is connected. These register values are counters that track the number of data rows processed by each node (and are thus updated by the node each time an input data row is processed) and control when the switch for the node closes. Initially, the register values C0, C1, and C3 for buffers B0, B1, and B3 are all 0, and the register value C2 for buffer B2 is 1. (For two-dimensional buffers, the initial value is chosen as the number of rows in the buffer divided by 2 and rounded down, so that, e.g., a three-row buffer yields an initial value of 1.) Then, when DMA source node S0 fills buffer B0 with one row of pixels (samples) and DMA source node S1 fills buffer B1 with one row of pixels, the values are updated to C0 = 1 and C1 = 1. The switch associated with each counter register closes when its value equals or exceeds the value stored in the corresponding switch register. In pseudo-syntax, this corresponds to the following "if statements":

if (C0 >= SWITCH0 and C1 >= SWITCH1)

{ close switch 0 and switch 1; process node 0; increment C2 by 1 }

if (C2 >= SWITCH2)

{ close switch 2; process node 1; increment C3 by 1 }
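
As a non-authoritative sketch, the two "if statements" above may be transcribed into runnable form as follows. The function `step` and the list-based representation of the counter registers (C0 to C3) and switch registers (SWITCH0 to SWITCH2) are assumptions made for illustration; how the counters are reset or refreshed by subsequent DMA transfers is not specified in the text and is deliberately left unmodeled.

```python
def step(counters, switches):
    """Evaluate both switch conditions once.

    counters: [C0, C1, C2, C3]; switches: [SWITCH0, SWITCH1, SWITCH2].
    Returns the updated counters and the list of nodes that executed.
    """
    c = list(counters)
    ran = []
    # Switches 0 and 1 close when both input buffers of node 0 hold enough rows.
    if c[0] >= switches[0] and c[1] >= switches[1]:
        ran.append(0)      # process node 0
        c[2] += 1          # node 0 wrote one more row into buffer B2
    # Switch 2 closes when buffer B2 holds enough rows for node 1.
    if c[2] >= switches[2]:
        ran.append(1)      # process node 1
        c[3] += 1          # node 1 wrote one more row into buffer B3
    return c, ran


switches = [1, 1, 3]           # B0 and B1 need one row, B2 needs three rows
counters = [1, 1, 1, 0]        # C0 = C1 = 1 after the first DMA fill, C2 starts at 1
counters, ran = step(counters, switches)
print(counters, ran)
counters, ran = step(counters, switches)
print(counters, ran)
```

With the register values from the example, node 0 alone executes on the first evaluation, and node 1 executes for the first time on the second evaluation, once C2 has reached 3.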

In various embodiments, the present invention provides a GUI-based tool that aids application programmers in designing and implementing signal flows as described herein. As shown conceptually in FIG. 5, the GUI tool 500 includes an editor 502 with a drawing canvas 504 and associated drawing tools 506 that enable the programmer to create a depiction of the desired signal flow. The tool further includes a library 508 of functions or procedures, i.e., self-contained sets of computer-executable instructions, that execute various individual image-processing algorithms and, when selected, serve as the functional nodes, i.e., processing blocks, of the signal flow. In some embodiments, each function or procedure has an icon or other graphical representation associated with it. The programmer may select desired functions or procedures from the library by dragging and dropping the respective icons onto the drawing canvas, and may then define the signal flow by connecting them, e.g., by drawing lines between them. In other embodiments, the programmer defines the signal flow using generic shapes and symbols and thereafter assigns functions to the various symbols.

The functions may be optimized for a particular processor or a particular hardware implementation. Indeed, in some embodiments, multiple versions of executable code, optimized for different hardware implementations, are provided for the same functionality, allowing the program developer to select among them. Further, the functions or procedures may be inherently programmed for specific input block sizes and output unit sizes. Alternatively, the input and output blocks for each function or procedure may be variable in size, allowing the programmer to specify their sizes based on the particular application. In some embodiments, the library includes both functions with fixed-size data blocks and functions with variable-size data blocks. In addition to the block sizes, other user-selectable parameters may be associated with the various functional blocks.

The GUI tool 500 further includes a compiler 510 that automatically generates, from the graphical depiction, program code 512 that executes the desired signal flow. The compiler 510 incorporates the appropriate functions from the library 508, e.g., by copying them directly into the program code 512 or by linking to them, and adds the instructions necessary to control data movement between the nodes. The compiler 510 may include, e.g., a set of rules for translating graphical elements representing connections, switches, and buffers into suitable executable instructions. In some embodiments, the GUI tool supports multiple programming languages; in this case, the library 508 contains program code for each function in each of the supported languages. In certain embodiments, the GUI tool also includes a simulator 514 that allows the programmer to test a particular signal flow, e.g., to evaluate certain performance parameters (such as memory requirements, execution time on a particular processor, processing latencies, etc.). The simulator 514 may be integrated into the compiler 510.

FIGS. 6A-6H illustrate an exemplary GUI for signal flow programming in accordance with one embodiment. The GUI includes a panel 600 of tabs next to the drawing canvas 504. A "Shapes" tab 602 may include different graphical elements, such as lines, rectangles, etc., for drawing the signal-flow diagram. Each of the nodes (represented, e.g., by rectangles) in the diagram may be assigned one of the image-processing (or other types of) functions available in the "IP Blocks" tab 604. As shown in FIG. 6B, these functions may be provided, e.g., as a drop-down list 605. Once an image-processing function has been assigned to a node, the parameters of the function may be entered in a parameter window for that function. Parameter windows are implemented within the GUI in association with the various functions available in the library. FIG. 6C shows exemplary parameter windows 608, 610, 612 for three different image-processing functions.

The parameters specified by the user in the parameter windows 606, 608, 610 (or default values, where none are specified) are stored in a parameter list or array (e.g., a double-pointer array or linked list) and passed to the compiler 510. Typically, each node has one or more parameters stored in the list. For example, with reference to FIG. 6C, if node 0 is for "thresholding" and node 1 is for "erosion," the first entry in the parameter list is stored from the parameter window 612 for node 0, and the next three entries are stored from the parameter window 610 for node 1. The compiler 510 uses the list of parameters to return the values of the image-processing function associated with each node. These return values are likewise stored in a list or array.

As shown in FIGS. 6D through 6F, the panel 600 further includes a DMA tab 620 that allows the application developer to define DMA movement and scheduling graphically. Graphical DMA elements, such as DMA source and sink nodes 622, 624 and DMA source and sink scheduling paths 626, 628, have associated parameters that are subsequently used by the compiler 510 to generate the appropriate code governing DMA. Some of these parameters can be read directly from the graphical signal flow (e.g., the node in the signal flow to which a DMA node is connected), whereas others can be entered by the developer in a parameter window that pops up when the graphical DMA element is selected. For example, based on the user input in the parameter windows 630, 632 for nodes 0 and 1, the compiler 510 can set up two register entries in a memory array for every node, as shown in FIG. 6H. More specifically, the compiler 510 can allocate memory for buffers that have the name "Buffer Name" and the dimensions "Buffer Width" and "Buffer Height." It can then set up two register entries in the array or list 634 for that node: the first entry is an address generated automatically from the "Buffer Name" and "Offset" parameters, and the second entry is based on the "DMA Stride." These lists are used by the compiler 510 to set up, start, and end the various DMA paths. Thus, in various embodiments herein, DMA is an integral part of the GUI, and the code for DMA movement and DMA scheduling is generated automatically from graphical and/or textual user input, relieving the user of this otherwise tedious task.
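
By way of illustration only, the two register entries per node might be derived as in the following sketch. The function names, the back-to-back buffer layout, the one-address-unit-per-sample assumption, and the choice to store the stride unchanged as the second entry are all assumptions; the source states only that the first entry is an address generated from "Buffer Name" and "Offset" and that the second entry is based on "DMA Stride."

```python
def allocate_buffers(buffer_specs):
    """Lay buffers out back to back (assumed layout, one address unit per sample).

    buffer_specs is a list of (name, width, height) tuples.
    Returns ({name: base_address}, total_size).
    """
    base, layout = 0, {}
    for name, width, height in buffer_specs:
        layout[name] = base
        base += width * height
    return layout, base


def dma_register_entries(layout, buffer_name, offset, stride):
    """Return the two entries for one node: [start address, stride]."""
    return [layout[buffer_name] + offset, stride]


# Hypothetical buffers: three rows of 640 samples, then a single row of 640 samples.
layout, total = allocate_buffers([("buf_in0", 640, 3), ("buf_in1", 640, 1)])

register_list = [
    dma_register_entries(layout, "buf_in0", offset=0, stride=640),   # node 0
    dma_register_entries(layout, "buf_in1", offset=16, stride=640),  # node 1
]
print(layout, total)
print(register_list)
```

In this sketch the buffer name is resolved to a base address at allocation time, so the developer never has to enter a numeric address for the internal buffer.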

The input required from the developer to generate the DMA code generally includes an external memory buffer address for each source or sink node, the stride for advancing to the next row or rows of the image/video buffer, and, optionally, the scheduling associated with one or more source/sink nodes as well as the processing nodes associated with each scheduling path. If the developer does not specify the scheduling, the compiler 510 can automatically generate a DMA scheduling path based on default rules. The compiler can also automatically assign dual ping-pong buffers to a source port of a node when that port receives parallel, overlapping input from a DMA node and from the input buffer associated with that node.

DMA scheduling is further illustrated in FIGS. 6D through 6F with an exemplary signal flow for an embodiment in which a single processor cycles through the nodes. Here, nodes 0 and 1 are associated with DMA source nodes, and nodes 4 and 5 are associated with DMA sink nodes. The DMA source nodes bring data from external memory into internal buffers, and the sink nodes move data out of the buffers to external memory. Each DMA node is connected to either an output port or a source port of a processing node. The developer can specify the scheduling paths of the DMA sources and sinks (otherwise, the compiler 510 will automatically find suitable scheduling paths). There may be one single path 626 for all DMA source nodes and one path 628 for all DMA sink nodes. The paths indicate when the DMA starts and ends. For example, the DMA associated with the source nodes starts before node 3 and ends after node 5; that is, output is read out via DMA while nodes 1 to 2 are operating, whereas new data comes in via DMA while nodes 3 to 5 are processing. Similarly, the DMA for the sink nodes starts before node 0 and ends after node 2. This type of scheduling is typical of embodiments in which only one DMA controller (or DMA-enabling hardware/peripheral) is associated with all DMA nodes in the signal flow.
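
The effect of the two scheduling paths on a single processor cycling through the nodes can be sketched as an ordered event log. This is a simplified model under stated assumptions: the function run_cycle and the (start_before, end_after) representation of a path are hypothetical, and "wait" stands for whatever completion check the generated code would perform.

```python
def run_cycle(num_nodes, dma_paths):
    """Return the ordered event log for one pass of the processor over the nodes.

    dma_paths maps a label to (start_before, end_after): the DMA is started
    just before node start_before runs and must have finished once node
    end_after has run.
    """
    log = []
    for node in range(num_nodes):
        for label, (start_before, _) in dma_paths.items():
            if node == start_before:
                log.append(f"start {label} DMA")
        log.append(f"run node {node}")
        for label, (_, end_after) in dma_paths.items():
            if node == end_after:
                log.append(f"wait {label} DMA")
    return log


# Paths taken from the example above: source DMA spans nodes 3 to 5,
# sink DMA spans nodes 0 to 2.
paths = {"source": (3, 5), "sink": (0, 2)}
log = run_cycle(6, paths)
for event in log:
    print(event)
```

Because the two paths cover disjoint node ranges, at most one transfer is in flight at any time, which is consistent with a single DMA controller serving all DMA nodes.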

In alternative embodiments, multiple DMA controllers may be associated with the DMA nodes; in this case, the DMA paths can overlap, as shown in FIG. 6E. In yet another case, shown in FIG. 6F, the buffer at source port 2 of node 1 and the DMA at source port 1 overlap; that is, while the DMA brings data from the next row into the buffer at source port 2, node 1 is also processing the data in that buffer for the previous row. This causes data parallelism or data corruption in the buffer at source port 2 of node 1. A dual-state ping-pong buffer (known to those skilled in the art) at source port 2 of node 1 resolves this problem by allowing the processing of data by node 1 and the DMA input at node 1 to proceed in parallel but independently. The compiler 510 can automatically identify such instances of overlap and assign dual-state ping-pong buffers to the affected source ports.
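
A dual-state ping-pong buffer can be sketched as two banks whose roles are exchanged after each row. The class below is a generic illustration of the well-known technique, not code from the described tool; the class and method names are hypothetical, and the DMA transfer is modeled as an ordinary copy.

```python
class PingPongBuffer:
    """Two banks: the DMA fills one while the node reads the other."""

    def __init__(self, size):
        self._banks = [[0] * size, [0] * size]
        self._write_bank = 0  # bank currently owned by the DMA

    def dma_write(self, row):
        """Model a DMA transfer of one row into the write bank."""
        self._banks[self._write_bank][:] = row

    def read(self):
        """Return the bank currently owned by the processing node."""
        return self._banks[1 - self._write_bank]

    def swap(self):
        """Exchange roles once the transfer and the processing are both done."""
        self._write_bank = 1 - self._write_bank


buf = PingPongBuffer(4)
buf.dma_write([1, 2, 3, 4])   # first row arrives
buf.swap()                    # node may now process the first row
buf.dma_write([5, 6, 7, 8])   # next row arrives while the node is still reading
current = list(buf.read())    # the row being processed is not overwritten
buf.swap()
next_row = list(buf.read())
print(current, next_row)
```

The cost of this arrangement is a second bank of the same size at each affected source port, which is why it is assigned only where an overlap is actually detected.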

The GUI tool 500 described above can be implemented, for example, in software running on a general-purpose computer. FIG. 7 shows an exemplary computer embodiment that includes a central processing unit (CPU) 700 and associated system memory 702, one or more non-volatile mass storage devices (and associated device drivers) 704, input/output devices 706 (e.g., a screen, keyboard, mouse, or stylus), and a system bus 708 over which the processor and memory communicate with each other and with the other system components. The system memory 702 stores instructions, conceptually illustrated as a group of modules, that control the operation of the CPU 700 and its interaction with the other hardware components. An operating system 710 directs the execution of low-level, basic system functions such as memory allocation, file management, and operation of the storage devices 704. At a higher level, one or more service applications provide the computational functionality for automatically generating code based on a graphical signal-flow representation. These applications may include the editor 502, the compiler 510, and the simulator 514. Of course, these modules may be combined, further partitioned, or organized differently; as a person skilled in the art will recognize, the instructions can generally be grouped and organized in many different ways. The system memory 702 may also store the library 508 of processing blocks. The instructions implementing the applications 502, 510, 514 may be programmed in any of a variety of suitable programming languages, including, without limitation, C, C++, Basic, Pascal, Fortran, or assembly language.

The terms and expressions employed herein are used as terms and expressions of description and not of limitation, and there is no intention, in the use of such terms and expressions, of excluding any equivalents of the features shown and described or portions thereof. In addition, having described certain embodiments of the invention, it will be apparent to those of ordinary skill in the art that other embodiments incorporating the concepts disclosed herein may be used without departing from the spirit and scope of the invention. Accordingly, the described embodiments are to be considered in all respects as only illustrative and not restrictive.

Claims

A storage-efficient method of processing frame data, the method comprising:
receiving, at an input buffer associated with a second processing node in a series of processing nodes implemented by one or more computing devices, data from a first processing node in the series of processing nodes, wherein the second processing node is to perform an operation that requires as input a block of frame data of a first predetermined size in order to generate a block of frame data of a second predetermined size, each of the first and second predetermined sizes comprising a plurality of frame data samples;
determining, by the one or more computing devices, that the size of the data in the input buffer is equal to the first predetermined size; and
in response to determining that the size of the data in the input buffer is equal to the first predetermined size, causing the second processing node to operate on the frame data in the input buffer to generate the block of frame data of the second predetermined size.

Claim 2 was abandoned upon payment of the registration fee.

The method of claim 1,
wherein the frame data is image frame data.

Claim 3 was abandoned upon payment of the registration fee.

The method of claim 2,
wherein the second predetermined size is one row of an image frame.

Claim 4 was abandoned upon payment of the registration fee.

The method of claim 3,
wherein the first predetermined size is a plurality of rows of an image frame.

Claim 5 was abandoned upon payment of the registration fee.

The method of claim 1,
wherein the first processing node is a direct memory access (DMA) source node.

The method of claim 1,
further comprising reusing memory allocated to the input buffer for another buffer associated with another processing node in the series of processing nodes, wherein the other processing node is different from the first processing node and the second processing node.

Claim 7 was abandoned upon payment of the registration fee.

The method of claim 1,
further comprising operating a plurality of processing nodes in the series of processing nodes in parallel.

Claim 8 was abandoned upon payment of the registration fee.

The method of claim 1,
further comprising operating the processing nodes in the series of processing nodes sequentially.

Claim 9 was abandoned upon payment of the registration fee.

The method of claim 1,
wherein determining that the size of the data in the input buffer is equal to the first predetermined size comprises
maintaining a counter for the input buffer and incrementing the counter in response to receiving units of data from the first processing node.

A storage-efficient system for processing frame data, the system comprising:
at least one computing device to provide first and second processing nodes, wherein the second processing node is to perform an operation that requires as input a block of frame data of a first predetermined size in order to generate a block of frame data of a second predetermined size, each of the first and second predetermined sizes comprising a plurality of frame data samples;
an input buffer associated with the second processing node, the input buffer being sized to store an amount of frame data that is greater than or equal to the first predetermined size; and
a logic switching mechanism to cause, by the at least one computing device, the second processing node to operate on the frame data in the input buffer when the input buffer stores an amount of data equal to the first predetermined size.

The system of claim 10,
wherein the logic switching mechanism comprises a register that stores, for the second processing node, a counter of the number of blocks of the first predetermined size currently stored in the input buffer and a representation of the first predetermined size.

Claim 12 was abandoned upon payment of the registration fee.

The system of claim 11,
wherein the register is a hardware register.

Claim 13 was abandoned upon payment of the registration fee.

The system of claim 11,
wherein the register is stored in a local memory associated with the at least one processing device.

Claim 14 was abandoned upon payment of the registration fee.

The system of claim 10,
wherein the at least one computing device comprises a digital signal processor.

One or more non-transitory computer-readable media for generating program code for block-based processing of frame data from a graphical representation of a signal flow defined in a graphical user interface,
the one or more non-transitory computer-readable media storing instructions that, in response to execution by one or more computing devices of a system, cause the system to:
provide a library of functions implementing signal processing nodes, each of the nodes to perform an operation that requires as input one or more blocks of input frame data having a node-specific size comprising a plurality of frame data samples;
provide an editor that allows a user to graphically define a signal flow comprising a plurality of nodes and connections and to associate each of the nodes with one of the functions from the library; and
provide a compiler to generate program code from the graphically defined signal flow and the associated functions, the code comprising instructions that cause a node to execute when each of the buffers associated with the node is determined to have stored a block of input frame data of the respective node-specific size.

The one or more non-transitory computer-readable media of claim 15,
wherein the editor allows the user to graphically define direct memory access (DMA) of the signal flow.

Claim 17 was abandoned upon payment of the registration fee.

The one or more non-transitory computer-readable media of claim 16,
wherein the editor allows the user to define at least one of a DMA source, a DMA sink, or a DMA scheduling path.

Claim 18 was abandoned upon payment of the registration fee.

The one or more non-transitory computer-readable media of claim 16,
wherein the compiler is to generate program code that implements the graphically defined DMA.

Claim 19 was abandoned upon payment of the registration fee.

The one or more non-transitory computer-readable media of claim 15,
wherein the instructions further cause the system to provide the graphical user interface to a display device for display.

Claim 20 was abandoned upon payment of the registration fee.

The method of claim 1,
wherein the second processing node is to provide the generated block of frame data of the second predetermined size to a DMA sink node.

The method of claim 1,
wherein the first processing node is to generate the block of frame data of the second predetermined size by performing a filtering operation on the block of frame data of the first predetermined size.

Claim 22 was abandoned upon payment of the registration fee.

The method of claim 1,
wherein receiving, at the input buffer associated with the second processing node in the series of processing nodes, data from the first processing node in the series of processing nodes comprises receiving frame data having a size smaller than the first predetermined size.

The system of claim 10,
wherein the second predetermined size is smaller than the first predetermined size.

The system of claim 10,
wherein the first processing node is a DMA source node.

The system of claim 10,
wherein the first predetermined size and the second predetermined size are different sizes.

Claim 26 was abandoned upon payment of the registration fee.

The system of claim 10,
wherein the first predetermined size is a first number of rows and the second predetermined size is a second number of rows.