CN116957909A - Method and system for image processing


Info

Publication number
CN116957909A
CN116957909A (application CN202310464455.8A)
Authority
CN
China
Prior art keywords
graph
nodes
node
composite node
image processing
Prior art date
2022-04-26
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310464455.8A
Other languages
Chinese (zh)
Inventor
林钰杰
刘鸿钧
郑博元
邱勇智
张嘉祐
谢政勋
陈蕾
陈立民
汪岱锜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
MediaTek Inc
Original Assignee
MediaTek Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2022-04-26
Filing date
2023-04-26
Publication date
2023-10-27
Priority claimed from US 18/178,059 (published as US 2023/0342876 A1)
Application filed by MediaTek Inc
Publication of CN116957909A
Current legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00 - General purpose image data processing
    • G06T 1/20 - Processor architectures; Processor configuration, e.g. pipelining
    • G06T 1/60 - Memory management
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/042 - Knowledge-based neural networks; Logical representations of neural networks
    • G06N 3/08 - Learning methods

Abstract

The present disclosure relates to methods and systems for image processing. An image processing system includes one or more processors operable to receive a graph application programming interface (API) call that adds a composite node to a graph. The graph includes at least the composite node, which is connected to other nodes by directed, loop-free edges. The one or more processors are further operable to process, by a graph compiler at compile time, the composite node by iteratively expanding it into a plurality of nodes, wherein each node corresponds to an operation in an image processing pipeline. The system also includes one or more target devices to execute executable code compiled from the respective nodes to perform the operations of the image processing pipeline, and a memory to store the graph compiler and the executable code.

Description

Method and system for image processing
Cross Reference to Related Applications
The present application claims the benefit of U.S. provisional application Ser. No. 63/334,728, filed on April 26, 2022, and U.S. provisional application Ser. No. 63/355,143, filed on June 24, 2022, the disclosures of which are incorporated herein by reference in their entireties.
Technical Field
Embodiments of the present application relate to a graph application programming interface (API) that simplifies and accelerates the deployment of computer vision applications on target devices.
Background
Graph-based programming models have been developed to address the increasing complexity of advanced image processing and computer vision problems. Computer vision applications typically include pipelined operations that may be described by a graph. Nodes of the graph represent operations (e.g., computer vision functions), and directed edges represent data flows. Application developers use graph-based application programming interfaces (APIs) to build computer vision applications.
Several graph-based programming models have been designed to support image processing and computer vision functions on modern hardware architectures such as mobile and embedded systems-on-a-chip (SoCs) and desktop systems. Many of these systems are heterogeneous, containing multiple processor types including multi-core central processing units (CPUs), digital signal processors (DSPs), graphics processing units (GPUs), vision processing units (VPUs), and the like. The OpenVX™ 1.3.1 specification, published by the Khronos Group in February 2022, is one example of a graph-based programming model for computer vision applications. OpenVX provides a graph-based API that separates applications from the underlying hardware implementation. OpenVX is designed to maximize functional and performance portability across different hardware platforms, providing a computer vision framework that effectively addresses current and future hardware architectures with minimal impact on applications.
Hardware vendors implement a graph compiler and an executor on their devices to optimize the performance of computer vision functions. Through an API (e.g., the OpenVX API), an application developer can build a computer vision application that obtains optimal performance without knowing the underlying hardware implementation. APIs enable application developers to efficiently access computer vision hardware acceleration with functional and performance portability. However, existing APIs can be cumbersome for certain computer vision applications. Thus, there is a need to further enhance existing APIs to simplify the task of application development.
Disclosure of Invention
In one embodiment, a method for image processing is provided, the method comprising the steps of: receiving a graph API call that adds a composite node to a graph, wherein the graph includes at least the composite node connected to other nodes by directed, loop-free edges; processing, by a graph compiler at compile time, the composite node by expanding it into a plurality of nodes, wherein each node corresponds to an operation in an image processing pipeline; and executing, on one or more target devices, executable code compiled from the respective nodes to perform the operations of the image processing pipeline. In some embodiments, the graph compiler processes the composite node at compile time by iteratively expanding it into the plurality of nodes.
In another embodiment, a system for image processing is provided. The system comprises one or more processors operable to: receive a graph API call that adds a composite node to a graph, wherein the graph includes at least the composite node connected to other nodes by directed, loop-free edges; and process, by a graph compiler at compile time, the composite node by expanding it into a plurality of nodes, wherein each node corresponds to an operation in an image processing pipeline. The system also includes one or more target devices that execute executable code compiled from the respective nodes to perform the operations of the image processing pipeline, and a memory that stores the graph compiler and the executable code.
Other aspects and features will become apparent to those ordinarily skilled in the art upon review of the following description of specific embodiments in conjunction with the accompanying figures.
Drawings
The present application is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings in which like reference numerals refer to similar elements. It should be noted that different references to "an" or "one" embodiment in the present application are not necessarily to the same embodiment, and such references mean at least one. Furthermore, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
FIG. 1 is a schematic diagram of an image processing diagram according to one embodiment.
FIG. 2 is a schematic diagram illustrating a process for processing a composite node according to one embodiment.
FIG. 3 is a schematic diagram illustrating a process for processing a graph including composite nodes according to one embodiment.
FIG. 4 is a block diagram illustrating a system operable to perform image processing according to one embodiment.
Fig. 5 is a flowchart illustrating an image processing method according to an embodiment.
Detailed Description
In the following description, numerous specific details are set forth. It is understood, however, that embodiments of the application may be practiced without these specific details. In other instances, well-known circuits, structures, and techniques have not been shown in detail in order not to obscure an understanding of this description. Those of ordinary skill in the art, with the included descriptions, will be able to implement appropriate functionality without undue experimentation.
Embodiments of the present application provide a graph application programming interface (API) that enables software developers to create graphs describing image processing pipelines. The image processing pipeline may be part of an image processing application that includes computer vision operations. The graph includes nodes corresponding to operations and edges representing dependencies between the nodes; the edges are directed and loop-free. Different nodes may correspond to different graph-based programming models, such as a computer vision programming model (e.g., OpenVX) and a deep learning programming model (e.g., TensorFlow or TensorFlow Lite). Other programming models may also be used. These programming models are supported by their respective frameworks, which provide corresponding libraries and other software infrastructure.
In the OpenVX programming model, a graph includes nodes that are added to the graph by node creation functions. The nodes may represent computer vision functions with associated parameters, and they are linked together via data dependencies; data objects are processed by the nodes. As described above, OpenVX improves the performance and efficiency of computer vision applications by providing an API that serves as an abstraction for commonly used vision functions. These vision functions are optimized to significantly accelerate execution on the target hardware.
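For illustration, the following minimal sketch shows the OpenVX graph-building flow described above, using standard OpenVX 1.x calls; the image sizes and the particular vision functions (a Gaussian filter followed by a median filter) are illustrative choices, not taken from this disclosure.

```c
#include <VX/vx.h>

int main(void) {
    vx_context context = vxCreateContext();
    vx_graph graph = vxCreateGraph(context);

    /* Data objects processed by the nodes; the virtual image is an
     * intermediate data object visible only inside the graph. */
    vx_image in  = vxCreateImage(context, 640, 480, VX_DF_IMAGE_U8);
    vx_image tmp = vxCreateVirtualImage(graph, 640, 480, VX_DF_IMAGE_U8);
    vx_image out = vxCreateImage(context, 640, 480, VX_DF_IMAGE_U8);

    /* Node creation functions add nodes to the graph; the shared data
     * objects define the directed, loop-free edges between them. */
    vxGaussian3x3Node(graph, in, tmp);
    vxMedian3x3Node(graph, tmp, out);

    /* The graph is verified/compiled once and then executed as a whole
     * pipeline; the host does not invoke each operation individually. */
    if (vxVerifyGraph(graph) == VX_SUCCESS)
        vxProcessGraph(graph);

    vxReleaseGraph(&graph);
    vxReleaseContext(&context);
    return 0;
}
```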
The graph API disclosed herein is an extension of the OpenVX API. The graph API enables a software developer to add nodes of different graph-based programming models to a graph. For example, a software developer may add a composite node to a graph of OpenVX nodes. Each OpenVX node corresponds to an OpenVX function provided by an OpenVX library. A composite node may differ from an OpenVX node in that the composite node may correspond to a series of functions; for example, the series may be generic computer vision functions, customer-defined functions, neural network functions, or a subgraph of OpenVX functions. The graph API disclosed herein minimizes the effort of integrating various programming models into an image processing pipeline, for example by integrating neural network operations with computer vision operations. In one embodiment, a buffer is attached to the composite node to store code and parameters associated with the operations of the composite node. At compile time, the graph compiler may decode the contents of the buffer to convert the operations of the composite node into an intermediate representation (IR) and then compile the IR into executable code for execution on the target device.
Fig. 1 is a schematic diagram of an image processing graph 100 ("graph 100") according to one embodiment. Graph 100 is an example of a graph model representing a series of imaging operations and their connections; the series of imaging operations forms an image processing pipeline. Graph 100 includes a plurality of nodes 120 through 125 (represented by circles), each corresponding to one or more operations. The individual operations may be functions selected from a library of, for example, image processing functions, neural network functions, or other types of functions. Edges (indicated by arrows) of graph 100 connect the nodes and define data flows from source nodes 121 and 123 to destination node 125. Graph 100 is directed and acyclic; that is, the edges of graph 100 carry data in one direction only and form no loops.
Graph 100 is compiled into executable code and executed on a target device such as image processing hardware. A central processing unit (CPU) may run a graph compiler to compile graph 100 into executable code, which the target device executes as an image processing pipeline. During execution, data objects such as input data, output data, and intermediate data may be stored in temporary buffers 101 through 108 (represented by rectangles) that are accessible by the target device. The CPU may invoke execution of the image processing pipeline and receive the output of the pipeline; it does not invoke execution of individual operations within the pipeline. Thus, during execution of the image processing pipeline, the overhead caused by interaction between the CPU and the target device is significantly reduced.
In one embodiment, the graph API enables a software developer to create graph 100 and add nodes 120 through 125 and edges to it. In one embodiment, the graph API enables a software developer to add a composite node (e.g., composite node 120) to graph 100 using a single API call. The composite node corresponds to a plurality of operations performed on a segment of image data. For example, the composite node 120 may be a generic computer vision (CV) node, a customer-defined node, a neural network (NN) model node, an OpenVX (VX) graph node, or the like. Generic CV nodes, customer-defined nodes, neural network model nodes, and VX graph nodes are based on different graph-based programming models, and different graph-based programming models provide different function libraries. A generic CV node may be based on a proprietary library of computer vision functions provided by a hardware vendor. A customer-defined node may contain proprietary code that is programmed by the software developer and provided to the hardware vendor; the proprietary code may contain target-dependent code that has already been compiled. A neural network model node may correspond to the operations of multiple layers of a neural network and may be programmed based on a deep learning function library (e.g., TensorFlow, TensorFlow Lite, etc.). A VX graph node may contain a subgraph of nodes, where each node is programmed based on an OpenVX function library.
Operations corresponding to the composite node 120 may include computer vision functions, image processing functions, machine learning functions, neural network functions, user-defined functions, and the like. As an example, the composite node 120 may correspond to a plurality of operation (OP) layers of a deep-learning neural network (DNN). In one embodiment, graph 100 may include nodes corresponding to different programming models; for example, the composite node 120 may be a neural network model node while the other nodes in graph 100 are OpenVX nodes.
As an example, a neural network model node may be described in TensorFlow Lite, which provides a deep learning library for deploying machine learning models on edge devices. The graph API disclosed herein may provide an api_complex 160 that a software developer can invoke to add a composite node to the graph, where the composite node corresponds to a multi-layer neural network model described in TensorFlow Lite. In the example shown in Fig. 1, api_complex 160 includes a parameter tflite of type vx_array corresponding to the composite node.
In one embodiment, as shown in Fig. 1, the parameter vx_graph indicates the graph to which the composite node is added; const vx_tensor inputs[] indicates the inputs of the composite node (e.g., the input of composite node 120 is the data in buffer 106); const vx_uint32 in_num indicates how many inputs there are; const vx_array in_scales indicates the quantization scales of the inputs; const vx_array in_zeropoints indicates the quantization zero points of the inputs; const vx_array tflite indicates the buffer holding the serialized model; const vx_tensor outputs[] indicates the outputs of the composite node; const vx_uint32 out_num indicates how many outputs there are; const vx_array out_scales indicates the quantization scales of the outputs; and const vx_array out_zeropoints indicates the quantization zero points of the outputs.
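Reassembling the parameter list above, the graph API call might have a C signature along the following lines. This is a sketch: the function name vxTfLiteNode is an assumption (the disclosure only labels the call api_complex 160), and the exact types and ordering are inferred from the OpenVX-style names in the text.

```c
#include <VX/vx.h>

/* Hypothetical signature for api_complex 160; the name and parameter
 * order are assumptions based on the description, not the actual API. */
vx_node vxTfLiteNode(
    vx_graph        graph,          /* graph to which the composite node is added */
    const vx_tensor inputs[],       /* inputs of the composite node               */
    vx_uint32       in_num,         /* number of inputs                           */
    const vx_array  in_scales,      /* quantization scales of the inputs          */
    const vx_array  in_zeropoints,  /* quantization zero points of the inputs     */
    const vx_array  tflite,         /* buffer holding the serialized model        */
    vx_tensor       outputs[],      /* outputs of the composite node              */
    vx_uint32       out_num,        /* number of outputs                          */
    const vx_array  out_scales,     /* quantization scales of the outputs         */
    const vx_array  out_zeropoints  /* quantization zero points of the outputs    */
);
```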
In one embodiment, a generic CV node, a customer-defined node, a neural network model node, or a VX graph node may be added to the graph as a composite node using a corresponding graph API call. In another embodiment, a composite node may be added to the graph using a unified graph API call, where the composite node may be any of a generic CV node, a customer-defined node, a neural network model node, or a VX graph node.
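A unified call of this kind might be sketched as follows; the enum, the function name vxCompositeNode, and the single payload buffer are illustrative assumptions, since the disclosure does not spell out the unified signature.

```c
#include <VX/vx.h>

/* Hypothetical kinds of composite node; names are illustrative. */
typedef enum {
    COMPOSITE_GENERIC_CV,        /* vendor computer vision library            */
    COMPOSITE_CUSTOMER_DEFINED,  /* customer-defined (possibly compiled) code */
    COMPOSITE_NN_MODEL,          /* serialized neural network model           */
    COMPOSITE_VX_GRAPH           /* OpenVX subgraph                           */
} composite_kind;

/* Hypothetical unified composite-node call; the payload buffer attached to
 * the node stores the code and parameters that the graph compiler decodes
 * at compile time. */
vx_node vxCompositeNode(vx_graph graph, composite_kind kind,
                        const vx_tensor inputs[], vx_uint32 in_num,
                        vx_tensor outputs[], vx_uint32 out_num,
                        const vx_array payload);
```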
According to an embodiment of the application, the graph compiler decodes the composite node at compile time into a sequence of nodes with respective input and output connections to neighboring nodes. Each node in the sequence corresponds to a predefined function in a library; that is, the composite node is expanded into a sequence of nodes at compile time. The software developer uses the disclosed graph API to add the composite node to the graph, and the graph compiler is responsible for decoding it. Thus, the overhead of creating a graph that spans multiple graph-based programming models is greatly reduced.
In contrast, in some conventional systems, each node corresponds to a predetermined function. To add multiple functions (such as multiple OP layers of a neural network) to the graph, a software developer must add one node (one OP layer) at a time and connect the input/output data objects of each node. The software developer makes one API call per node, where each node may correspond to one OP layer; adding multiple OP layers therefore requires multiple API calls. In some other conventional systems, a software developer first converts the multiple OP layers of a neural network into multiple nodes, for example multiple OpenVX nodes, and then adds the converted OpenVX nodes to an OpenVX graph; this conversion is performed by the software developer before compile time. These conventional methods impose significant overhead on software development.
FIG. 2 is a schematic diagram illustrating a process 200 for processing a composite node according to one embodiment. As an example, the processed composite node may correspond to the operations of a multi-layer neural network model. Process 200 includes three stages: a graph generation stage 210, a graph compilation stage 220, and an execution stage 230. In the graph generation stage 210, the software developer creates a graph at step 211 and, using graph API 240, adds nodes (e.g., nodes 120 through 125 in Fig. 1) to the graph at step 212. When a node is added to the graph, a buffer is attached to the node at step 213 to store code and parameters associated with the node. Thus, in the description herein, it should be understood that the code contained in a node is stored in a buffer attached to that node.
After all nodes are added to the graph, the graph compilation stage 220 begins with graph compiler 250 processing the nodes of the graph at step 221. If the graph includes a composite node (e.g., composite node 120 in Fig. 1), graph compiler 250 decodes the composite node at step 222 by decoding the information in the buffer attached to it. The decoding produces a set of nodes, each of which is added to the graph at step 223; each added node corresponds to a predefined function in a function library. Graph compiler 250 then compiles the graph into executable code at step 224. Process 200 proceeds to the execution stage 230, where target device 260 executes the executable code at step 231. Non-limiting examples of target devices 260 include vision processing units (VPUs) 261, direct memory access (DMA) and/or enhanced DMA (eDMA) devices 262, deep-learning accelerators (DLAs) 263, and the like.
FIG. 3 is a schematic diagram illustrating a process 300 for processing a graph 305 that includes a composite node, according to one embodiment. In this example, the composite node being processed may correspond to a generic CV node, a customer-defined node, a neural network model node, or a VX graph node. Graph 305 may be created, and its nodes added, through graph API 240 in Fig. 2. In one embodiment, process 300 may be performed by graph compiler 250 in Fig. 2.
Fig. 3 shows an iterative process over the nodes in graph 305. For each node to be processed, graph compiler 250 first identifies, at step 310, the graph-based programming model used by the node; for example, whether the node is a generic CV node, a customer-defined node, a neural network model node, or a VX graph node. An OpenVX node (e.g., nodes 121 through 125 in Fig. 1) corresponding to a single OpenVX function may be processed as a VX graph node containing a one-node subgraph (i.e., via the rightmost path of process 300). In one embodiment, the graph API call that adds a composite node to the graph identifies the graph-based programming model used by the composite node.
If the node is a generic CV node or a customer-defined node, graph compiler 250 converts the parameters of that node, at step 320, into parameters consistent with the other nodes in graph 305; for example, parameters based on the OpenVX programming model. At step 360, graph compiler 250 maps the code contained in the generic CV node or customer-defined node to an intermediate representation (IR). In one case, a customer-defined node may contain target-dependent code that has already been compiled; the mapping at step 360 may then instead copy the compiled code into a command archive for execution. In one embodiment, the IR may be in a proprietary format provided by the vendor of target device 260.
If the node is a neural network model node, graph compiler 250 decodes, at step 330, the model buffer attached to the node (i.e., decodes the code in the model buffer). As shown in the example of Fig. 2, a neural network model node may comprise a series of nodes, each corresponding to a neural network function, i.e., a neural network OP layer. At step 360, graph compiler 250 maps each decoded node to a neural network function in the intermediate representation, where the neural network function is provided by a deep learning function library. If the node is a VX graph node containing an OpenVX subgraph, graph compiler 250 processes the subgraph at step 340 and, at steps 310 and 350, recursively processes it one node at a time, then maps the individual nodes of the subgraph to the intermediate representation at step 360. At step 370, graph compiler 250 determines whether any unprocessed nodes remain in graph 305; if so, process 300 returns to step 310 to continue graph processing. After all nodes in graph 305 are processed, the intermediate representation is compiled into machine-executable code and executed by target device 260.
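The per-node dispatch of process 300 can be summarized in the sketch below; the type names and helper functions are illustrative stand-ins for compiler internals, not the actual implementation.

```c
#include <stdio.h>

/* Illustrative node representation; none of these names come from the
 * disclosure or from a real OpenVX implementation. */
typedef enum { NODE_GENERIC_CV, NODE_CUSTOMER_DEFINED,
               NODE_NN_MODEL, NODE_VX_GRAPH } node_kind;

typedef struct node {
    node_kind    kind;
    const char  *name;
    struct node *children;  /* decoded OP layers, or subgraph nodes */
    struct node *next;
} node;

/* Stubs standing in for the real compiler work at each step. */
static node_kind identify_model(const node *n) { return n->kind; }             /* step 310 */
static void convert_parameters(node *n) { printf("convert: %s\n", n->name); }  /* step 320 */
static void decode_model_buffer(node *n) { printf("decode: %s\n", n->name); }  /* step 330 */
static void map_to_ir(node *n) { printf("map to IR: %s\n", n->name); }         /* step 360 */

static void process_node(node *n) {
    switch (identify_model(n)) {                         /* step 310 */
    case NODE_GENERIC_CV:
    case NODE_CUSTOMER_DEFINED:
        convert_parameters(n);                           /* step 320 */
        map_to_ir(n);                                    /* step 360 */
        break;
    case NODE_NN_MODEL:
        decode_model_buffer(n);                          /* step 330 */
        for (node *op = n->children; op; op = op->next)
            map_to_ir(op);                               /* step 360 */
        break;
    case NODE_VX_GRAPH:                                  /* step 340 */
        for (node *sub = n->children; sub; sub = sub->next)
            process_node(sub);                           /* recurse: steps 310/350 */
        break;
    }
}

int main(void) {
    node op = { NODE_NN_MODEL, "decoded OP layer", NULL, NULL };
    node nn = { NODE_NN_MODEL, "NN model node", &op, NULL };
    node cv = { NODE_GENERIC_CV, "generic CV node", NULL, &nn };
    for (node *n = &cv; n; n = n->next)  /* step 370: loop until no nodes remain */
        process_node(n);
    return 0;
}
```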
Fig. 4 is a block diagram of a system 400 operable to perform image processing according to one embodiment. System 400 may be embodied in many form factors, such as a computer system, server computer, mobile device, handheld device, wearable device, etc. System 400 includes processing hardware 410, memory 420, and network interface 430. It will be appreciated that system 400 is simplified for illustration; additional hardware and software components are not shown. Non-limiting examples of processing hardware 410 include one or more processors, such as a central processing unit (CPU), a graphics processing unit (GPU), a vision processing unit (VPU), a deep learning accelerator (DLA), a DMA/eDMA device, and the like, on which graph compiler 460 may run. One or more of the processors, processing units, and/or devices in processing hardware 410 may be a target device that executes the image processing pipeline according to executable code 450 compiled from the graph. Graph compiler 460 may be an example of graph compiler 250 in Fig. 2.
Memory 420 may store graph compiler 460, function libraries 470, and executable code 450. Different libraries may support different graph-based programming models. Memory 420 may include dynamic random access memory (DRAM) devices, flash memory devices, and/or other volatile or non-volatile memory devices. Graph compiler 460 compiles a graph received through a graph API call into executable code 450 for execution on a target device (e.g., a VPU, an eDMA device, a DLA, etc.). System 400 may receive the graph API call through network interface 430, which may be a wired or wireless interface.
Fig. 5 is a flow diagram illustrating a method 500 for image processing according to one embodiment. In one embodiment, image processing includes processing a graph including composite nodes. In one embodiment, the method 500 may be performed by a system, such as the system 400 in fig. 4.
Method 500 begins at step 510, when the system receives a graph API call to add a composite node to a graph, the graph including at least the composite node connected to other nodes by directed, loop-free edges. At step 520, the system processes the composite node at compile time, using a graph compiler, by expanding the composite node into a plurality of nodes, wherein each node corresponds to an operation in the image processing pipeline; if the composite node is a VX graph node containing an OpenVX subgraph, the graph compiler may process it at compile time by iteratively expanding it into the plurality of nodes. At step 530, the system executes, on one or more target devices, executable code compiled from the respective nodes to perform the operations of the image processing pipeline.
In one embodiment, each of the other nodes in the graph corresponds to a computer vision operation in the image processing pipeline. In one embodiment, the composite node represents a multi-layer neural network model, and each node expanded from the composite node corresponds to one operation layer of the neural network model. In another embodiment, the composite node represents a subgraph of nodes corresponding to a plurality of computer vision operations. In yet another embodiment, the composite node corresponds to customer-defined code for image processing.
In one embodiment, operations corresponding to the composite node are selected from a first library of functions, and operations corresponding to other nodes in the graph are selected from a second library of functions. The first library of functions and the second library of functions may be based on different programming models optimized for different types of operations. In one embodiment, the first library of functions is provided by a deep learning framework or by a user, and the second library of functions is provided by a computer vision framework. In another embodiment, operations corresponding to the composite node and other nodes in the graph are each selected from the same computer vision function library.
In one embodiment, the composite node may be a customer-defined node representing customer-defined CV operations, where the operations corresponding to the composite node are selected from a separate, customer-defined CV library.
In one embodiment, the graph API call identifies a graph-based programming model used by operations corresponding to the composite node. In one embodiment, the graph API call identifies the composite node as one of: proprietary Computer Vision (CV) nodes, customer-defined nodes, neural network model nodes, and VX graph nodes based on OpenVX.
The operations of the flowchart of Fig. 5 have been described with reference to the exemplary embodiments of Figs. 2 and 4. However, it should be understood that the operations of the flowchart of Fig. 5 may be performed by embodiments of the present application other than those of Figs. 2 and 4, and that the embodiments of Figs. 2 and 4 may perform operations different from those discussed with reference to the flowchart. While the flowchart of Fig. 5 shows a particular order of operations performed by certain embodiments of the application, it should be understood that such order is exemplary (e.g., alternative embodiments may perform the operations in a different order, combine certain operations, overlap certain operations, etc.).
Various functional components or blocks have been described herein. As will be appreciated by those skilled in the art, the functional blocks will preferably be implemented by circuits (either dedicated circuits or general purpose circuits that operate under the control of one or more processors and encoded instructions) that will typically include transistors configured in such a way as to control the operation of the circuits in accordance with the functions and operations described herein.
While the application has been described in terms of several embodiments, those skilled in the art will recognize that the application is not limited to the embodiments described, but can be practiced with modification and alteration within the spirit and scope of the appended claims. The description is thus to be regarded as illustrative instead of limiting.

Claims (20)

1. A method for image processing, the method comprising the steps of:
receiving a graph application programming interface (API) call to add a composite node to a graph, wherein the graph includes at least the composite node connected to other nodes by directed, loop-free edges;
processing, by a graph compiler, the composite node at compile time by expanding the composite node into a plurality of nodes, wherein each node of the plurality of nodes corresponds to an operation in an image processing pipeline; and
executing executable code compiled from the respective nodes to perform the operations of the image processing pipeline.
2. The method of claim 1, wherein each of the other nodes corresponds to a computer vision operation in the image processing pipeline.
3. The method of claim 1, wherein the composite node represents a multi-layer neural network model, and each node of the plurality of nodes corresponds to one operation layer of the multi-layer neural network model.
4. The method of claim 1, wherein the composite node represents a subgraph of nodes corresponding to a plurality of computer vision operations.
5. The method of claim 1, wherein the composite node corresponds to customer-defined code for image processing.
6. The method of claim 1, wherein operations corresponding to the composite node are selected from a first library of functions and operations corresponding to other nodes in the graph are selected from a second library of functions, and wherein the first and second libraries of functions are based on different programming models optimized for different types of operations.
7. The method of claim 6, wherein the first library of functions is provided by a deep learning framework or a customer, and the second library of functions is provided by a computer vision framework.
8. The method of claim 1, wherein the operations corresponding to the composite node and the other nodes in the graph are each selected from a same computer vision function library.
9. The method of claim 1, wherein the graph API call identifies a graph-based programming model used by operations corresponding to the composite node.
10. The method of claim 1, wherein the graph API call identifies the composite node as one of: proprietary Computer Vision (CV) nodes, customer-defined nodes, neural network model nodes, and VX graph nodes based on OpenVX.
11. A system for image processing, the system comprising:
one or more processors operable to:
receive a graph application programming interface (API) call to add a composite node to a graph, wherein the graph includes at least the composite node connected to other nodes by directed, loop-free edges; and
process, by a graph compiler at compile time, the composite node by expanding the composite node into a plurality of nodes, wherein each node of the plurality of nodes corresponds to an operation in an image processing pipeline;
one or more target devices that execute executable code compiled from the respective nodes to perform the operations of the image processing pipeline; and
a memory storing the graph compiler and the executable code.
12. The system of claim 11, wherein each of the other nodes corresponds to a computer vision operation in the image processing pipeline.
13. The system of claim 11, wherein the composite node represents a multi-layer neural network model, and each node of the plurality of nodes corresponds to one operation layer of the multi-layer neural network model.
14. The system of claim 11, wherein the composite node represents a subgraph of nodes corresponding to a plurality of computer vision operations.
15. The system of claim 11, wherein the composite node corresponds to customer-defined code for image processing.
16. The system of claim 11, wherein operations corresponding to the composite node are selected from a first library of functions and operations corresponding to other nodes in the graph are selected from a second library of functions, and wherein the first and second libraries of functions are based on different programming models optimized for different types of operations.
17. The system of claim 16, wherein the first library of functions is optimized for deep learning or provided by a customer, and the second library of functions is optimized for computer vision.
18. The system of claim 11, wherein the operations corresponding to the composite node and the other nodes in the graph are each selected from a same computer vision function library.
19. The system of claim 11, wherein the graph API call identifies a graph-based programming model used by operations corresponding to the composite node.
20. The system of claim 11, wherein the graph API call identifies the composite node as one of: proprietary Computer Vision (CV) nodes, customer-defined nodes, neural network model nodes, and VX graph nodes based on OpenVX.
CN202310464455.8A (priority date 2022-04-26; filing date 2023-04-26) Method and system for image processing, Pending, published as CN116957909A

Applications Claiming Priority (4)

Application Number                   Priority Date    Filing Date    Title
US 63/334,728                        2022-04-26
US 63/355,143                        2022-06-24
US 18/178,059 (US 2023/0342876 A1)   2022-04-26       2023-03-03     Enhanced computer vision application programming interface

Publications (1)

Publication Number Publication Date
CN116957909A          2023-10-27

Family

ID=88441717

Family Applications (1)

Application Number                            Title                                    Priority Date    Filing Date
CN202310464455.8A (CN116957909A, pending)    Method and system for image processing   2022-04-26       2023-04-26

Country Status (1)

CN: CN116957909A

Similar Documents

Publication    Title
US10795660B1 (en) Live code updates
US10949182B2 (en) Systems and methods for generating code for parallel processing units
JP4820912B2 (en) System and method for executing graphics application program including shading language instruction at high speed
US5872977A (en) Object-oriented method and apparatus for creating a makefile
JP2020509443A (en) System and method for implementing a native contract on a blockchain
US10180825B2 (en) System and method for using ubershader variants without preprocessing macros
US20070039010A1 (en) Automatic generation of software code to facilitate interoperability
US9841958B2 (en) Extensible data parallel semantics
CN113504914B (en) Operating system construction compiling method and device, electronic equipment and storage medium
Jacob et al. CUDACL: A tool for CUDA and OpenCL programmers
US11663020B2 (en) Bootstrapping frameworks from a generated static initialization method for faster booting
US11379712B2 (en) Avoiding cycles in neural networks
Acosta et al. Towards a Unified Heterogeneous Development Model in Android TM
CN116957909A (en) Method and system for image processing
US20230342876A1 (en) Enhanced computer vision application programming interface
US11573777B2 (en) Method and apparatus for enabling autonomous acceleration of dataflow AI applications
US20170329587A1 (en) Program conversion method using comment-based pseudo-codes and computerreadable recording medium, onto which program is recorded, for implementing
US20230267024A1 (en) Method for realizing ngraph framework supporting fpga rear-end device
US20230342118A1 (en) Multi-level graph programming interfaces for controlling image processing flow on ai processing unit
CN114072762A (en) Compiling and executing source code into a service
CN116954577A (en) Method and system for controlling image processing flow
US11947975B2 (en) Offload server, offload control method, and offload program
JP5845788B2 (en) Execution control program, execution control apparatus, and execution control method
Pradalier IPipeline: a development framework for image processing pipelines
Szabó et al. Compiling C# Classes to Multiple Shader Sources for Multi-platform Real-Time Graphics

Legal Events

Date Code Title Description
PB01    Publication
SE01    Entry into force of request for substantive examination