WO2019136754A1 - Compilation method and system for artificial intelligence processing device, storage medium and terminal - Google Patents

Compilation method and system for artificial intelligence processing device, storage medium and terminal

Info

Publication number
WO2019136754A1
Authority
WO
WIPO (PCT)
Prior art keywords
deep learning
artificial intelligence
processing device
intelligence processing
network model
Application number
PCT/CN2018/072667
Other languages
English (en)
French (fr)
Inventor
肖梦秋 (Xiao Mengqiu)
Original Assignee
深圳鲲云信息科技有限公司 (Shenzhen Corerain Technologies Co., Ltd.)
Application filed by 深圳鲲云信息科技有限公司 (Shenzhen Corerain Technologies Co., Ltd.)
Priority to PCT/CN2018/072667
Priority to CN201880002764.0A
Publication of WO2019136754A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00: Arrangements for software engineering
    • G06F8/40: Transformation of program code
    • G06F8/41: Compilation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Definitions

  • The present invention relates to the technical field of software processing, and in particular to a compilation method and system for an artificial intelligence processing device, a storage medium, and a terminal.
  • Deep learning stems from the study of artificial neural networks.
  • A multilayer perceptron with multiple hidden layers is a deep learning structure. Deep learning combines low-level features to form more abstract high-level representations (attribute categories or features) in order to discover distributed feature representations of data.
  • Deep learning is a machine learning method based on representation learning of data. An observation (e.g., an image) can be represented in many ways, such as a vector of per-pixel intensity values, or more abstractly as a series of edges, regions of particular shapes, and the like. Some specific representations make it easier to learn tasks from examples (e.g., face recognition or facial expression recognition).
  • The advantage of deep learning is that it replaces hand-crafted features with efficient algorithms for unsupervised or semi-supervised feature learning and hierarchical feature extraction.
  • CNN: Convolutional Neural Networks
  • DBN: Deep Belief Nets
  • CNNs have become a research hotspot in many scientific fields, especially pattern classification. Because the network avoids complex image pre-processing and can take raw images directly as input, it has been widely applied.
  • The basic structure of a CNN includes two kinds of layers. The first is the feature extraction layer: the input of each neuron is connected to the local receptive field of the previous layer, and the local feature is extracted. Once a local feature has been extracted, its positional relationship to other features is also determined. The second is the feature mapping layer: each computing layer of the network consists of multiple feature maps, each feature map is a plane, and all neurons on the plane have equal weights.
  • The feature mapping structure uses the sigmoid function, whose influence-function kernel is small, as the activation function of the convolutional network, which gives the feature maps shift invariance. In addition, because the neurons on one mapping plane share weights, the number of free parameters of the network is reduced.
  • Each convolutional layer in a convolutional neural network is followed by a computing layer for local averaging and secondary extraction. This distinctive two-stage feature extraction structure reduces the feature resolution.
  • CNNs are mainly used to recognize two-dimensional patterns that are invariant to displacement, scaling, and other forms of distortion. Because the feature detection layers of a CNN learn from training data, explicit feature extraction is avoided when a CNN is used, and features are learned implicitly from the training data. Moreover, because neurons on the same feature map share weights, the network can learn in parallel, which is a major advantage of convolutional networks over networks in which neurons are fully interconnected.
  • With its special structure of local weight sharing, the convolutional neural network has unique advantages in speech recognition and image processing. Its layout is closer to that of actual biological neural networks, and weight sharing reduces network complexity. In particular, the ability to feed images with multidimensional input vectors directly into the network avoids the complexity of data reconstruction during feature extraction and classification.
  • An object of the present invention is to provide a compilation method and system for an artificial intelligence processing device, a storage medium, and a terminal that, by compiling a deep learning algorithm, allow it to be implemented quickly in hardware.
  • The present invention provides a compilation method for an artificial intelligence processing device, comprising the steps of: performing precision compression on deep learning network model data based on the recognition accuracy of the artificial intelligence processing device to obtain a deep learning data graph; performing graph analysis on the deep learning data graph to obtain a deep learning data flow graph conforming to the protocol definition; generating executable software code based on the deep learning data flow graph and inputting the executable software code into the artificial intelligence processing device; and generating a hardware bitstream based on the deep learning data flow graph and inputting the hardware bitstream into the artificial intelligence processing device.
  • Performing precision compression on the deep learning network model data based on the recognition accuracy of the artificial intelligence processing device includes the following steps: freezing the deep learning network model data; quantizing the frozen model data; and generating the deep learning data graph from the frozen and the quantized model data.
  • The deep learning network model is a model trained with Tensorflow.
  • The artificial intelligence processing device includes a CPU and an FPGA; the executable software code is input into the CPU, and the hardware bitstream is input into the FPGA.
  • Correspondingly, the present invention provides a compilation system for an artificial intelligence processing device, including a precision compression module, a graph analysis module, a code generation module, and a bitstream generation module;
  • the precision compression module is configured to perform precision compression on the deep learning network model data based on the recognition accuracy of the artificial intelligence processing device to obtain a deep learning data graph;
  • the graph analysis module is configured to perform graph analysis on the deep learning data graph to obtain a deep learning data flow graph conforming to the protocol definition;
  • the code generation module is configured to generate executable software code based on the deep learning data flow graph and input the executable software code into the artificial intelligence processing device;
  • the bitstream generation module is configured to generate a hardware bitstream based on the deep learning data flow graph and input the hardware bitstream into the artificial intelligence processing device.
  • When performing precision compression on the deep learning network model data based on the recognition accuracy of the artificial intelligence processing device, the precision compression module executes the following steps: freezing the deep learning network model data; quantizing the frozen model data; and generating the deep learning data graph from the frozen and the quantized model data.
  • The deep learning network model is a model trained with Tensorflow.
  • The artificial intelligence processing device includes a CPU and an FPGA; the executable software code is input into the CPU, and the hardware bitstream is input into the FPGA.
  • The present invention provides a storage medium storing a computer program that, when executed by a processor, implements the above compilation method for the artificial intelligence processing device.
  • Finally, the present invention provides a terminal, including a processor and a memory;
  • the memory is configured to store a computer program;
  • the processor is configured to execute the computer program stored in the memory, so that the terminal performs the above compilation method for the artificial intelligence processing device.
  • As described above, the compilation method and system for an artificial intelligence processing device, the storage medium, and the terminal of the present invention have the following beneficial effects: (1) by compiling the deep learning algorithm, it can be implemented quickly in hardware; (2) compilation efficiency is high and practicability is strong.
  • FIG. 1 is a flowchart of the compilation method for an artificial intelligence processing device according to an embodiment of the present invention;
  • FIG. 2 is a schematic structural diagram of the compilation system for an artificial intelligence processing device according to an embodiment of the present invention;
  • FIG. 3 is a schematic structural diagram of the terminal according to an embodiment of the present invention.
  • By compiling a deep learning algorithm, the compilation method and system, storage medium, and terminal of the present invention allow it to be implemented quickly on the artificial intelligence processing device, thereby making full use of advantages of that device such as its high computing speed.
  • In one embodiment, the artificial intelligence processing device includes a CPU and an FPGA, where the CPU runs the executable software code and the FPGA runs the hardware bitstream, so as to execute deep learning algorithms such as CNNs.
  • As shown in FIG. 1, in one embodiment, the compilation method for an artificial intelligence processing device of the present invention includes the following steps:
  • Step S1: perform precision compression on the deep learning network model data based on the recognition accuracy of the artificial intelligence processing device to obtain a deep learning data graph.
  • Specifically, depending on the recognition accuracy of the artificial intelligence processing device, the deep learning network model data must be precision-compressed to fit the device.
  • The precision-compressed deep learning network model data is the deep learning data graph.
  • In one embodiment, performing precision compression on the deep learning network model data based on the recognition accuracy of the artificial intelligence processing device includes the following steps:
  • Curing, i.e., freezing:
  • Specifically, freezing combines the graph structure of the deep learning network model and the weights of the model into a single unit.
  • Quantization: in digital signal processing, quantization is the process of approximating the continuous values of a signal (or a large number of possible discrete values) by a finite (or smaller) number of discrete values. It is mainly used in converting continuous signals to digital signals: a continuous signal is sampled into a discrete signal, and the discrete signal is quantized into a digital signal. Note that a discrete signal usually does not need to be quantized, but if its values are not discrete in range, it still has to be quantized.
  • Specifically, the present invention quantizes the frozen deep learning network model data using a quantization algorithm.
  • Quantization is mature prior art for those skilled in the art and is therefore not described further here.
  • The deep learning data graph is generated from the frozen deep learning network model data and the quantized deep learning network model data, and is output.
  • In one embodiment, the deep learning network model is a model trained with Tensorflow.
  • Tensorflow is Google's second-generation artificial intelligence learning system, developed on the basis of DistBelief; its name derives from its operating principle.
  • Tensor means an N-dimensional array.
  • Flow means computation based on a data flow graph; Tensorflow denotes the process in which tensors flow from one end of the flow graph to the other.
  • Tensorflow is a system that transmits complex data structures into an artificial neural network for analysis and processing.
  • Step S2: perform graph analysis on the deep learning data graph to obtain a deep learning data flow graph conforming to the protocol definition.
  • Specifically, graph analysis of the deep learning data graph first generates a hardware-compatible graph, then generates a data flow graph, then optimizes the data flow graph, and finally outputs a deep learning data flow graph conforming to the protocol definition.
  • Step S3: generate executable software code based on the deep learning data flow graph, and input the executable software code into the artificial intelligence processing device.
  • Specifically, the deep learning data flow graph is processed to match the software resources of the artificial intelligence processing device, yielding the parameters of the software driver that executes the deep learning network model and thereby the executable software code, which is input into the software processing module of the artificial intelligence processing device.
  • Step S4: generate a hardware bitstream based on the deep learning data flow graph, and input the hardware bitstream into the artificial intelligence processing device.
  • Specifically, the deep learning data flow graph is processed to match the hardware resources of the artificial intelligence processing device, yielding a hardware bitstream that can run on those hardware resources, which is input into the hardware processing module of the artificial intelligence processing device.
  • Preferably, the hardware bitstream is input into the hardware processing module of the artificial intelligence processing device in a pipelined manner and can be executed in sequence by the hardware processing module.
  • For example, the hardware processing module performs the convolution computations of a CNN, and the hardware bitstream flows into it in a pipelined manner, so that every convolutional layer and fully connected layer of the CNN is kept in a working state.
  • As shown in FIG. 2, in one embodiment, the compilation system for an artificial intelligence processing device of the present invention includes a precision compression module 21, a graph analysis module 22, a code generation module 23, and a bitstream generation module 24.
  • The precision compression module 21 is configured to perform precision compression on the deep learning network model data based on the recognition accuracy of the artificial intelligence processing device to obtain a deep learning data graph.
  • Specifically, depending on the recognition accuracy of the artificial intelligence processing device, the deep learning network model data must be precision-compressed to fit the device.
  • The precision-compressed deep learning network model data is the deep learning data graph.
  • In one embodiment, when performing precision compression on the deep learning network model data based on the recognition accuracy of the artificial intelligence processing device, the precision compression module 21 executes the following steps:
  • Curing, i.e., freezing:
  • Specifically, freezing combines the graph structure of the deep learning network model and the weights of the model into a single unit.
  • Quantization: in digital signal processing, quantization is the process of approximating the continuous values of a signal (or a large number of possible discrete values) by a finite (or smaller) number of discrete values. It is mainly used in converting continuous signals to digital signals: a continuous signal is sampled into a discrete signal, and the discrete signal is quantized into a digital signal. Note that a discrete signal usually does not need to be quantized, but if its values are not discrete in range, it still has to be quantized.
  • Specifically, the present invention quantizes the frozen deep learning network model data using a quantization algorithm.
  • Quantization is mature prior art for those skilled in the art and is therefore not described further here.
  • The deep learning data graph is generated from the frozen deep learning network model data and the quantized deep learning network model data, and is output.
  • In one embodiment, the deep learning network model is a model trained with Tensorflow.
  • Tensorflow is Google's second-generation artificial intelligence learning system, developed on the basis of DistBelief; its name derives from its operating principle.
  • Tensor means an N-dimensional array.
  • Flow means computation based on a data flow graph; Tensorflow denotes the process in which tensors flow from one end of the flow graph to the other.
  • Tensorflow is a system that transmits complex data structures into an artificial neural network for analysis and processing.
  • The graph analysis module 22 is connected to the precision compression module 21 and is configured to perform graph analysis on the deep learning data graph to obtain a deep learning data flow graph conforming to the protocol definition.
  • Specifically, graph analysis of the deep learning data graph first generates a hardware-compatible graph, then generates a data flow graph, then optimizes the data flow graph, and finally outputs a deep learning data flow graph conforming to the protocol definition.
  • The code generation module 23 is connected to the graph analysis module 22 and is configured to generate executable software code based on the deep learning data flow graph and input the executable software code into the artificial intelligence processing device.
  • Specifically, the deep learning data flow graph is processed to match the software resources of the artificial intelligence processing device, yielding the parameters of the software driver that executes the deep learning network model and thereby the executable software code, which is input into the software processing module of the artificial intelligence processing device.
  • The bitstream generation module 24 is connected to the graph analysis module 22 and is configured to generate a hardware bitstream based on the deep learning data flow graph and input the hardware bitstream into the artificial intelligence processing device.
  • Specifically, the deep learning data flow graph is processed to match the hardware resources of the artificial intelligence processing device, yielding a hardware bitstream that can run on those hardware resources, which is input into the hardware processing module of the artificial intelligence processing device.
  • Preferably, the hardware bitstream is input into the hardware processing module of the artificial intelligence processing device in a pipelined manner and can be executed in sequence by the hardware processing module.
  • For example, the hardware processing module performs the convolution computations of a CNN, and the hardware bitstream flows into it in a pipelined manner, so that every convolutional layer and fully connected layer of the CNN is kept in a working state.
  • It should be understood that the division of the above system into modules is merely a division of logical functions; in an actual implementation, the modules may be fully or partially integrated into one physical entity, or may be physically separated.
  • These modules may all be implemented as software invoked by a processing element, or all in hardware; alternatively, some modules may be implemented as software invoked by a processing element and others in hardware.
  • For example, the x module may be a separately provided processing element or may be integrated in a chip of the above device; alternatively, it may be stored in the memory of the above device in the form of program code, with a processing element of the above device invoking and executing the functions of the x module.
  • The other modules are implemented similarly.
  • In addition, all or some of these modules may be integrated together or implemented independently.
  • The processing element described here may be an integrated circuit with signal processing capability. During implementation, each step of the above method or each of the above modules may be completed by integrated hardware logic circuits in the processor element or by instructions in the form of software.
  • For example, the above modules may be one or more integrated circuits configured to implement the above method, such as one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs).
  • ASIC: application-specific integrated circuit
  • DSP: digital signal processor
  • FPGA: field-programmable gate array
  • As another example, when one of the above modules is implemented as program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can invoke program code.
  • As a further example, these modules may be integrated together and implemented as a system-on-a-chip (SOC).
  • SOC: system-on-a-chip
  • The storage medium of the present invention stores a computer program that, when executed by a processor, implements the above compilation method for the artificial intelligence processing device.
  • Preferably, the storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical discs.
  • As shown in FIG. 3, in one embodiment, the terminal of the present invention includes a processor 31 and a memory 32.
  • The memory 32 is configured to store a computer program.
  • Preferably, the memory 32 includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical discs.
  • The processor 31 is connected to the memory 32 and executes the computer program stored in the memory 32, so that the terminal performs the above compilation method for the artificial intelligence processing device.
  • Preferably, the processor 31 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • CPU: central processing unit
  • NP: network processor
  • DSP: digital signal processor
  • ASIC: application-specific integrated circuit
  • FPGA: field-programmable gate array
  • In summary, by compiling a deep learning algorithm, the compilation method and system, storage medium, and terminal of the present invention allow it to be implemented quickly in hardware, with high compilation efficiency and strong practicability. Therefore, the present invention effectively overcomes various shortcomings of the prior art and has high industrial value.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

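A compilation method and system for an artificial intelligence processing device, a storage medium, and a terminal. The method includes the following steps: performing precision compression on deep learning network model data based on the recognition accuracy of the artificial intelligence processing device to obtain a deep learning data graph (S1); performing graph analysis on the deep learning data graph to obtain a deep learning data flow graph conforming to the protocol definition (S2); generating executable software code based on the deep learning data flow graph and inputting the executable software code into the artificial intelligence processing device (S3); and generating a hardware bitstream based on the deep learning data flow graph and inputting the hardware bitstream into the artificial intelligence processing device (S4). By compiling a deep learning algorithm, the compilation method and system, storage medium, and terminal allow it to be implemented quickly in hardware.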

Description

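Compilation method and system for artificial intelligence processing device, storage medium and terminal
Technical Field
The present invention relates to the technical field of software processing, and in particular to a compilation method and system for an artificial intelligence processing device, a storage medium, and a terminal.
Background
The concept of deep learning originates from research on artificial neural networks. A multilayer perceptron with multiple hidden layers is one kind of deep learning structure. Deep learning combines low-level features to form more abstract high-level representations (attribute categories or features) in order to discover distributed feature representations of data.
Deep learning is a machine learning method based on representation learning of data. An observation (for example, an image) can be represented in many ways, such as a vector of per-pixel intensity values, or more abstractly as a series of edges, regions of particular shapes, and so on. Some specific representations make it easier to learn tasks from examples (for example, face recognition or facial expression recognition). The benefit of deep learning is that it replaces hand-crafted features with efficient algorithms for unsupervised or semi-supervised feature learning and hierarchical feature extraction.
Like other machine learning methods, deep learning methods can be divided into supervised and unsupervised learning, and the models built under different learning frameworks differ considerably. For example, convolutional neural networks (CNNs) are a deep machine learning model trained with supervised learning, whereas deep belief nets (DBNs) are a machine learning model trained with unsupervised learning.
At present, CNNs have become a research hotspot in many scientific fields, especially pattern classification. Because the network avoids complex image pre-processing and can take raw images directly as input, it has been widely applied. In general, the basic structure of a CNN includes two kinds of layers. The first is the feature extraction layer: the input of each neuron is connected to the local receptive field of the previous layer, and the feature of that local region is extracted. Once a local feature has been extracted, its positional relationship to other features is also determined. The second is the feature mapping layer: each computing layer of the network consists of multiple feature maps, each feature map is a plane, and all neurons on the plane have equal weights. The feature mapping structure uses the sigmoid function, whose influence-function kernel is small, as the activation function of the convolutional network, which gives the feature maps shift invariance. In addition, because the neurons on one mapping plane share weights, the number of free parameters of the network is reduced. Each convolutional layer in a convolutional neural network is followed by a computing layer for local averaging and secondary extraction; this distinctive two-stage feature extraction structure reduces the feature resolution.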
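The two-stage structure described above can be illustrated with a minimal pure-Python sketch. It has a convolutional feature-extraction layer whose kernel weights are shared across the whole plane, followed by a local-averaging layer that lowers the resolution. The function names, the unpadded stride-1 convolution, and the 2x2 average pooling are illustrative assumptions, not details taken from this application.

```python
import math


def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (no padding, stride 1).

    Like most CNN libraries, this computes cross-correlation (the kernel is not flipped).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Every output position reuses the same kernel weights: weight sharing.
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out


def sigmoid_map(fmap):
    """Apply the sigmoid activation element-wise to a feature map."""
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in fmap]


def avg_pool2x2(fmap):
    """Average non-overlapping 2x2 windows, halving the feature-map resolution."""
    return [[(fmap[i][j] + fmap[i][j + 1] + fmap[i + 1][j] + fmap[i + 1][j + 1]) / 4.0
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]


image = [[float((r + c) % 3) for c in range(6)] for r in range(6)]
kernel = [[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]]
fmap = sigmoid_map(conv2d_valid(image, kernel))  # 6x6 -> 4x4
pooled = avg_pool2x2(fmap)                       # 4x4 -> 2x2
print(len(fmap), len(fmap[0]), len(pooled), len(pooled[0]))  # 4 4 2 2
# Shared weights: 9 kernel weights, versus 36 * 16 = 576 for a dense 6x6 -> 4x4 layer.
print(len(kernel) * len(kernel[0]), 36 * 16)  # 9 576
```

The parameter count in the last line is the concrete meaning of "reducing the number of free parameters": the convolutional layer needs 9 weights however large the image is.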
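CNNs are mainly used to recognize two-dimensional patterns that are invariant to displacement, scaling, and other forms of distortion. Because the feature detection layers of a CNN learn from training data, explicit feature extraction is avoided when a CNN is used, and features are learned implicitly from the training data. Moreover, because neurons on the same feature map share weights, the network can learn in parallel, which is a major advantage of convolutional networks over networks in which neurons are fully interconnected. With its special structure of local weight sharing, the convolutional neural network has unique advantages in speech recognition and image processing. Its layout is closer to that of actual biological neural networks, and weight sharing reduces network complexity; in particular, the ability to feed images with multidimensional input vectors directly into the network avoids the complexity of data reconstruction during feature extraction and classification.
Therefore, how to compile deep learning algorithms so that they can be implemented in hardware has become one of the current hot research topics.
Summary of the Invention
In view of the above shortcomings of the prior art, an object of the present invention is to provide a compilation method and system for an artificial intelligence processing device, a storage medium, and a terminal that, by compiling a deep learning algorithm, allow it to be implemented quickly in hardware.
To achieve the above and other related objects, the present invention provides a compilation method for an artificial intelligence processing device, including the following steps: performing precision compression on deep learning network model data based on the recognition accuracy of the artificial intelligence processing device to obtain a deep learning data graph; performing graph analysis on the deep learning data graph to obtain a deep learning data flow graph conforming to the protocol definition; generating executable software code based on the deep learning data flow graph and inputting the executable software code into the artificial intelligence processing device; and generating a hardware bitstream based on the deep learning data flow graph and inputting the hardware bitstream into the artificial intelligence processing device.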
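The four steps above can be sketched as a simple driver function. This is a hypothetical outline only: `compile_for_device`, `DeviceInputs`, and the toy stage functions are illustrative names, and the application does not specify any programming interface. The sketch shows how steps S3 and S4 both consume the same data flow graph, with the software code destined for the CPU and the bitstream for the FPGA.

```python
from dataclasses import dataclass


@dataclass
class DeviceInputs:
    """Hypothetical container for what the compiler hands to the device."""
    cpu_code: str          # executable software code from step S3, run by the CPU
    fpga_bitstream: bytes  # hardware bitstream from step S4, loaded into the FPGA


def compile_for_device(model, compress, analyze, gen_code, gen_bitstream):
    """Chain steps S1-S4; each stage is injected as a callable."""
    data_graph = compress(model)      # S1: precision compression -> data graph
    flow_graph = analyze(data_graph)  # S2: graph analysis -> data flow graph
    # S3 and S4 both consume only the data flow graph, so they are independent.
    return DeviceInputs(cpu_code=gen_code(flow_graph),
                        fpga_bitstream=gen_bitstream(flow_graph))


# Toy stand-ins for the four stages, just to exercise the control flow.
result = compile_for_device(
    {"layers": ["conv1", "fc1"]},
    compress=lambda m: {"nodes": m["layers"], "bits": 8},
    analyze=lambda g: list(g["nodes"]),
    gen_code=lambda fg: "\n".join(f"run({n})" for n in fg),
    gen_bitstream=lambda fg: bytes(len(n) for n in fg),
)
print(result.cpu_code)  # prints run(conv1) and run(fc1) on separate lines
```

Because S3 and S4 depend only on the output of S2, a real implementation could in principle run them in parallel.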
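In an embodiment of the present invention, performing precision compression on the deep learning network model data based on the recognition accuracy of the artificial intelligence processing device includes the following steps:
freezing the deep learning network model data;
quantizing the frozen deep learning network model data;
generating a deep learning data graph from the frozen deep learning network model data and the quantized deep learning network model data.
In an embodiment of the present invention, the deep learning network model is a model trained with Tensorflow.
In an embodiment of the present invention, the artificial intelligence processing device includes a CPU and an FPGA; the executable software code is input into the CPU, and the hardware bitstream is input into the FPGA.
Correspondingly, the present invention provides a compilation system for an artificial intelligence processing device, including a precision compression module, a graph analysis module, a code generation module, and a bitstream generation module;
the precision compression module is configured to perform precision compression on deep learning network model data based on the recognition accuracy of the artificial intelligence processing device to obtain a deep learning data graph;
the graph analysis module is configured to perform graph analysis on the deep learning data graph to obtain a deep learning data flow graph conforming to the protocol definition;
the code generation module is configured to generate executable software code based on the deep learning data flow graph and input the executable software code into the artificial intelligence processing device;
the bitstream generation module is configured to generate a hardware bitstream based on the deep learning data flow graph and input the hardware bitstream into the artificial intelligence processing device.
In an embodiment of the present invention, when performing precision compression on the deep learning network model data based on the recognition accuracy of the artificial intelligence processing device, the precision compression module executes the following steps:
freezing the deep learning network model data;
quantizing the frozen deep learning network model data;
generating a deep learning data graph from the frozen deep learning network model data and the quantized deep learning network model data.
In an embodiment of the present invention, the deep learning network model is a model trained with Tensorflow.
In an embodiment of the present invention, the artificial intelligence processing device includes a CPU and an FPGA; the executable software code is input into the CPU, and the hardware bitstream is input into the FPGA.
The present invention provides a storage medium on which a computer program is stored; when executed by a processor, the program implements the above compilation method for the artificial intelligence processing device.
Finally, the present invention provides a terminal, including a processor and a memory;
the memory is configured to store a computer program;
the processor is configured to execute the computer program stored in the memory, so that the terminal performs the above compilation method for the artificial intelligence processing device.
As described above, the compilation method and system for an artificial intelligence processing device, the storage medium, and the terminal of the present invention have the following beneficial effects:
(1) by compiling the deep learning algorithm, it can be implemented quickly in hardware;
(2) compilation efficiency is high and practicability is strong.
Brief Description of the Drawings
FIG. 1 is a flowchart of the compilation method for an artificial intelligence processing device according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of the compilation system for an artificial intelligence processing device according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of the terminal according to an embodiment of the present invention.
Description of Reference Numerals
21       Precision compression module
22       Graph analysis module
23       Code generation module
24       Bitstream generation module
31       Processor
32       Memory
Detailed Description of Embodiments
The embodiments of the present invention are described below through specific examples; those skilled in the art can readily understand other advantages and effects of the present invention from the disclosure of this specification. The present invention may also be implemented or applied through other, different specific embodiments, and the details in this specification may be modified or changed in various ways based on different viewpoints and applications without departing from the spirit of the present invention. It should be noted that, where no conflict arises, the following embodiments and the features in them may be combined with one another.
It should be noted that the drawings provided in the following embodiments only illustrate the basic concept of the present invention schematically. The drawings therefore show only the components related to the present invention rather than the number, shape, and size of the components in an actual implementation; in an actual implementation, the type, quantity, and proportion of each component may vary arbitrarily, and the component layout may be more complex.
By compiling a deep learning algorithm, the compilation method and system, storage medium, and terminal of the present invention allow it to be implemented quickly on the artificial intelligence processing device, thereby making full use of advantages of that device such as its high computing speed. In an embodiment of the present invention, the artificial intelligence processing device includes a CPU and an FPGA, where the CPU runs the executable software code and the FPGA runs the hardware bitstream, so as to execute deep learning algorithms such as CNNs.
As shown in FIG. 1, in one embodiment, the compilation method for an artificial intelligence processing device of the present invention includes the following steps:
Step S1: perform precision compression on deep learning network model data based on the recognition accuracy of the artificial intelligence processing device to obtain a deep learning data graph.
Specifically, depending on the recognition accuracy of the artificial intelligence processing device, the deep learning network model data must be precision-compressed to fit the device. The precision-compressed deep learning network model data is the deep learning data graph.
In an embodiment of the present invention, performing precision compression on the deep learning network model data based on the recognition accuracy of the artificial intelligence processing device includes the following steps:
11) Freeze the deep learning network model data.
Specifically, freezing combines the graph structure of the deep learning network model and the weights of the model into a single unit.
12) Quantize the frozen deep learning network model data.
In digital signal processing, quantization is the process of approximating the continuous values of a signal (or a large number of possible discrete values) by a finite (or smaller) number of discrete values. Quantization is mainly used in converting continuous signals to digital signals: a continuous signal is sampled into a discrete signal, and the discrete signal is quantized into a digital signal. Note that a discrete signal usually does not need to be quantized, but if its values are not discrete in range, it still has to be quantized.
Specifically, the present invention quantizes the frozen deep learning network model data using a quantization algorithm. Quantization is mature prior art for those skilled in the art and is therefore not described further here.
13) Generate the deep learning data graph from the frozen deep learning network model data and the quantized deep learning network model data.
Specifically, the deep learning data graph is generated from the frozen deep learning network model data and the quantized deep learning network model data, and is output.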
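Steps 11) to 13) can be sketched as follows, under stated assumptions. The application leaves the quantization algorithm open, so this sketch assumes symmetric per-tensor linear quantization. When choosing the bit width, it uses the worst-case dequantization error as a hypothetical stand-in for the device's recognition accuracy. The graph and weight formats (`weight_ref`, `const`, `qconst`, `scale`) are likewise illustrative.

```python
import copy


def freeze(graph, weights):
    """Return a copy of `graph` in which each node's weight reference is replaced
    by the trained values, so structure and weights form one self-contained unit."""
    frozen = copy.deepcopy(graph)
    for node in frozen:
        ref = node.pop("weight_ref", None)
        if ref is not None:
            node["const"] = list(weights[ref])
    return frozen


def quantize_symmetric(values, bits):
    """Symmetric per-tensor linear quantization to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def build_data_graph(frozen, bits):
    """Keep the frozen structure; swap float constants for integers plus a scale."""
    data_graph = []
    for node in frozen:
        out = {k: v for k, v in node.items() if k != "const"}
        if "const" in node:
            out["qconst"], out["scale"] = quantize_symmetric(node["const"], bits)
        data_graph.append(out)
    return data_graph


def max_abs_error(frozen, data_graph):
    """Worst-case dequantization error over all constants (an accuracy proxy)."""
    err = 0.0
    for f, d in zip(frozen, data_graph):
        if "const" in f:
            err = max(err, max(abs(v - q * d["scale"])
                               for v, q in zip(f["const"], d["qconst"])))
    return err


def precision_compress(graph, weights, tolerance, candidate_bits=(4, 8, 16)):
    """Freeze, then pick the narrowest bit width whose error is within tolerance.
    Falls back to the widest candidate if none meets the tolerance."""
    frozen = freeze(graph, weights)
    for bits in candidate_bits:
        data_graph = build_data_graph(frozen, bits)
        if max_abs_error(frozen, data_graph) <= tolerance:
            return data_graph, bits
    return data_graph, candidate_bits[-1]


graph = [
    {"name": "conv1", "op": "conv2d", "inputs": ["x"], "weight_ref": "conv1/w"},
    {"name": "relu1", "op": "relu", "inputs": ["conv1"]},
]
weights = {"conv1/w": [0.5, -0.25, 1.0, 0.1]}
data_graph, bits = precision_compress(graph, weights, tolerance=0.01)
print(bits)  # 8
```

With a looser tolerance (for example 0.1) the same weights are accepted at 4 bits. A real toolchain would measure accuracy on a validation set rather than use this error bound.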
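In an embodiment of the present invention, the deep learning network model is a model trained with Tensorflow. Tensorflow is Google's second-generation artificial intelligence learning system, developed on the basis of DistBelief, and its name derives from its operating principle. Tensor means an N-dimensional array, Flow means computation based on a data flow graph, and Tensorflow denotes the computation in which tensors flow from one end of the flow graph to the other. Tensorflow is a system that transmits complex data structures into an artificial neural network for analysis and processing.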
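The tensor-flow principle can be illustrated without Tensorflow itself. The sketch below evaluates a small data flow graph with Python's standard `graphlib` module (Python 3.9+). The node format and the list-based 1-D tensors are assumptions made for brevity; this is not Tensorflow's API.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def run_dataflow(nodes, feeds):
    """Evaluate a data flow graph.

    nodes: name -> (op, [input names]); feeds: name -> input tensor.
    A node fires only after all of its inputs exist, so tensors flow from the
    graph inputs through the operations to the outputs.
    """
    deps = {name: set(inputs) for name, (_, inputs) in nodes.items()}
    values = dict(feeds)
    for name in TopologicalSorter(deps).static_order():
        if name in values:  # graph input supplied by the caller
            continue
        op, inputs = nodes[name]
        values[name] = op(*(values[i] for i in inputs))
    return values


# 1-D "tensors" as Python lists, to keep the sketch dependency-free.
def add(a, b):
    return [x + y for x, y in zip(a, b)]


def relu(a):
    return [max(0.0, x) for x in a]


def double(a):
    return [2.0 * x for x in a]


nodes = {"sum": (add, ["x", "b"]), "act": (relu, ["sum"]), "out": (double, ["act"])}
result = run_dataflow(nodes, {"x": [1.0, -3.0, 2.0], "b": [0.5, 0.5, 0.5]})
print(result["out"])  # [3.0, 0.0, 5.0]
```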
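Step S2: perform graph analysis on the deep learning data graph to obtain a deep learning data flow graph conforming to the protocol definition.
Specifically, graph analysis of the deep learning data graph first generates a hardware-compatible graph, then generates a data flow graph, then optimizes the data flow graph, and finally outputs a deep learning data flow graph conforming to the protocol definition.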
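The four stages of graph analysis might look roughly like the sketch below: hardware-compatible graph, data flow graph, optimization, and protocol output. Everything concrete here is assumed: the op table `HW_OPS`, the choice of conv+relu fusion as the optimization, and the record format standing in for the unspecified protocol definition.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical table: framework op name -> op name understood by the hardware.
HW_OPS = {"Conv2D": "conv2d", "Relu": "relu", "MaxPool": "maxpool", "MatMul": "fc"}


def make_hw_compatible(graph):
    """Rename ops into the hardware vocabulary and reject unsupported ones."""
    compat = {}
    for name, (op, inputs) in graph.items():
        if op not in HW_OPS:
            raise ValueError(f"node {name!r}: op {op!r} has no hardware mapping")
        compat[name] = (HW_OPS[op], list(inputs))
    return compat


def to_dataflow(graph):
    """Order nodes so that every producer precedes its consumers."""
    deps = {n: {i for i in ins if i in graph} for n, (_, ins) in graph.items()}
    return [(n, *graph[n]) for n in TopologicalSorter(deps).static_order()]


def fuse_conv_relu(flow):
    """Optimization pass: fold a relu into the conv feeding it when the relu is
    that conv's only consumer, saving one pass over the feature map."""
    consumers = {}
    for name, _, ins in flow:
        for i in ins:
            consumers.setdefault(i, []).append(name)
    ops = {name: op for name, op, _ in flow}
    absorbed = {}  # relu name -> conv name it was folded into
    fused = []
    for name, op, ins in flow:
        if name in absorbed:
            continue
        ins = [absorbed.get(i, i) for i in ins]
        users = consumers.get(name, [])
        if op == "conv2d" and len(users) == 1 and ops[users[0]] == "relu":
            absorbed[users[0]] = name
            op = "conv2d_relu"
        fused.append((name, op, ins))
    return fused


def analyze(graph):
    """Hardware-compatible graph -> data flow graph -> optimized -> records."""
    flow = fuse_conv_relu(to_dataflow(make_hw_compatible(graph)))
    # Hypothetical output "protocol": one record per node, in execution order.
    return [{"name": n, "op": op, "inputs": ins} for n, op, ins in flow]


model = {
    "conv1": ("Conv2D", ["x"]),
    "relu1": ("Relu", ["conv1"]),
    "pool1": ("MaxPool", ["relu1"]),
    "fc1": ("MatMul", ["pool1"]),
}
for record in analyze(model):
    print(record)
```

The relu disappears as a separate node and `pool1` now reads directly from the fused `conv1`. This is the kind of rewrite an optimization pass on the data flow graph performs.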
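Step S3: generate executable software code based on the deep learning data flow graph, and input the executable software code into the artificial intelligence processing device.
Specifically, the deep learning data flow graph is processed to match the software resources of the artificial intelligence processing device, yielding the parameters of the software driver that executes the deep learning network model and thereby the executable software code, which is input into the software processing module of the artificial intelligence processing device.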
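The application does not say what the software-driver parameters are. Assuming, for illustration only, that they are per-layer buffer offsets in device memory, step S3 could be sketched as below. `alloc` and `run_layer` are hypothetical driver calls. The 64-byte alignment and the 1-byte elements (for example, 8-bit quantized data) are also assumptions.

```python
def allocate_buffers(flow, bytes_per_elem=1, align=64):
    """Bump allocator: give each node's output tensor an aligned offset in
    device memory. Returns (offsets, total_bytes)."""
    offsets, cursor = {}, 0
    for rec in flow:
        size = bytes_per_elem
        for dim in rec["shape"]:
            size *= dim
        offsets[rec["name"]] = cursor
        cursor += -(-size // align) * align  # round size up to a multiple of align
    return offsets, cursor


def generate_driver_code(flow):
    """Emit one hypothetical driver call per layer, using the buffer offsets
    as the software-driver parameters."""
    offsets, total = allocate_buffers(flow)
    lines = [f"alloc({total})"]
    for rec in flow:
        if rec["op"] == "input":  # graph inputs only need a buffer
            continue
        src = [offsets[i] for i in rec["inputs"]]
        lines.append(f"run_layer('{rec['op']}', src={src}, dst={offsets[rec['name']]})")
    return "\n".join(lines)


flow = [
    {"name": "x", "op": "input", "inputs": [], "shape": [1, 28, 28]},
    {"name": "conv1", "op": "conv2d_relu", "inputs": ["x"], "shape": [8, 26, 26]},
    {"name": "fc1", "op": "fc", "inputs": ["conv1"], "shape": [10]},
]
print(generate_driver_code(flow))
# alloc(6336)
# run_layer('conv2d_relu', src=[0], dst=832)
# run_layer('fc', src=[832], dst=6272)
```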
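Step S4: generate a hardware bitstream based on the deep learning data flow graph, and input the hardware bitstream into the artificial intelligence processing device.
Specifically, the deep learning data flow graph is processed to match the hardware resources of the artificial intelligence processing device, yielding a hardware bitstream that can run on those hardware resources, which is input into the hardware processing module of the artificial intelligence processing device.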
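Matching the data flow graph to hardware resources can be illustrated by a hypothetical resource-fitting step: choosing how many parallel compute lanes each layer gets within a DSP budget. This sketches only the matching decision. The per-lane DSP costs, the greedy doubling strategy, and the packed record format are assumptions. An actual FPGA bitstream is produced by vendor synthesis and place-and-route tools.

```python
import struct


def fit_parallelism(layers, dsp_budget, max_par=64):
    """Greedy resource matching: every layer starts with one compute lane; the
    current bottleneck layer (most cycles = macs / lanes) has its lanes doubled
    until the DSP budget or max_par would be exceeded."""
    par = {layer["name"]: 1 for layer in layers}
    used = sum(layer["dsp_per_lane"] for layer in layers)
    if used > dsp_budget:
        raise ValueError("model does not fit the device even with one lane per layer")
    while True:
        slowest = max(layers, key=lambda layer: layer["macs"] / par[layer["name"]])
        lanes = par[slowest["name"]]
        extra = slowest["dsp_per_lane"] * lanes  # DSP cost of doubling its lanes
        if lanes * 2 > max_par or used + extra > dsp_budget:
            return par
        par[slowest["name"]] = lanes * 2
        used += extra


# Hypothetical op encoding used for the packed configuration below.
OP_IDS = {"conv2d_relu": 1, "fc": 2}


def pack_config(layers, par):
    """Stand-in for bitstream emission: a 3-byte little-endian record per layer
    (1-byte op id, 2-byte lane count)."""
    return b"".join(struct.pack("<BH", OP_IDS[layer["op"]], par[layer["name"]])
                    for layer in layers)


layers = [
    {"name": "conv1", "op": "conv2d_relu", "macs": 1000, "dsp_per_lane": 4},
    {"name": "fc1", "op": "fc", "macs": 100, "dsp_per_lane": 1},
]
par = fit_parallelism(layers, dsp_budget=40)
print(par)                             # {'conv1': 8, 'fc1': 1}
print(pack_config(layers, par).hex())  # 010800020100
```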
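Preferably, the hardware bitstream is input into the hardware processing module of the artificial intelligence processing device in a pipelined manner and can be executed in sequence by the hardware processing module. For example, the hardware processing module performs the convolution computations of a CNN, and the hardware bitstream flows into it in a pipelined manner, so that every convolutional layer and fully connected layer of the CNN is kept in a working state.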
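The pipelined execution described above can be simulated with a small schedule table. Assume that each layer is one pipeline stage and takes one time step per input item. Then a new item enters every step, and every stage is busy once the pipeline has filled. The stage names and timing model are illustrative assumptions.

```python
def pipeline_schedule(stages, n_items):
    """For each time step, map every stage to the item it is processing (None =
    idle). Assumes each stage takes exactly one step, so item k is in stage s at
    step k + s and a new item can enter every step."""
    table = []
    for t in range(n_items + len(stages) - 1):
        table.append({name: (t - s if 0 <= t - s < n_items else None)
                      for s, name in enumerate(stages)})
    return table


def fully_busy_steps(table):
    """Steps at which every stage is working at the same time."""
    return [t for t, row in enumerate(table) if all(v is not None for v in row.values())]


stages = ["conv1", "conv2", "fc1"]
table = pipeline_schedule(stages, n_items=4)
for t, row in enumerate(table):
    print(t, row)
print(fully_busy_steps(table))  # [2, 3]
print(len(table), "steps pipelined vs", 4 * len(stages), "steps one item at a time")
# 6 steps pipelined vs 12 steps one item at a time
```

With 3 stages and 4 items, the pipeline finishes in 6 steps instead of 12, and all three layers work simultaneously at steps 2 and 3.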
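As shown in FIG. 2, in one embodiment, the compilation system for an artificial intelligence processing device of the present invention includes a precision compression module 21, a graph analysis module 22, a code generation module 23, and a bitstream generation module 24.
The precision compression module 21 is configured to perform precision compression on deep learning network model data based on the recognition accuracy of the artificial intelligence processing device to obtain a deep learning data graph.
Specifically, depending on the recognition accuracy of the artificial intelligence processing device, the deep learning network model data must be precision-compressed to fit the device. The precision-compressed deep learning network model data is the deep learning data graph.
In an embodiment of the present invention, when performing precision compression on the deep learning network model data based on the recognition accuracy of the artificial intelligence processing device, the precision compression module 21 executes the following steps:
11) Freeze the deep learning network model data.
Specifically, freezing combines the graph structure of the deep learning network model and the weights of the model into a single unit.
12) Quantize the frozen deep learning network model data.
In digital signal processing, quantization is the process of approximating the continuous values of a signal (or a large number of possible discrete values) by a finite (or smaller) number of discrete values. Quantization is mainly used in converting continuous signals to digital signals: a continuous signal is sampled into a discrete signal, and the discrete signal is quantized into a digital signal. Note that a discrete signal usually does not need to be quantized, but if its values are not discrete in range, it still has to be quantized.
Specifically, the present invention quantizes the frozen deep learning network model data using a quantization algorithm. Quantization is mature prior art for those skilled in the art and is therefore not described further here.
13) Generate the deep learning data graph from the frozen deep learning network model data and the quantized deep learning network model data.
Specifically, the deep learning data graph is generated from the frozen deep learning network model data and the quantized deep learning network model data, and is output.
In an embodiment of the present invention, the deep learning network model is a model trained with Tensorflow. Tensorflow is Google's second-generation artificial intelligence learning system, developed on the basis of DistBelief, and its name derives from its operating principle. Tensor means an N-dimensional array, Flow means computation based on a data flow graph, and Tensorflow denotes the computation in which tensors flow from one end of the flow graph to the other. Tensorflow is a system that transmits complex data structures into an artificial neural network for analysis and processing.
The graph analysis module 22 is connected to the precision compression module 21 and is configured to perform graph analysis on the deep learning data graph to obtain a deep learning data flow graph conforming to the protocol definition.
Specifically, graph analysis of the deep learning data graph first generates a hardware-compatible graph, then generates a data flow graph, then optimizes the data flow graph, and finally outputs a deep learning data flow graph conforming to the protocol definition.
The code generation module 23 is connected to the graph analysis module 22 and is configured to generate executable software code based on the deep learning data flow graph and input the executable software code into the artificial intelligence processing device.
Specifically, the deep learning data flow graph is processed to match the software resources of the artificial intelligence processing device, yielding the parameters of the software driver that executes the deep learning network model and thereby the executable software code, which is input into the software processing module of the artificial intelligence processing device.
The bitstream generation module 24 is connected to the graph analysis module 22 and is configured to generate a hardware bitstream based on the deep learning data flow graph and input the hardware bitstream into the artificial intelligence processing device.
Specifically, the deep learning data flow graph is processed to match the hardware resources of the artificial intelligence processing device, yielding a hardware bitstream that can run on those hardware resources, which is input into the hardware processing module of the artificial intelligence processing device.
Preferably, the hardware bitstream is input into the hardware processing module of the artificial intelligence processing device in a pipelined manner and can be executed in sequence by the hardware processing module. For example, the hardware processing module performs the convolution computations of a CNN, and the hardware bitstream flows into it in a pipelined manner, so that every convolutional layer and fully connected layer of the CNN is kept in a working state.
It should be understood that the division of the above system into modules is merely a division of logical functions; in an actual implementation, the modules may be fully or partially integrated into one physical entity, or may be physically separated. These modules may all be implemented as software invoked by a processing element, or all in hardware; alternatively, some modules may be implemented as software invoked by a processing element and others in hardware. For example, the x module may be a separately provided processing element or may be integrated in a chip of the above device; alternatively, it may be stored in the memory of the above device in the form of program code, with a processing element of the above device invoking and executing the functions of the x module. The other modules are implemented similarly. In addition, all or some of these modules may be integrated together or implemented independently. The processing element described here may be an integrated circuit with signal processing capability. During implementation, each step of the above method or each of the above modules may be completed by integrated hardware logic circuits in the processor element or by instructions in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above method, such as one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs). As another example, when one of the above modules is implemented as program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can invoke program code. As a further example, these modules may be integrated together and implemented as a system-on-a-chip (SOC).
The storage medium of the present invention stores a computer program which, when executed by a processor, implements the above compilation method for the artificial intelligence processing device. Preferably, the storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical discs.
As shown in FIG. 3, in one embodiment, the terminal of the present invention includes a processor 31 and a memory 32.
The memory 32 is configured to store a computer program.
Preferably, the memory 32 includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical discs.
The processor 31 is connected to the memory 32 and is configured to execute the computer program stored in the memory 32, so that the terminal performs the above compilation method for the artificial intelligence processing device.
Preferably, the processor 31 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In summary, by compiling a deep learning algorithm, the compilation method and system for an artificial intelligence processing device, the storage medium, and the terminal of the present invention allow it to be implemented quickly in hardware, with high compilation efficiency and strong practicability. Therefore, the present invention effectively overcomes various shortcomings of the prior art and has high industrial value.
The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone familiar with this technology may modify or change the above embodiments without departing from the spirit and scope of the present invention. Therefore, all equivalent modifications or changes made by a person of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the present invention shall still be covered by the claims of the present invention.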

Claims (10)

  1. 一种人工智能处理装置的编译方法,其特征在于,包括以下步骤:
    基于人工智能处理装置的识别准确率对深度学习网络模型数据进行精度压缩,以得到深度学习数据图;
    对所述深度学习数据图进行图分析,以得到符合协议定义的深度学习数据流图;
    基于所述深度学习数据流图生成可执行软件代码,并将所述可执行软件代码输入所述人工智能处理装置;
    基于所述深度学习数据流图生成硬件比特流,并将所述硬件比特流输入所述人工智能处理装置。
  2. 根据权利要求1所述的人工智能处理装置的编译方法,其特征在于,基于人工智能处理装置的识别准确率对深度学习网络模型数据进行精度压缩包括以下步骤:
    对所述深度学习网络模型数据进行固化;
    对固化后的所述深度学习网络模型数据进行量化;
    根据固化后的所述深度学习网络模型数据和量化后的所述深度学习网络模型数据生成深度学习数据图。
  3. 根据权利要求1所述的人工智能处理装置的编译方法,其特征在于,所述深度学习网络模型采用Tensorflow训练模型。
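  4. The compilation method for an artificial intelligence processing device according to claim 1, characterized in that the artificial intelligence processing device comprises a CPU and an FPGA, the executable software code is input into the CPU, and the hardware bitstream is input into the FPGA.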
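  5. A compilation system for an artificial intelligence processing device, characterized by comprising a precision compression module, a graph analysis module, a code generation module, and a bitstream generation module;
    the precision compression module is configured to perform precision compression on deep learning network model data based on the recognition accuracy of the artificial intelligence processing device, to obtain a deep learning data graph;
    the graph analysis module is configured to perform graph analysis on the deep learning data graph to obtain a deep learning dataflow graph that conforms to a protocol definition;
    the code generation module is configured to generate executable software code based on the deep learning dataflow graph and to input the executable software code into the artificial intelligence processing device;
    the bitstream generation module is configured to generate a hardware bitstream based on the deep learning dataflow graph and to input the hardware bitstream into the artificial intelligence processing device.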
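  6. The compilation system for an artificial intelligence processing device according to claim 5, characterized in that, when performing precision compression on the deep learning network model data based on the recognition accuracy of the artificial intelligence processing device, the precision compression module performs the following steps:
    freezing the deep learning network model data;
    quantizing the frozen deep learning network model data;
    generating the deep learning data graph from the frozen deep learning network model data and the quantized deep learning network model data.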
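  7. The compilation system for an artificial intelligence processing device according to claim 5, characterized in that the deep learning network model is a TensorFlow-trained model.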
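  8. The compilation system for an artificial intelligence processing device according to claim 5, characterized in that the artificial intelligence processing device comprises a CPU and an FPGA, the executable software code is input into the CPU, and the hardware bitstream is input into the FPGA.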
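  9. A storage medium storing a computer program, characterized in that, when the program is executed by a processor, it implements the compilation method for an artificial intelligence processing device according to any one of claims 1 to 4.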
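  10. A terminal, characterized by comprising: a processor and a memory;
    the memory is configured to store a computer program;
    the processor is configured to execute the computer program stored in the memory, so that the terminal performs the compilation method for an artificial intelligence processing device according to any one of claims 1 to 4.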
PCT/CN2018/072667 2018-01-15 2018-01-15 Compilation method and system for artificial intelligence processing device, storage medium, and terminal WO2019136754A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
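PCT/CN2018/072667 WO2019136754A1 (zh) 2018-01-15 2018-01-15 Compilation method and system for artificial intelligence processing device, storage medium, and terminal
CN201880002764.0A CN109496294A (zh) 2018-01-15 2018-01-15 Compilation method and system for artificial intelligence processing device, storage medium, and terminal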

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/072667 WO2019136754A1 (zh) 2018-01-15 2018-01-15 Compilation method and system for artificial intelligence processing device, storage medium, and terminal

Publications (1)

Publication Number Publication Date
WO2019136754A1 true WO2019136754A1 (zh) 2019-07-18

Family

ID=65713888

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/072667 WO2019136754A1 (zh) 2018-01-15 2018-01-15 Compilation method and system for artificial intelligence processing device, storage medium, and terminal

Country Status (2)

Country Link
CN (1) CN109496294A (zh)
WO (1) WO2019136754A1 (zh)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10942710B1 (en) 2019-09-24 2021-03-09 Rockwell Automation Technologies, Inc. Industrial automation domain-specific language programming paradigm
CN112559315A (zh) * 2019-09-26 2021-03-26 Rockwell Automation Technologies, Inc. Testing framework for automation objects
EP3798948A1 (en) * 2019-09-26 2021-03-31 Rockwell Automation Technologies, Inc. Ai design analysis and recommendations
US11048483B2 (en) 2019-09-24 2021-06-29 Rockwell Automation Technologies, Inc. Industrial programming development with an extensible integrated development environment (IDE) platform
US11163536B2 (en) 2019-09-26 2021-11-02 Rockwell Automation Technologies, Inc. Maintenance and commissioning
US11308447B2 (en) 2020-04-02 2022-04-19 Rockwell Automation Technologies, Inc. Cloud-based collaborative industrial automation design environment
US11392112B2 (en) 2019-09-26 2022-07-19 Rockwell Automation Technologies, Inc. Virtual design environment
US11733687B2 (en) 2019-09-26 2023-08-22 Rockwell Automation Technologies, Inc. Collaboration tools

Families Citing this family (6)

Publication number Priority date Publication date Assignee Title
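WO2021000638A1 (zh) * 2019-07-03 2021-01-07 Shanghai Cambricon Information Technology Co., Ltd. Compilation method and apparatus for deep learning algorithms, and related products
CN110598855B (zh) * 2019-09-23 2023-06-09 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Deep learning model generation method, apparatus, device, and storage medium
CN110908667B (zh) * 2019-11-18 2021-11-16 Beijing Megvii Technology Co., Ltd. Method, apparatus, and electronic device for joint compilation of neural networks
CN111752709B (zh) * 2020-06-22 2024-04-30 Shenzhen Corerain Technologies Co., Ltd. AI computing configuration method, apparatus, device, and storage medium
CN115495093B (zh) * 2022-11-07 2023-07-21 Shenzhen Corerain Technologies Co., Ltd. Hybrid compilation method and apparatus, electronic device, and storage medium
CN116011544B (zh) * 2022-12-31 2024-03-05 Anhui Xianshu Technology Co., Ltd. Deep learning system and method based on discrete vectors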

Citations (4)

Publication number Priority date Publication date Assignee Title
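CN104679863A (zh) * 2015-02-28 2015-06-03 Wuhan Fiberhome Zhongzhi Digital Technology Co., Ltd. Deep-learning-based method and system for searching images by image
CN106227851A (zh) * 2016-07-29 2016-12-14 Tang Ping End-to-end image retrieval method using hierarchical deep search based on deep convolutional neural networks
CN107018422A (zh) * 2017-04-27 2017-08-04 Sichuan University Still image compression method based on deep convolutional neural networks
CN107239829A (zh) * 2016-08-12 2017-10-10 Beijing DeePhi Technology Co., Ltd. Method for optimizing an artificial neural network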

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US9858263B2 (en) * 2016-05-05 2018-01-02 Conduent Business Services, Llc Semantic parsing using deep neural networks for predicting canonical forms
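CN105956660A (zh) * 2016-05-16 2016-09-21 Inspur Group Co., Ltd. Neural network chip implementation method for real-time image recognition
CN106713929B (zh) * 2017-02-16 2019-06-28 Graduate School at Shenzhen, Tsinghua University Video inter-frame prediction enhancement method based on deep neural networks
CN107239315B (zh) * 2017-04-11 2019-11-15 Xilinx, Inc. Programming model for heterogeneous neural network computing platforms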

Cited By (20)

Publication number Priority date Publication date Assignee Title
US11269598B2 (en) 2019-09-24 2022-03-08 Rockwell Automation Technologies, Inc. Industrial automation domain-specific language programming paradigm
US12001818B2 (en) 2019-09-24 2024-06-04 Rockwell Automation Technologies, Inc. Extensible IDE platform with open APIs
US11681502B2 (en) 2019-09-24 2023-06-20 Rockwell Automation Technologies, Inc. Industrial automation domain-specific language programming paradigm
US11669309B2 (en) 2019-09-24 2023-06-06 Rockwell Automation Technologies, Inc. Extensible integrated development environment (IDE) platform with open application programming interfaces (APIs)
US11048483B2 (en) 2019-09-24 2021-06-29 Rockwell Automation Technologies, Inc. Industrial programming development with an extensible integrated development environment (IDE) platform
US10942710B1 (en) 2019-09-24 2021-03-09 Rockwell Automation Technologies, Inc. Industrial automation domain-specific language programming paradigm
US11481313B2 (en) 2019-09-26 2022-10-25 Rockwell Automation Technologies, Inc. Testing framework for automation objects
US11042362B2 (en) 2019-09-26 2021-06-22 Rockwell Automation Technologies, Inc. Industrial programming development with a trained analytic model
CN112559315A (zh) * 2019-09-26 2021-03-26 Rockwell Automation Technologies, Inc. Testing framework for automation objects
US11392112B2 (en) 2019-09-26 2022-07-19 Rockwell Automation Technologies, Inc. Virtual design environment
US11080176B2 (en) 2019-09-26 2021-08-03 Rockwell Automation Technologies, Inc. Testing framework for automation objects
US11640566B2 (en) 2019-09-26 2023-05-02 Rockwell Automation Technologies, Inc. Industrial programming development with a converted industrial control program
CN112559315B (zh) * 2019-09-26 2024-01-09 Rockwell Automation Technologies, Inc. Testing framework for automation objects
US11163536B2 (en) 2019-09-26 2021-11-02 Rockwell Automation Technologies, Inc. Maintenance and commissioning
EP3798948A1 (en) * 2019-09-26 2021-03-31 Rockwell Automation Technologies, Inc. Ai design analysis and recommendations
US11733687B2 (en) 2019-09-26 2023-08-22 Rockwell Automation Technologies, Inc. Collaboration tools
US11822906B2 (en) 2019-09-26 2023-11-21 Rockwell Automation Technologies, Inc. Industrial programming development with a converted industrial control program
US11829121B2 (en) 2019-09-26 2023-11-28 Rockwell Automation Technologies, Inc. Virtual design environment
US11663553B2 (en) 2020-04-02 2023-05-30 Rockwell Automation Technologies, Inc. Cloud-based collaborative industrial automation design environment
US11308447B2 (en) 2020-04-02 2022-04-19 Rockwell Automation Technologies, Inc. Cloud-based collaborative industrial automation design environment

Also Published As

Publication number Publication date
CN109496294A (zh) 2019-03-19


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 18899323; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 08/09/2020)
122 Ep: pct application non-entry in european phase
Ref document number: 18899323; Country of ref document: EP; Kind code of ref document: A1