WO2019136756A1 - Method, system, storage medium and terminal for establishing an artificial intelligence processing device design model - Google Patents

Method, system, storage medium and terminal for establishing an artificial intelligence processing device design model Download PDF

Info

Publication number
WO2019136756A1
WO2019136756A1 (PCT/CN2018/072669)
Authority
WO
WIPO (PCT)
Prior art keywords
processing device
artificial intelligence
data flow
flow graph
intelligence processing
Prior art date
Application number
PCT/CN2018/072669
Other languages
English (en)
French (fr)
Inventor
肖梦秋
Original Assignee
深圳鲲云信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳鲲云信息科技有限公司 filed Critical 深圳鲲云信息科技有限公司
Priority to CN201880002758.5A priority Critical patent/CN109643336A/zh
Priority to PCT/CN2018/072669 priority patent/WO2019136756A1/zh
Publication of WO2019136756A1 publication Critical patent/WO2019136756A1/zh

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/20 Design optimisation, verification or simulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Definitions

  • the present invention relates to the technical field of software processing, and in particular, to a method, system, storage medium and terminal for establishing an artificial intelligence processing device design model.
  • Deep learning stems from the study of artificial neural networks.
  • a multilayer perceptron with multiple hidden layers is a deep learning structure. Deep learning combines low-level features to form more abstract high-level representations of attribute categories or features, in order to discover distributed feature representations of data.
  • Deep learning is a method in machine learning based on representation learning of data. An observation (e.g., an image) can be represented in many ways, such as a vector of per-pixel intensity values, or more abstractly as a series of edges, regions of particular shapes, and so on. Certain specific representations make it easier to learn tasks from examples (e.g., face recognition or facial expression recognition).
  • the advantage of deep learning is that it replaces manually engineered features with efficient algorithms for unsupervised or semi-supervised feature learning and hierarchical feature extraction.
  • like other machine learning methods, deep learning comes in supervised and unsupervised variants, and the models built under different learning frameworks differ considerably: convolutional neural networks (CNN) are a deep supervised machine learning model, whereas deep belief nets (DBN) are an unsupervised machine learning model.
  • CNN has become one of the research hotspots in many scientific fields, especially pattern classification; because the network avoids complex image pre-processing and can take raw images directly as input, it has been widely applied.
  • the basic structure of a CNN includes two kinds of layers. The first is the feature extraction layer: the input of each neuron is connected to the local receptive field of the previous layer, and the local features are extracted; once a local feature is extracted, its positional relationship to other features is also fixed. The second is the feature mapping layer: each computing layer of the network consists of multiple feature maps, each feature map is a plane, and all neurons on the plane share equal weights.
  • the feature mapping structure uses a sigmoid function with a small influence function kernel as the activation function of the convolutional network, giving the feature maps shift invariance. In addition, because the neurons on one mapping plane share weights, the number of free network parameters is reduced.
  • each convolutional layer in a convolutional neural network is followed by a computing layer for local averaging and secondary feature extraction; this distinctive two-stage feature extraction structure reduces the feature resolution.
  • CNN is mainly used to recognize two-dimensional patterns that are invariant to shift, scaling and other forms of distortion. Because the feature detection layers of a CNN learn from training data, explicit feature extraction is avoided when the CNN is used, and the network learns implicitly from the training data; and because neurons on the same feature mapping plane share the same weights, the network can learn in parallel, which is a major advantage of convolutional networks over fully interconnected neural networks.
  • with its special structure of locally shared weights, the convolutional neural network has unique advantages in speech recognition and image processing. Its layout is closer to a real biological neural network, and weight sharing reduces the complexity of the network; in particular, the fact that images of multidimensional input vectors can be fed directly into the network avoids the complexity of data reconstruction during feature extraction and classification.
  • an object of the present invention is to provide a method, system, storage medium and terminal for establishing an artificial intelligence processing device design model, which perform graph analysis on the data graph of a deep learning algorithm so that it can run effectively on software and hardware.
  • the present invention provides a method for establishing an artificial intelligence processing device design model, comprising the following steps: generating, based on a deep learning data graph of a deep learning network model, a hardware adaptation map compatible with an artificial intelligence processing device; generating a data flow graph of the deep learning network model based on the hardware adaptation map; optimizing the data flow graph; and outputting the optimized data flow graph according to a protocol definition.
  • the deep learning network model adopts a Tensorflow training model.
  • the artificial intelligence processing device includes an FPGA, and the hardware adaptation map is compatible with the FPGA.
  • when the deep learning network model adopts a convolutional neural network model, optimizing the data flow graph includes the following steps: detecting the type of each data block in the data flow graph; determining the coefficients of each fixed layer in the convolutional neural network model; and obtaining the coefficients of each data block in the data flow graph.
  • the present invention provides an artificial intelligence processing device design model establishing system, including a first generating module, a second generating module, an optimizing module, and an output module;
  • the first generation module is configured to generate a hardware adaptation map compatible with the artificial intelligence processing device based on the deep learning data map of the deep learning network model;
  • the second generating module is configured to generate a data flow graph of the deep learning network model based on the hardware adaptation map;
  • the optimization module is configured to optimize the data flow graph
  • the output module is configured to output the optimized data flow graph according to a protocol definition.
  • the deep learning network model adopts a Tensorflow training model.
  • the artificial intelligence processing device includes an FPGA, and the hardware adaptation map is compatible with the FPGA.
  • when the deep learning network model adopts a convolutional neural network model, the optimization module optimizes the data flow graph by performing the following steps: detecting the type of each data block in the data flow graph; determining the coefficients of each fixed layer in the convolutional neural network model; and obtaining the coefficients of each data block in the data flow graph.
  • the present invention provides a storage medium on which a computer program is stored; when the program is executed by a processor, the above method for establishing an artificial intelligence processing device design model is implemented.
  • the present invention provides a terminal, including: a processor and a memory;
  • the memory is for storing a computer program
  • the processor is configured to execute the computer program stored in the memory to enable the terminal to execute the artificial intelligence processing device design model establishing method.
  • the artificial intelligence processing device design model establishing method, system, storage medium, and terminal of the present invention have the following beneficial effects:
  • FIG. 1 is a flow chart of a method for establishing an artificial intelligence processing device design model according to an embodiment of the present invention;
  • FIG. 2 is a schematic structural diagram of an artificial intelligence processing device design model establishing system according to an embodiment of the present invention;
  • FIG. 3 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
  • the artificial intelligence processing device design model establishing method, system, storage medium and terminal of the invention perform graph analysis on the data graph of the deep learning algorithm so that it can run effectively on software and hardware, thereby ensuring the stable and effective operation of the artificial intelligence processing device.
  • the artificial intelligence processing device design model establishing method of the present invention includes the following steps:
  • Step S1 Generate a hardware adaptation map compatible with the artificial intelligence processing device based on the deep learning data map of the deep learning network model.
  • the deep learning network model adopts a Tensorflow training model.
  • Tensorflow is Google's second-generation artificial intelligence learning system, developed on the basis of DistBelief; its name derives from its operating principle.
  • Tensor means an N-dimensional array.
  • Flow means computation based on a data flow graph: in Tensorflow, tensors flow from one end of the flow graph to the other.
  • Tensorflow is a system that transmits complex data structures into an artificial intelligence neural network for analysis and processing.
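The flow-graph idea described above can be sketched in a few lines of plain Python. This is an illustrative toy only (the class and names are ours, not TensorFlow's API and not part of the disclosed invention): each node holds an operation and its upstream nodes, and evaluating the output node makes values flow from one end of the graph to the other.

```python
# Minimal dataflow-graph sketch: values "flow" from inputs to the output
# node when the output is evaluated, as the Tensor + Flow description says.
class Node:
    def __init__(self, op, inputs=()):
        self.op = op          # callable producing this node's value
        self.inputs = inputs  # upstream nodes whose outputs feed this op

    def eval(self):
        # Recursively evaluate upstream nodes, then apply this node's op.
        return self.op(*(n.eval() for n in self.inputs))

# Build a tiny flow graph computing (a + b) * c.
a = Node(lambda: 2.0)
b = Node(lambda: 3.0)
c = Node(lambda: 4.0)
s = Node(lambda x, y: x + y, (a, b))
out = Node(lambda x, y: x * y, (s, c))

print(out.eval())  # 20.0
```

A real framework additionally caches intermediate results and schedules independent nodes in parallel; the recursion above only shows the direction of data flow.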
  • the deep learning data map of the deep learning network model is converted into a hardware adaptation map compatible with the artificial intelligence processing device according to the hardware parameters of the artificial intelligence processing device.
  • the artificial intelligence processing device includes an FPGA, and the hardware adaptation map is compatible with the FPGA.
  • Step S2 Generate a data flow graph of the deep learning network model based on the hardware adaptation map.
  • the hardware adaptation map is processed to obtain a data flow graph of the deep learning network model.
  • a data flow graph (DFG) expresses graphically, from the perspective of data transfer and processing, the logical functions of a system, the logical flow of data within the system, and the logical transformation process. It is the main expression tool of structured system analysis and a graphical method for representing a software model.
  • by hierarchy, data flow graphs are divided into a top-level data flow graph, middle-level data flow graphs, and bottom-level data flow graphs; except for the top-level graph, the data flow graphs are numbered starting from zero.
  • the top-level data flow graph contains only one process, representing the entire system; its output and input data flows are the system's output and input data, indicating the scope of the system and its data exchange relationship with the external environment.
  • a middle-level data flow graph is a refinement of a certain process in its parent-level data flow graph, and one of its processes may in turn be refined to form a sub-graph; the number of intermediate levels generally depends on the complexity of the system.
  • a bottom-level data flow graph is one whose processes can no longer be decomposed; such processes are called "atomic processes".
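The top/middle/bottom hierarchy can be made concrete with a small sketch. All names here are our own illustration, not the patent's data structures: a process either carries a refinement sub-graph (middle levels) or is atomic (bottom level), and the top level is a single process standing for the whole system.

```python
# Hierarchical DFG sketch: a process with no refinement is "atomic"
# (bottom level); the depth of the tree is the number of levels.
class Process:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []  # refinement sub-graph (may be empty)

    @property
    def atomic(self):
        return not self.children

    def depth(self):
        return 1 + max((c.depth() for c in self.children), default=0)

# Top-level graph: exactly one process representing the entire system,
# refined here into the four steps of the method (S1-S4).
system = Process("establish AI-processor design model", [
    Process("0 generate hardware adaptation map"),
    Process("1 generate data flow graph"),
    Process("2 optimize data flow graph", [
        Process("2.0 detect data-block types"),
        Process("2.1 determine fixed-layer coefficients"),
        Process("2.2 obtain data-block coefficients"),
    ]),
    Process("3 output per protocol definition"),
])

print(system.depth())              # 3: top, middle, bottom levels
print(system.children[2].atomic)   # False: this process was refined
```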
  • Step S3 optimizing the data flow graph.
  • the data flow graph is optimized so that data input to the hardware processing module can undergo the corresponding hardware operations in sequence without waiting, thereby reusing the hardware resources of the hardware processing module without additional waste of resources.
  • when the deep learning network model adopts a convolutional neural network model, optimizing the data flow graph includes the following steps: 31) detecting the type of each data block in the data flow graph; 32) determining the coefficients of each fixed layer in the convolutional neural network model; 33) obtaining the coefficients of each data block in the data flow graph.
  • thus, the coefficients of each data block are fed into the hardware processing module in a pipelined manner, so that the corresponding hardware operations can be executed in sequence, achieving pipelined operation.
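The pipelined feed can be sketched as follows. This is a hedged simulation, not the patent's implementation: the hardware processing module is a stand-in function, and the point is only that each block's coefficients are streamed in order so the next operation starts without idle waiting.

```python
# Pipelined-feed sketch: stream each data block's coefficients to a
# (simulated) hardware processing module, executing operations in sequence.
def coefficient_stream(blocks):
    """Yield (block_id, coefficients) in pipeline order."""
    for block_id, coeffs in enumerate(blocks):
        yield block_id, coeffs

def hardware_op(coeffs):
    # Stand-in for the hardware operation applied to one block's coefficients.
    return sum(coeffs)

def run_pipeline(blocks):
    results = []
    for block_id, coeffs in coefficient_stream(blocks):
        # Each block is consumed as soon as it arrives; no block waits for
        # the whole graph to be loaded first.
        results.append((block_id, hardware_op(coeffs)))
    return results

print(run_pipeline([[1, 2], [3, 4], [5]]))  # [(0, 3), (1, 7), (2, 5)]
```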
  • the convolutional neural network according to the present invention may use standard convolutions or depthwise separable convolutions, chosen according to the actual usage scenario.
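The patent does not spell out why one would pick one convolution type over the other, but the standard parameter-count comparison (our addition, with bias terms omitted) shows what is at stake for a resource-constrained device such as an FPGA:

```python
# Parameter counts for one layer with c_in input channels, c_out output
# channels and a k*k kernel (bias omitted).
def standard_conv_params(c_in, c_out, k):
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    depthwise = c_in * k * k   # one k*k filter per input channel
    pointwise = c_in * c_out   # 1x1 convolution mixing channels
    return depthwise + pointwise

std = standard_conv_params(32, 64, 3)        # 18432
sep = depthwise_separable_params(32, 64, 3)  # 2336
print(std, sep, round(std / sep, 1))         # roughly 7.9x fewer parameters
```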
  • Step S4 Output the optimized data flow graph according to the protocol definition.
  • the optimized data flow graph is output to a corresponding software processing module and a hardware processing module to complete the function of the artificial intelligence processing device.
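The protocol itself is not specified in the disclosure, so the following is purely an assumed encoding for illustration: a JSON message in which each node records which module ("software" or "hardware") should execute it, so the optimized graph can be routed accordingly.

```python
import json

# Hypothetical protocol: version string plus a list of nodes, each tagged
# with the module that should execute it. The field names are our own.
def export_graph(nodes, protocol_version="1.0"):
    payload = {
        "protocol": protocol_version,
        "nodes": [
            {"id": n["id"], "op": n["op"], "target": n["target"]}
            for n in nodes
        ],
    }
    return json.dumps(payload, sort_keys=True)

optimized = [
    {"id": 0, "op": "conv2d", "target": "hardware"},
    {"id": 1, "op": "reshape", "target": "software"},
]
message = export_graph(optimized)
print(message)
```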
  • the artificial intelligence processing device design model establishing system of the present invention includes a first generating module 21, a second generating module 22, an optimizing module 23, and an output module 24.
  • the first generation module 21 is configured to generate a hardware adaptation map compatible with the artificial intelligence processing device based on the deep learning data map of the deep learning network model.
  • the deep learning network model adopts a Tensorflow training model.
  • Tensorflow is Google's second-generation artificial intelligence learning system, developed on the basis of DistBelief; its name derives from its operating principle.
  • Tensor means an N-dimensional array.
  • Flow means computation based on a data flow graph: in Tensorflow, tensors flow from one end of the flow graph to the other.
  • Tensorflow is a system that transmits complex data structures into an artificial intelligence neural network for analysis and processing.
  • the deep learning data map of the deep learning network model is converted into a hardware adaptation map compatible with the artificial intelligence processing device according to the hardware parameters of the artificial intelligence processing device.
  • the artificial intelligence processing device includes an FPGA, and the hardware adaptation map is compatible with the FPGA.
  • the second generation module 22 is connected to the first generation module 21, and is configured to generate a data flow graph of the deep learning network model based on the hardware adaptation map.
  • the hardware adaptation map is processed to obtain a data flow graph of the deep learning network model.
  • a data flow graph (DFG) expresses graphically, from the perspective of data transfer and processing, the logical functions of a system, the logical flow of data within the system, and the logical transformation process. It is the main expression tool of structured system analysis and a graphical method for representing a software model.
  • by hierarchy, data flow graphs are divided into a top-level data flow graph, middle-level data flow graphs, and bottom-level data flow graphs; except for the top-level graph, the data flow graphs are numbered starting from zero.
  • the top-level data flow graph contains only one process, representing the entire system; its output and input data flows are the system's output and input data, indicating the scope of the system and its data exchange relationship with the external environment.
  • a middle-level data flow graph is a refinement of a certain process in its parent-level data flow graph, and one of its processes may in turn be refined to form a sub-graph; the number of intermediate levels generally depends on the complexity of the system.
  • a bottom-level data flow graph is one whose processes can no longer be decomposed; such processes are called "atomic processes".
  • the optimization module 23 is connected to the second generation module 22 for optimizing the data flow graph.
  • the data flow graph is optimized so that data input to the hardware processing module can undergo the corresponding hardware operations in sequence without waiting, thereby reusing the hardware resources of the hardware processing module without additional waste of resources.
  • when the deep learning network model adopts a convolutional neural network model, the optimization module 23 optimizes the data flow graph by performing the following steps: 31) detecting the type of each data block in the data flow graph; 32) determining the coefficients of each fixed layer in the convolutional neural network model; 33) obtaining the coefficients of each data block in the data flow graph.
  • thus, the coefficients of each data block are fed into the hardware processing module in a pipelined manner, so that the corresponding hardware operations can be executed in sequence, achieving pipelined operation.
  • the convolutional neural network according to the present invention may use standard convolutions or depthwise separable convolutions, chosen according to the actual usage scenario.
  • the output module 24 is coupled to the optimization module 23 for outputting the optimized data flow graph according to a protocol definition.
  • the optimized data flow graph is output to a corresponding software processing module and a hardware processing module to complete the function of the artificial intelligence processing device.
  • each module of the above system is only a division of logical functions, and the actual implementation may be integrated into one physical entity in whole or in part, or may be physically separated.
  • these modules may all be implemented in the form of software invoked by a processing element, or all in hardware; alternatively, some modules may be implemented as software invoked by a processing element and others in hardware.
  • the x module may be a separately established processing element, or it may be integrated into a chip of the above device; it may also be stored in the memory of the above device in the form of program code, with a processing element of the above device invoking and executing the functions of the x module.
  • the implementation of other modules is similar.
  • all or part of these modules can be integrated or implemented independently.
  • the processing elements described herein can be an integrated circuit with signal processing capabilities. In the implementation process, each step of the above method or each of the above modules may be completed by an integrated logic circuit of hardware in the processor element or an instruction in a form of software.
  • the above modules may be one or more integrated circuits configured to implement the above method, for example one or more application-specific integrated circuits (ASIC), one or more digital signal processors (DSP), or one or more field-programmable gate arrays (FPGA).
  • when a module is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can invoke program code.
  • alternatively, these modules can be integrated together and implemented in the form of a system-on-a-chip (SOC).
  • the storage medium of the present invention stores a computer program, and when the program is executed by the processor, the artificial intelligence processing device design model establishing method is implemented.
  • the storage medium includes various media that can store program codes, such as a ROM, a RAM, a magnetic disk, or an optical disk.
  • the terminal of the present invention includes a processor 31 and a memory 32.
  • the memory 32 is used to store a computer program.
  • the memory 32 includes various media that can store program codes, such as a ROM, a RAM, a magnetic disk, or an optical disk.
  • the processor 31 is connected to the memory 32 for executing a computer program stored by the memory 32 to enable the terminal to execute the artificial intelligence processing device design model establishing method.
  • the processor 31 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or another programmable logic device.
  • the artificial intelligence processing device design model establishing method, system, storage medium, and terminal of the present invention perform graph analysis on the data graph of the deep learning algorithm so that it can run effectively on software and hardware; the obtained deep learning data flow graph conforms to the protocol definition and is highly practical. Therefore, the present invention effectively overcomes various shortcomings of the prior art and has high industrial utilization value.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

A method, system, storage medium and terminal for establishing an artificial intelligence processing device design model. The method comprises the following steps: generating, based on a deep learning data graph of a deep learning network model, a hardware adaptation map compatible with an artificial intelligence processing device (S1); generating a data flow graph of the deep learning network model based on the hardware adaptation map (S2); optimizing the data flow graph (S3); and outputting the optimized data flow graph according to a protocol definition (S4). The method, system, storage medium and terminal perform graph analysis on the data graph of a deep learning algorithm so that it can run effectively on software and hardware.

Description

Method, system, storage medium and terminal for establishing an artificial intelligence processing device design model TECHNICAL FIELD
The present invention relates to the technical field of software processing, and in particular to a method, system, storage medium and terminal for establishing an artificial intelligence processing device design model.
BACKGROUND
The concept of deep learning stems from the study of artificial neural networks. A multilayer perceptron with multiple hidden layers is a deep learning structure. Deep learning combines low-level features to form more abstract high-level representations of attribute categories or features, in order to discover distributed feature representations of data.
Deep learning is a method in machine learning based on representation learning of data. An observation (e.g., an image) can be represented in many ways, such as a vector of per-pixel intensity values, or more abstractly as a series of edges, regions of particular shapes, and so on. Certain specific representations make it easier to learn tasks from examples (e.g., face recognition or facial expression recognition). The advantage of deep learning is that it replaces manually engineered features with efficient algorithms for unsupervised or semi-supervised feature learning and hierarchical feature extraction.
Like other machine learning methods, deep learning methods can be divided into supervised and unsupervised learning, and the models built under different learning frameworks differ considerably. For example, convolutional neural networks (Convolutional neural networks, CNN) are a deep supervised machine learning model, whereas deep belief nets (Deep Belief Nets, DBN) are an unsupervised machine learning model.
At present, CNN has become one of the research hotspots in many scientific fields, especially pattern classification; because the network avoids complex image pre-processing and can take raw images directly as input, it has been widely applied. In general, the basic structure of a CNN comprises two kinds of layers. The first is the feature extraction layer: the input of each neuron is connected to the local receptive field of the previous layer, and the local features are extracted; once a local feature is extracted, its positional relationship to other features is also fixed. The second is the feature mapping layer: each computing layer of the network consists of multiple feature maps, each feature map is a plane, and all neurons on the plane share equal weights. The feature mapping structure uses a sigmoid function with a small influence function kernel as the activation function of the convolutional network, giving the feature maps shift invariance. Moreover, because the neurons on one mapping plane share weights, the number of free network parameters is reduced. Each convolutional layer in a convolutional neural network is followed by a computing layer for local averaging and secondary feature extraction; this distinctive two-stage feature extraction structure reduces the feature resolution.
CNN is mainly used to recognize two-dimensional patterns that are invariant to shift, scaling and other forms of distortion. Because the feature detection layers of a CNN learn from training data, explicit feature extraction is avoided when the CNN is used, and the network learns implicitly from the training data; and because neurons on the same feature mapping plane share the same weights, the network can learn in parallel, which is a major advantage of convolutional networks over fully interconnected neural networks. With its special structure of locally shared weights, the convolutional neural network has unique advantages in speech recognition and image processing; its layout is closer to a real biological neural network, weight sharing reduces the complexity of the network, and in particular the fact that images of multidimensional input vectors can be fed directly into the network avoids the complexity of data reconstruction during feature extraction and classification.
Therefore, how to perform data flow analysis on deep learning algorithms so that they can run effectively on software and hardware has become one of the current hot research topics.
SUMMARY OF THE INVENTION
In view of the above shortcomings of the prior art, an object of the present invention is to provide a method, system, storage medium and terminal for establishing an artificial intelligence processing device design model, which perform graph analysis on the data graph of a deep learning algorithm so that it can run effectively on software and hardware.
To achieve the above and other related objects, the present invention provides a method for establishing an artificial intelligence processing device design model, comprising the following steps: generating, based on a deep learning data graph of a deep learning network model, a hardware adaptation map compatible with an artificial intelligence processing device; generating a data flow graph of the deep learning network model based on the hardware adaptation map; optimizing the data flow graph; and outputting the optimized data flow graph according to a protocol definition.
In an embodiment of the present invention, the deep learning network model adopts a Tensorflow training model.
In an embodiment of the present invention, the artificial intelligence processing device includes an FPGA, and the hardware adaptation map is compatible with the FPGA.
In an embodiment of the present invention, when the deep learning network model adopts a convolutional neural network model, optimizing the data flow graph comprises the following steps:
detecting the type of each data block in the data flow graph;
determining the coefficients of each fixed layer in the convolutional neural network model;
obtaining the coefficients of each data block in the data flow graph.
Correspondingly, the present invention provides a system for establishing an artificial intelligence processing device design model, comprising a first generation module, a second generation module, an optimization module and an output module;
the first generation module is configured to generate, based on a deep learning data graph of a deep learning network model, a hardware adaptation map compatible with an artificial intelligence processing device;
the second generation module is configured to generate a data flow graph of the deep learning network model based on the hardware adaptation map;
the optimization module is configured to optimize the data flow graph;
the output module is configured to output the optimized data flow graph according to a protocol definition.
In an embodiment of the present invention, the deep learning network model adopts a Tensorflow training model.
In an embodiment of the present invention, the artificial intelligence processing device includes an FPGA, and the hardware adaptation map is compatible with the FPGA.
In an embodiment of the present invention, when the deep learning network model adopts a convolutional neural network model, the optimization module optimizes the data flow graph by performing the following steps:
detecting the type of each data block in the data flow graph;
determining the coefficients of each fixed layer in the convolutional neural network model;
obtaining the coefficients of each data block in the data flow graph.
The present invention provides a storage medium on which a computer program is stored; when the program is executed by a processor, the above method for establishing an artificial intelligence processing device design model is implemented.
Finally, the present invention provides a terminal, comprising: a processor and a memory;
the memory is configured to store a computer program;
the processor is configured to execute the computer program stored in the memory, so that the terminal executes the above method for establishing an artificial intelligence processing device design model.
As described above, the method, system, storage medium and terminal for establishing an artificial intelligence processing device design model of the present invention have the following beneficial effects:
(1) by performing graph analysis on the data graph of a deep learning algorithm, it can run effectively on software and hardware;
(2) the obtained deep learning data flow graph conforms to the protocol definition and is highly practical.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a flow chart of the method for establishing an artificial intelligence processing device design model of the present invention in an embodiment;
FIG. 2 is a schematic structural diagram of the system for establishing an artificial intelligence processing device design model of the present invention in an embodiment;
FIG. 3 is a schematic structural diagram of the terminal of the present invention in an embodiment.
DESCRIPTION OF ELEMENT REFERENCE NUMERALS
21                     first generation module
22                     second generation module
23                     optimization module
24                     output module
31                     processor
32                     memory
DETAILED DESCRIPTION OF THE EMBODIMENTS
The embodiments of the present invention are described below by way of specific examples, and those skilled in the art can easily understand other advantages and effects of the present invention from the contents disclosed in this specification. The present invention may also be implemented or applied through other different specific embodiments, and the details in this specification may be modified or changed in various ways based on different viewpoints and applications without departing from the spirit of the present invention.
It should be noted that the figures provided in this embodiment merely illustrate the basic concept of the present invention in a schematic manner; the figures show only the components related to the present invention rather than being drawn according to the number, shape and size of the components in actual implementation. In actual implementation, the form, quantity and proportion of each component may be changed arbitrarily, and the component layout may also be more complex.
The method, system, storage medium and terminal for establishing an artificial intelligence processing device design model of the present invention perform graph analysis on the data graph of a deep learning algorithm so that it can run effectively on software and hardware, thereby ensuring the stable and effective operation of the artificial intelligence processing device.
As shown in FIG. 1, in an embodiment, the method for establishing an artificial intelligence processing device design model of the present invention comprises the following steps:
Step S1: generating, based on the deep learning data graph of the deep learning network model, a hardware adaptation map compatible with the artificial intelligence processing device.
In an embodiment of the present invention, the deep learning network model adopts a Tensorflow training model. Tensorflow is Google's second-generation artificial intelligence learning system developed on the basis of DistBelief, and its name derives from its own operating principle. Tensor means an N-dimensional array; Flow means computation based on a data flow graph; in Tensorflow, tensors flow from one end of the flow graph to the other during computation. Tensorflow is a system that transmits complex data structures into an artificial intelligence neural network for analysis and processing.
Specifically, according to the hardware parameters of the artificial intelligence processing device, the deep learning data graph of the deep learning network model is converted into a hardware adaptation map compatible with the artificial intelligence processing device.
In an embodiment of the present invention, the artificial intelligence processing device includes an FPGA, and the hardware adaptation map is compatible with the FPGA.
Step S2: generating a data flow graph of the deep learning network model based on the hardware adaptation map.
Specifically, the hardware adaptation map is processed to obtain the data flow graph of the deep learning network model. A data flow graph (Data Flow Graph, DFG) expresses graphically, from the perspective of data transfer and processing, the logical functions of a system, the logical flow of data within the system and the logical transformation process; it is the main expression tool of structured system analysis and a graphical method for representing a software model. By hierarchy, data flow graphs are divided into a top-level data flow graph, middle-level data flow graphs and bottom-level data flow graphs; except for the top-level graph, the data flow graphs are numbered starting from zero. The top-level data flow graph contains only one process, representing the entire system; its output and input data flows are the system's input and output data, indicating the scope of the system and its data exchange relationship with the external environment. A middle-level data flow graph is a refinement of a certain process in its parent-level data flow graph, and one of its processes may in turn be refined to form a sub-graph; the number of intermediate levels generally depends on the complexity of the system. A bottom-level data flow graph is one whose processes can no longer be decomposed; such processes are called "atomic processes".
For those skilled in the art, methods of generating data flow graphs are mature prior art and are therefore not described in detail here.
Step S3: optimizing the data flow graph.
Specifically, the data flow graph is optimized so that data input to the hardware processing module can undergo the corresponding hardware operations in sequence without waiting, thereby reusing the hardware resources of the hardware processing module without additional waste of resources.
In an embodiment of the present invention, when the deep learning network model adopts a convolutional neural network model, optimizing the data flow graph comprises the following steps:
31) detecting the type of each data block in the data flow graph.
32) determining the coefficients of each fixed layer in the convolutional neural network model.
33) obtaining the coefficients of each data block in the data flow graph.
Thus, the coefficients of each data block are fed into the hardware processing module in a pipelined manner, so that the corresponding hardware operations can be executed in sequence, achieving pipelined operation. It should be noted that the convolutional neural network according to the present invention may use standard convolutions or depthwise separable convolutions, chosen according to the actual usage scenario.
Step S4: outputting the optimized data flow graph according to the protocol definition.
Specifically, according to a preset data transmission protocol definition, the optimized data flow graph is output to the corresponding software processing module and hardware processing module to complete the functions of the artificial intelligence processing device.
As shown in FIG. 2, in an embodiment, the system for establishing an artificial intelligence processing device design model of the present invention comprises a first generation module 21, a second generation module 22, an optimization module 23 and an output module 24.
The first generation module 21 is configured to generate, based on the deep learning data graph of the deep learning network model, a hardware adaptation map compatible with the artificial intelligence processing device.
In an embodiment of the present invention, the deep learning network model adopts a Tensorflow training model. Tensorflow is Google's second-generation artificial intelligence learning system developed on the basis of DistBelief, and its name derives from its own operating principle. Tensor means an N-dimensional array; Flow means computation based on a data flow graph; in Tensorflow, tensors flow from one end of the flow graph to the other during computation. Tensorflow is a system that transmits complex data structures into an artificial intelligence neural network for analysis and processing.
Specifically, according to the hardware parameters of the artificial intelligence processing device, the deep learning data graph of the deep learning network model is converted into a hardware adaptation map compatible with the artificial intelligence processing device.
In an embodiment of the present invention, the artificial intelligence processing device includes an FPGA, and the hardware adaptation map is compatible with the FPGA.
The second generation module 22 is connected to the first generation module 21 and configured to generate a data flow graph of the deep learning network model based on the hardware adaptation map.
Specifically, the hardware adaptation map is processed to obtain the data flow graph of the deep learning network model. A data flow graph (Data Flow Graph, DFG) expresses graphically, from the perspective of data transfer and processing, the logical functions of a system, the logical flow of data within the system and the logical transformation process; it is the main expression tool of structured system analysis and a graphical method for representing a software model. By hierarchy, data flow graphs are divided into a top-level data flow graph, middle-level data flow graphs and bottom-level data flow graphs; except for the top-level graph, the data flow graphs are numbered starting from zero. The top-level data flow graph contains only one process, representing the entire system; its output and input data flows are the system's input and output data, indicating the scope of the system and its data exchange relationship with the external environment. A middle-level data flow graph is a refinement of a certain process in its parent-level data flow graph, and one of its processes may in turn be refined to form a sub-graph; the number of intermediate levels generally depends on the complexity of the system. A bottom-level data flow graph is one whose processes can no longer be decomposed; such processes are called "atomic processes".
For those skilled in the art, methods of generating data flow graphs are mature prior art and are therefore not described in detail here.
The optimization module 23 is connected to the second generation module 22 and configured to optimize the data flow graph.
Specifically, the data flow graph is optimized so that data input to the hardware processing module can undergo the corresponding hardware operations in sequence without waiting, thereby reusing the hardware resources of the hardware processing module without additional waste of resources.
In an embodiment of the present invention, when the deep learning network model adopts a convolutional neural network model, the optimization module 23 optimizes the data flow graph by performing the following steps:
31) detecting the type of each data block in the data flow graph.
32) determining the coefficients of each fixed layer in the convolutional neural network model.
33) obtaining the coefficients of each data block in the data flow graph.
Thus, the coefficients of each data block are fed into the hardware processing module in a pipelined manner, so that the corresponding hardware operations can be executed in sequence, achieving pipelined operation. It should be noted that the convolutional neural network according to the present invention may use standard convolutions or depthwise separable convolutions, chosen according to the actual usage scenario.
The output module 24 is connected to the optimization module 23 and configured to output the optimized data flow graph according to a protocol definition.
Specifically, according to a preset data transmission protocol definition, the optimized data flow graph is output to the corresponding software processing module and hardware processing module to complete the functions of the artificial intelligence processing device.
It should be noted that the division of the modules of the above system is merely a division of logical functions; in actual implementation they may be fully or partially integrated into one physical entity, or they may be physically separate. These modules may all be implemented in the form of software invoked by a processing element, or all in hardware; alternatively, some modules may be implemented as software invoked by a processing element and others in hardware. For example, the x module may be a separately established processing element, or it may be integrated into a chip of the above device; it may also be stored in the memory of the above device in the form of program code, with a processing element of the above device invoking and executing the functions of the x module. The implementation of the other modules is similar. In addition, all or part of these modules may be integrated together or implemented independently. The processing element described here may be an integrated circuit with signal processing capability. In the implementation process, the steps of the above method or the above modules may be completed by integrated logic circuits of hardware in the processor element or by instructions in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above method, e.g., one or more application-specific integrated circuits (ASIC), one or more digital signal processors (DSP), or one or more field-programmable gate arrays (FPGA). As another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can invoke program code. As a further example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SOC).
The storage medium of the present invention stores a computer program which, when executed by a processor, implements the above method for establishing an artificial intelligence processing device design model. Preferably, the storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks or optical disks.
As shown in FIG. 3, in an embodiment, the terminal of the present invention comprises a processor 31 and a memory 32.
The memory 32 is configured to store a computer program.
Preferably, the memory 32 includes various media that can store program code, such as ROM, RAM, magnetic disks or optical disks.
The processor 31 is connected to the memory 32 and configured to execute the computer program stored in the memory 32, so that the terminal executes the above method for establishing an artificial intelligence processing device design model.
Preferably, the processor 31 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In summary, the method, system, storage medium and terminal for establishing an artificial intelligence processing device design model of the present invention perform graph analysis on the data graph of a deep learning algorithm so that it can run effectively on software and hardware; the obtained deep learning data flow graph conforms to the protocol definition and is highly practical. Therefore, the present invention effectively overcomes various shortcomings of the prior art and has high industrial utilization value.
The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone familiar with this technology may modify or change the above embodiments without departing from the spirit and scope of the present invention. Therefore, all equivalent modifications or changes completed by those with ordinary knowledge in the technical field without departing from the spirit and technical ideas disclosed by the present invention shall still be covered by the claims of the present invention.

Claims (10)

  1. A method for establishing an artificial intelligence processing device design model, characterized by comprising the following steps:
    generating, based on a deep learning data graph of a deep learning network model, a hardware adaptation map compatible with an artificial intelligence processing device;
    generating a data flow graph of the deep learning network model based on the hardware adaptation map;
    optimizing the data flow graph;
    outputting the optimized data flow graph according to a protocol definition.
  2. The method for establishing an artificial intelligence processing device design model according to claim 1, characterized in that the deep learning network model adopts a Tensorflow training model.
  3. The method for establishing an artificial intelligence processing device design model according to claim 1, characterized in that the artificial intelligence processing device includes an FPGA, and the hardware adaptation map is compatible with the FPGA.
  4. The method for establishing an artificial intelligence processing device design model according to claim 1, characterized in that, when the deep learning network model adopts a convolutional neural network model, optimizing the data flow graph comprises the following steps:
    detecting the type of each data block in the data flow graph;
    determining the coefficients of each fixed layer in the convolutional neural network model;
    obtaining the coefficients of each data block in the data flow graph.
  5. A system for establishing an artificial intelligence processing device design model, characterized by comprising a first generation module, a second generation module, an optimization module and an output module;
    the first generation module is configured to generate, based on a deep learning data graph of a deep learning network model, a hardware adaptation map compatible with an artificial intelligence processing device;
    the second generation module is configured to generate a data flow graph of the deep learning network model based on the hardware adaptation map;
    the optimization module is configured to optimize the data flow graph;
    the output module is configured to output the optimized data flow graph according to a protocol definition.
  6. The system for establishing an artificial intelligence processing device design model according to claim 5, characterized in that the deep learning network model adopts a Tensorflow training model.
  7. The system for establishing an artificial intelligence processing device design model according to claim 5, characterized in that the artificial intelligence processing device includes an FPGA, and the hardware adaptation map is compatible with the FPGA.
  8. The system for establishing an artificial intelligence processing device design model according to claim 5, characterized in that, when the deep learning network model adopts a convolutional neural network model, the optimization module optimizes the data flow graph by performing the following steps:
    detecting the type of each data block in the data flow graph;
    determining the coefficients of each fixed layer in the convolutional neural network model;
    obtaining the coefficients of each data block in the data flow graph.
  9. A storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the method for establishing an artificial intelligence processing device design model according to any one of claims 1 to 4 is implemented.
  10. A terminal, characterized by comprising: a processor and a memory;
    the memory is configured to store a computer program;
    the processor is configured to execute the computer program stored in the memory, so that the terminal executes the method for establishing an artificial intelligence processing device design model according to any one of claims 1 to 4.
PCT/CN2018/072669 2018-01-15 2018-01-15 Method, system, storage medium and terminal for establishing an artificial intelligence processing device design model WO2019136756A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201880002758.5A CN109643336A (zh) 2018-01-15 2018-01-15 Method, system, storage medium and terminal for establishing an artificial intelligence processing device design model
PCT/CN2018/072669 WO2019136756A1 (zh) 2018-01-15 2018-01-15 Method, system, storage medium and terminal for establishing an artificial intelligence processing device design model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/072669 WO2019136756A1 (zh) 2018-01-15 2018-01-15 Method, system, storage medium and terminal for establishing an artificial intelligence processing device design model

Publications (1)

Publication Number Publication Date
WO2019136756A1 true WO2019136756A1 (zh) 2019-07-18

Family

ID=66060200

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/072669 WO2019136756A1 (zh) 2018-01-15 2018-01-15 Method, system, storage medium and terminal for establishing an artificial intelligence processing device design model

Country Status (2)

Country Link
CN (1) CN109643336A (zh)
WO (1) WO2019136756A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111880807A (zh) * 2020-07-31 2020-11-03 Oppo广东移动通信有限公司 Deep learning compilation method, apparatus, device and storage medium
CN113673039A (zh) * 2021-09-06 2021-11-19 江南造船(集团)有限责任公司 Ship ventilation system design method, system, medium and terminal based on thermal environment simulation

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114365148A (zh) * 2019-10-22 2022-04-15 深圳鲲云信息科技有限公司 Neural network operation system and method
CN110955530A (zh) * 2020-02-25 2020-04-03 深圳鲲云信息科技有限公司 Method, apparatus, device and storage medium for parallel data processing by a deep learning engine

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106228240A (zh) * 2016-07-30 2016-12-14 复旦大学 FPGA-based implementation method for deep convolutional neural networks
CN107463990A (zh) * 2016-06-02 2017-12-12 国家计算机网络与信息安全管理中心 FPGA parallel acceleration method for convolutional neural networks

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160328644A1 (en) * 2015-05-08 2016-11-10 Qualcomm Incorporated Adaptive selection of artificial neural networks

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107463990A (zh) * 2016-06-02 2017-12-12 国家计算机网络与信息安全管理中心 FPGA parallel acceleration method for convolutional neural networks
CN106228240A (zh) * 2016-07-30 2016-12-14 复旦大学 FPGA-based implementation method for deep convolutional neural networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HWANG, W. J. ET AL.: "An Efficient FPGA-Based Architecture for Convolutional Neural Networks", 2017 40TH INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS AND SIGNAL PROCESSING (TSP, 7 July 2017 (2017-07-07), pages 582 - 588, XP033232258 *
YU, ZIJIAN: "FPGA-based Accelerator for Convolutional Neural Network", INFORMATION & TECHNOLOGY, CHINA MASTER'S THESES FULL-TEXT DATABASE, 15 July 2016 (2016-07-15), pages 6 - 38 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111880807A (zh) * 2020-07-31 2020-11-03 Oppo广东移动通信有限公司 Deep learning compilation method, apparatus, device and storage medium
CN113673039A (zh) * 2021-09-06 2021-11-19 江南造船(集团)有限责任公司 Ship ventilation system design method, system, medium and terminal based on thermal environment simulation
CN113673039B (zh) * 2021-09-06 2023-12-12 江南造船(集团)有限责任公司 Ship ventilation system design method, system, medium and terminal based on thermal environment simulation

Also Published As

Publication number Publication date
CN109643336A (zh) 2019-04-16

Similar Documents

Publication Publication Date Title
WO2019136754A1 (zh) Compilation method and system for an artificial intelligence processing device, storage medium and terminal
Liu et al. Implementation of training convolutional neural networks
WO2019136758A1 (zh) Hardware optimization method, system, storage medium and terminal for an artificial intelligence processing device
WO2019136756A1 (zh) Method, system, storage medium and terminal for establishing an artificial intelligence processing device design model
WO2021159714A1 (zh) Data processing method and related device
US20220004935A1 Ensemble learning for deep feature defect detection
CN111401406B (zh) Neural network training method, video frame processing method and related device
CN109783666B (zh) Image scene graph generation method based on iterative refinement
Liu et al. Real-time marine animal images classification by embedded system based on mobilenet and transfer learning
JP7196218B2 (ja) Image question answering method, apparatus, computer device, medium and program
WO2022111617A1 (zh) Model training method and apparatus
Baykal et al. Comparing deep learning performance on BigData by using CPUs and GPUs
Kuang et al. Preview on structures and algorithms of deep learning
WO2022012668A1 (zh) Training set processing method and apparatus
WO2021051987A1 (zh) Method and apparatus for training a neural network model
CN112989792B (zh) Event detection method and electronic device
CN111797992A (zh) Machine learning optimization method and apparatus
Malik Technical perspective: what led computer vision to deep learning?
JP2015036939A (ja) Feature extraction program and information processing apparatus
CN113627163A (zh) Attention model, feature extraction method and related apparatus
CN116188941A (zh) Manifold-regularized broad learning method and system based on relaxed labeling
WO2023273934A1 (zh) Method for selecting model hyperparameters and related apparatus
WO2023122854A1 (zh) Data processing method and apparatus
Bai et al. Real-time 3d human pose estimation without skeletal a priori structures
Ferguson et al. A standardized PMML format for representing convolutional neural networks with application to defect detection

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18899222

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 13.11.2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18899222

Country of ref document: EP

Kind code of ref document: A1