WO2020041960A1 - Chip adaptation determination method and related products - Google Patents

Chip adaptation determination method and related products

Info

Publication number
WO2020041960A1
WO2020041960A1 PCT/CN2018/102633 CN2018102633W
Authority
WO
WIPO (PCT)
Prior art keywords
deep learning
chip
learning model
data
user
Prior art date
Application number
PCT/CN2018/102633
Other languages
English (en)
French (fr)
Inventor
熊超
牛昕宇
蔡权雄
Original Assignee
深圳鲲云信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳鲲云信息科技有限公司
Priority to PCT/CN2018/102633 priority Critical patent/WO2020041960A1/zh
Priority to CN201880083334.6A priority patent/CN111527501B/zh
Publication of WO2020041960A1 publication Critical patent/WO2020041960A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Definitions

  • the present application relates to the field of computers and artificial intelligence technologies, and in particular, to a chip adaptation method and related products.
  • the embodiments of the present application provide a chip adaptation method and related products, which adapt a deep learning algorithm to a chip through a software architecture platform and display the adaptation status through a graphical interface, thereby lowering the professional expertise required for adapting deep learning algorithms and reducing costs.
  • an embodiment of the present application provides a chip adaptation method.
  • the method includes the following steps:
  • Graphically display the deep learning development platform, prompt the user to upload the data set to be trained, and prompt the user to label the data set;
  • Receive the data set input by the user, receive the user's labels on the data set to obtain the labeled data set, and convert the labeled data set into data in a set format;
  • the trained deep learning model is input into the chip, hardware configuration is performed on the chip according to the chip's hardware configuration file to obtain the adapted chip, and the runtime file is started to complete the solidification and deployment of the deep learning model on the chip.
  • the model structure includes: n layers, parameters corresponding to n layers, and connection relationships between n layers.
  • the training parameters include: optimization algorithm, training parameters, and loss function, where n is an integer greater than or equal to 2.
  • the method further includes:
  • Obtain test data, input the test data into the adapted chip to perform an operation to obtain an operation result, and display the operation result.
  • after the trained deep learning model is obtained, the method further includes: performing storage and structural optimization on the trained deep learning model to obtain an optimized deep learning model.
  • in a second aspect, a development platform is provided, and the development platform includes:
  • a data preparation unit, configured to graphically display the deep learning development platform, prompt the user to upload the data set that needs training, and prompt the user to label the data set; receive the data set input by the user, receive the user's labels on the data set to obtain the labeled data set, and convert the labeled data set into data in a set format;
  • an algorithm development unit, configured to determine the deep learning model structure and training parameters selected by the user, and to input the data in the set format into the deep learning model structure to train the deep learning model and obtain a trained deep learning model;
  • a hardware optimization unit, configured to parse the trained deep learning model to obtain a data flow graph, determine the compilation parameter file of the chip according to the data flow graph, and perform hardware compilation according to the data flow graph and the compilation parameter file to obtain a configuration file optimized and accelerated for the target chip;
  • an adaptation unit, configured to input the trained deep learning model into the chip, perform hardware configuration on the chip according to the chip's hardware configuration file to obtain the adapted chip, and start the runtime file to complete the solidification and deployment of the deep learning model on the chip.
  • the model structure includes: n layers, parameters corresponding to n layers, and connection relationships between n layers.
  • the training parameters include: optimization algorithm, training parameters, and loss function, where n is an integer greater than or equal to 2.
  • the adaptation unit is further configured to obtain test data, input the test data into the adapted chip to perform an operation to obtain an operation result, and display the operation result.
  • the algorithm development unit is further configured to perform storage and structural optimization on the trained deep learning model to obtain an optimized deep learning model.
  • a computer-readable storage medium stores a computer program for electronic data exchange, wherein the computer program causes a computer to execute the method as provided in the first aspect.
  • a computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute the method provided by the first aspect.
  • this application provides a fully graphical and automated chip algorithm development function. It greatly reduces the difficulty for users to develop chip-level deep learning algorithms and improves development efficiency. In addition, for algorithm developers, this application provides rapid deployment testing of algorithms to chips to facilitate the migration of existing server algorithms to front-end nodes.
  • FIG. 1 is a schematic structural diagram of a platform system.
  • Figure 1a is a schematic diagram of a data preparation process.
  • Figure 1b is a schematic diagram of the algorithm development process.
  • Figure 1c is a flowchart of data flow graph optimization.
  • Figure 1d is a flowchart of the chip-side deployment test.
  • FIG. 2 is a schematic flowchart of a method for determining a framework of a deep learning algorithm.
  • FIG. 3 is a structural diagram of a development platform provided by the present application.
  • an embodiment herein means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application.
  • the appearances of this phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are they independent or alternative embodiments that are mutually exclusive with other embodiments. It is explicitly and implicitly understood by those skilled in the art that the embodiments described herein may be combined with other embodiments.
  • the application of the server-side algorithm requires that the front-end nodes of the Internet of Things collect data and return them to the server for processing, and the server returns the results.
  • the number of nodes and data collected in the Internet of Things has exploded.
  • Such a large amount of data on the one hand greatly increases the load on the server side, resulting in a slow processing response speed.
  • it requires a lot of network bandwidth for transmission.
  • the existing algorithm processing architecture can no longer meet the needs of the Internet of Things.
  • the existing front-end applications of the Internet of Things mainly rely on ARM, but because the ARM chip uses an instruction-set architecture, its resource utilization is low and it cannot support real-time processing of complex deep learning algorithms.
  • the customized chip introduces the concept of custom computing and uses a data flow graph (Streaming Graph) to optimize specific algorithms, thereby improving resource utilization, that is, improving the effective performance of the chip.
  • the high performance and low power consumption of custom chips fit well with the requirements of deep learning algorithms, so the deep learning algorithms relying on custom chips are more suitable for the current Internet of Things scenarios.
  • This application proposes a graphical deep learning algorithm development framework, so that users can conveniently and quickly develop AI algorithms that can be applied to the front end without any algorithm and specialized chip expertise.
  • the framework introduces a compiler function: for different deep learning algorithms, it automatically extracts the data flow graph of the algorithm and maps it to the configuration of the customized chip, achieving automatic adaptation and optimization for the chip architecture and thereby solving the problem of poor flexibility.
  • FIG. 1 is a schematic structural diagram of a system composition. As shown in FIG. 1, the system may include 4 parts. Specifically: data preparation, algorithm development, data flow graph optimization, and chip-side deployment testing.
  • FIG. 1a is a data preparation process. This can include:
  • S1-1: the user uploads the data set that needs training. The data set can be images, videos, text, or speech.
  • S1-2: the user annotates the uploaded data.
  • S1-3: after all the data has been annotated, the platform automatically converts the data and the corresponding annotations into a structure file that the algorithm can read, converting one format into another.
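  • As a concrete illustration of step S1-3, the sketch below shows one way such a platform could pack user-uploaded samples and their annotations into a single binary structure file readable by a training algorithm. The record layout, the Sample/write_dataset helpers, and the file names are assumptions made for illustration only; the disclosed platform does not specify them.
```python
import json
import struct
from dataclasses import dataclass
from pathlib import Path

@dataclass
class Sample:
    data_path: Path   # image/video/text/audio file uploaded by the user
    label: str        # annotation supplied by the user on the platform

def write_dataset(samples, out_file: Path) -> None:
    """Pack annotated samples into one binary structure file.

    Assumed record layout: 4-byte payload length, payload bytes,
    then a UTF-8 JSON annotation prefixed by its own 4-byte length.
    """
    with out_file.open("wb") as f:
        for s in samples:
            payload = s.data_path.read_bytes()
            meta = json.dumps({"file": s.data_path.name, "label": s.label}).encode("utf-8")
            f.write(struct.pack("<I", len(payload)))
            f.write(payload)
            f.write(struct.pack("<I", len(meta)))
            f.write(meta)

if __name__ == "__main__":
    # Hypothetical files uploaded through the platform.
    samples = [Sample(Path("cat_001.jpg"), "cat"), Sample(Path("dog_001.jpg"), "dog")]
    write_dataset(samples, Path("train_set.bin"))
```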
  • FIG. 1b is a process of algorithm development. This can include:
  • S2-1: define the model structure of the deep learning network, including the specific parameters of each layer (the number of hidden nodes, the size of the convolution kernel, etc.) and the connection relationships between layers.
  • S2-2: define the algorithm training parameters, including the optimization algorithm and training parameters, the loss function, etc.
  • S2-3: model training is performed on the server side, completing the learning and optimization of all parameters of the deep learning network.
  • S2-4: optimize the storage and structure of the trained model to reduce the model's disk and memory footprint and improve its computational efficiency.
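  • Steps S2-1 to S2-3 amount to an ordinary deep learning training loop. The sketch below uses PyTorch purely as an example of how the model structure (layers, per-layer parameters, connections) and the training parameters (optimization algorithm, loss function) selected in the graphical interface might be materialized on the server side; the particular network and hyperparameters are assumptions, not part of the disclosure.
```python
import torch
from torch import nn

# S2-1: model structure -- layers, per-layer parameters (hidden node
# count, convolution kernel size, ...) and their connection order.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 10),   # assumes 32x32 RGB inputs and 10 classes
)

# S2-2: training parameters -- optimization algorithm, learning rate, loss.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# S2-3: server-side training loop over the formatted data set.
def train(loader, epochs: int = 5) -> None:
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optimizer.step()
```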
  • FIG. 1c is a data flow graph optimization process, which may specifically include:
  • This step mainly completes the compilation of the server-trained model onto the customized chip.
  • S3-1: parse the trained deep learning network model and convert it into a data flow graph.
  • S3-2: optimize the data flow graph for the target chip architecture and generate a configuration file.
  • S3-3: perform hardware compilation based on the configuration file via the instruction flow and generate the runtime file.
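  • The compilation stage (S3-1 to S3-3) can be pictured with the sketch below: the trained network is parsed into a data flow graph, the graph is matched against the resources of the target chip to produce a configuration file, and that configuration then drives hardware compilation. The class names, resource counts, and JSON layout are hypothetical; a real compiler for a specific customized chip would replace them.
```python
import json
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    op: str                       # e.g. "conv2d", "relu", "fc"
    inputs: list = field(default_factory=list)

@dataclass
class DataFlowGraph:
    nodes: list

def parse_model(layer_specs) -> DataFlowGraph:
    """S3-1: turn an ordered layer description into a data flow graph."""
    nodes, prev = [], None
    for i, (op, _params) in enumerate(layer_specs):
        node = Node(name=f"{op}_{i}", op=op, inputs=[prev.name] if prev else [])
        nodes.append(node)
        prev = node
    return DataFlowGraph(nodes)

def compile_for_chip(graph: DataFlowGraph, total_units: int = 256) -> dict:
    """S3-2: map graph nodes onto the chip's compute units and emit a config."""
    per_node = max(1, total_units // max(1, len(graph.nodes)))
    return {
        "target": "custom_dataflow_chip",   # assumed identifier
        "modules": [{"node": n.name, "op": n.op, "compute_units": per_node}
                    for n in graph.nodes],
    }

graph = parse_model([("conv2d", {"kernel": 3}), ("relu", {}), ("fc", {"out": 10})])
with open("chip_config.json", "w") as f:
    json.dump(compile_for_chip(graph), f, indent=2)   # consumed by S3-3 hardware compilation
```
  • The even split of compute units here is only a placeholder for whatever allocation strategy a real target-chip compiler would apply.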
  • FIG. 1d is a flowchart of a chip-side deployment test, which may specifically include:
  • S4-1: configure the chip, allocate resources, and start the runtime file.
  • S4-2: the user uploads test data, and the platform transmits the test data to the chip.
  • S4-3: the chip runs the test and returns the results.
  • S4-4: the platform displays the results.
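  • Steps S4-1 to S4-4 describe the runtime side. The sketch below shows one plausible driver-level flow: load the configuration onto the device, start the runtime, push the user's test data through the adapted chip, and hand the result back to the platform for display. The ChipRuntime class and its methods are invented for illustration; an actual platform would call the vendor runtime API of the target chip.
```python
import json

class ChipRuntime:
    """Illustrative stand-in for a customized-chip runtime driver."""

    def __init__(self, config_path: str):
        with open(config_path) as f:
            self.config = json.load(f)   # S4-1: hardware configuration
        self.started = False

    def start(self) -> None:
        # S4-1: allocate resources and launch the runtime file on the chip.
        self.started = True

    def run(self, test_batch):
        # S4-2 / S4-3: transmit test data to the chip and collect the result.
        assert self.started, "runtime must be started before inference"
        return [{"input_id": i, "prediction": "<chip output>"}   # placeholder result
                for i, _ in enumerate(test_batch)]

runtime = ChipRuntime("chip_config.json")
runtime.start()
results = runtime.run(["img_0.jpg", "img_1.jpg"])   # hypothetical uploaded test data
print(results)                                       # S4-4: platform displays the result
```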
  • This application provides a fully graphical and automated chip algorithm development function. It greatly reduces the difficulty for users to develop chip-level deep learning algorithms and improves development efficiency. In addition, for algorithm developers, this application provides rapid deployment testing of algorithms to chips to facilitate the migration of existing server algorithms to front-end nodes.
  • FIG. 2 provides a chip adaptation method.
  • the method may be executed by a terminal.
  • the terminal may specifically be a computer, a server, a cloud platform, and other devices and platforms with computing capabilities.
  • the method includes the following steps:
  • Step S201 Graphically display the deep learning development platform, prompt the user to upload a data set to be trained, and prompt the user to mark the data set.
  • the above data set includes, but is not limited to, one of image, video, text, and speech.
  • the data set may also include other forms of data sets. This application does not limit the specific expressions of the above data sets.
  • Step S202 Receive the data set input by the user, receive the data set labeled by the user to obtain the labeled data set, and convert the labeled data set into data in a set format;
  • the above data in a set format may be in the format specified by different deep learning frameworks for the training data; the specific format may be determined by the algorithm developer and is not limited here, for example a binary format, a txt format, and so on.
  • Step S203 Determine a deep learning model structure and training parameters selected by the user.
  • the model structure includes, but is not limited to: n layers, the parameters corresponding to the n layers, and the connection relationships between the n layers.
  • the training parameters include, but are not limited to: the optimization algorithm, training parameters, the loss function, etc.
  • n is an integer greater than or equal to 2.
  • Step S204 Input the data in the format into the structure of the deep learning model to implement training of the deep learning model, and obtain a trained deep learning model.
  • Step S205 Parse the trained deep learning model to obtain a data flow graph, determine the compilation parameter file of the chip according to the data flow graph, and perform hardware compilation according to the data flow graph and the compilation parameter file to obtain a configuration file optimized and accelerated for the target chip.
  • the above data flow graph defines the operation structure and control logic of the chip. Specifically, for a chip containing a large number of compute units, it defines how data is stored and transmitted within the chip and how many compute units each operation module is allowed to use.
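  • Concretely, the information carried by such a data flow graph could be summarized in a per-module table like the illustrative one below; the module names, memory levels, and compute-unit counts are assumptions chosen only to show what the configuration has to express.
```python
# Illustrative content of a data flow configuration for a chip with many
# compute units: where each module's data lives, how it moves, and how
# many compute units the module is allowed to use. All values are assumed.
dataflow_config = {
    "conv1": {"input_buffer": "on_chip_sram",  "transfer": "stream_from_dma",
              "compute_units": 64},
    "conv2": {"input_buffer": "on_chip_sram",  "transfer": "local_forward",
              "compute_units": 128},
    "fc":    {"input_buffer": "off_chip_dram", "transfer": "burst_read",
              "compute_units": 32},
}

total_units = sum(m["compute_units"] for m in dataflow_config.values())
print(f"compute units allocated: {total_units}")   # simple sanity check
```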
  • Step S206 Input the trained deep learning model into a chip, and perform hardware configuration on the chip according to the hardware configuration file of the chip to obtain an adapted chip.
  • This application provides a fully graphical and automated chip algorithm development function. It greatly reduces the difficulty for users to develop chip-level deep learning algorithms and improves development efficiency. In addition, for algorithm developers, this application provides rapid deployment testing of algorithms to chips to facilitate the migration of existing server algorithms to front-end nodes.
  • the foregoing method may further include:
  • Obtain test data, input the test data into the adapted chip to perform an operation to obtain an operation result, and display the operation result.
  • This technical solution realizes testing of the adapted chip, and can detect whether the adapted chip is successfully adapted.
  • the above operation results may specifically be graphical annotations, object localization in an image, and so on; of course, they may also be other classification results, such as face recognition determinations and the like.
  • FIG. 3 provides a development platform, which includes:
  • a data preparation unit 301, configured to graphically display the deep learning development platform, prompt the user to upload the data set that needs training, and prompt the user to annotate the data set; receive the data set input by the user, receive the user's annotations on the data set to obtain the annotated data set, and convert the annotated data set into data in a set format;
  • An algorithm development unit 302 is configured to determine a structure and training parameters of a deep learning model selected by a user; input data in a set format into the deep learning model structure to implement training of the deep learning model, and obtain a trained deep learning model;
  • a hardware optimization unit 303, configured to parse the trained deep learning model to obtain a data flow graph, determine the compilation parameter file of the chip according to the data flow graph, and perform hardware compilation according to the data flow graph and the compilation parameter file to obtain a configuration file optimized and accelerated for the target chip;
  • an adaptation unit 304, configured to input the trained deep learning model into the chip, perform hardware configuration on the chip according to the chip's hardware configuration file to obtain the adapted chip, and start the runtime file to complete the solidification and deployment of the deep learning model on the chip.
  • the model structure includes: n layers, parameters corresponding to n layers, and connection relationships between n layers.
  • the training parameters include: optimization algorithm, training parameters, and loss function, where n is an integer greater than or equal to 2.
  • the adapting unit is further configured to obtain test data, input the test data to the adapted chip to perform an operation to obtain an operation result, and display the operation result.
  • the algorithm development unit is further configured to perform storage and structural optimization on the trained deep learning model to obtain an optimized deep learning model.
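  • The "storage and structural optimization" of the trained model is not specified further in this document. The sketch below illustrates one common technique that fits that description, dynamic weight quantization to shrink the serialized model, using PyTorch as an assumed framework; it is an example, not the disclosed optimization.
```python
import torch
from torch import nn

def optimize_for_storage(model: nn.Module) -> nn.Module:
    """Illustrative storage optimization: store Linear weights as int8
    via dynamic quantization, which reduces the model's on-disk size."""
    model.eval()
    return torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

# Example with an assumed small trained network.
net = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
optimized = optimize_for_storage(net)
torch.save(optimized.state_dict(), "optimized_model.pt")   # smaller serialized model
```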
  • An embodiment of the present application further provides a computer storage medium, wherein the computer storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute some or all of the steps of any chip adaptation method described in the foregoing method embodiments.
  • An embodiment of the present application further provides a computer program product. The computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute some or all of the steps of any chip adaptation method described in the foregoing method embodiments.
  • processors and chips in the various embodiments of the present application may be integrated in one processing unit, or may exist separately physically, or two or more pieces of hardware may be integrated in one unit.
  • the computer-readable storage medium or computer-readable program may be stored in a computer-readable memory.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product, which is stored in a memory.
  • Several instructions are included to enable a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method described in the embodiments of the present application.
  • the aforementioned memory includes: USB flash drives, read-only memory (ROM), random access memory (RAM), removable hard disks, magnetic disks, or optical disks, and other media that can store program code.
  • the program may be stored in a computer-readable memory, and the memory may include a flash disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)
  • Stored Programmes (AREA)

Abstract

A chip adaptation method and related products. The method includes the following steps: graphically displaying a deep learning development platform and prompting the user to upload the data set to be trained; receiving the user's annotations on the data set and converting the annotated data set into data in a set format; determining the deep learning model structure and training parameters selected by the user; inputting the data in the set format into the deep learning model structure to train the deep learning model and obtain a trained deep learning model; parsing the trained deep learning model to obtain a data flow graph, determining the configuration file of the chip according to the data flow graph, and performing hardware compilation and acceleration according to the data flow graph and the configuration file to obtain the hardware configuration file of the chip; and inputting the trained deep learning model into the chip and performing hardware configuration on the chip according to the chip's hardware configuration file to obtain the adapted chip. The method has the advantages of a low development threshold, high algorithm computing efficiency, and low cost.

Description

Chip adaptation determination method and related products
Technical Field
The present application relates to the field of computer and artificial intelligence technologies, and in particular to a chip adaptation method and related products.
Background
With a series of advances in the field of deep learning, more and more fields have begun to use deep-learning-based artificial intelligence algorithms to solve practical problems and have achieved good results. At present, the vast majority of deep learning algorithms have three application scenarios: GPU-based server applications, ARM-based front-end applications, and front-end applications based on customized chips.
Existing users may need to configure the deep learning architecture differently according to different requirements, so chips have become the first choice. However, because most manufacturers have limited chip expertise, they cannot adapt deep learning algorithms to the chip, and even when adaptation is possible it requires highly specialized personnel, so the cost of deep learning adaptation for existing chips is high.
Summary of the Application
The embodiments of the present application provide a chip adaptation method and related products, which adapt a deep learning algorithm to a chip through a software architecture platform and display the adaptation status through a graphical interface, thereby lowering the professional expertise required for adapting deep learning algorithms and reducing costs.
In a first aspect, an embodiment of the present application provides a chip adaptation method, and the method includes the following steps:
graphically displaying a deep learning development platform, prompting the user to upload the data set to be trained, and prompting the user to annotate the data set;
receiving the data set input by the user, receiving the user's annotations on the data set to obtain the annotated data set, and converting the annotated data set into data in a set format;
determining the deep learning model structure and training parameters selected by the user;
inputting the data in the set format into the deep learning model structure to train the deep learning model and obtain a trained deep learning model;
parsing the trained deep learning model to obtain a data flow graph, determining the compilation parameter file of the chip according to the data flow graph, and performing hardware compilation according to the data flow graph and the compilation parameter file to obtain a configuration file optimized and accelerated for the target chip;
inputting the trained deep learning model into the chip, performing hardware configuration on the chip according to the chip's hardware configuration file to obtain the adapted chip, and starting the runtime file to complete the solidification and deployment of the deep learning model on the chip.
Optionally, the model structure includes: n layers, the parameters corresponding to the n layers, and the connection relationships between the n layers; the training parameters include: the optimization algorithm, training parameters, and loss function, where n is an integer greater than or equal to 2.
Optionally, the method further includes:
obtaining test data, inputting the test data into the adapted chip to perform an operation to obtain an operation result, and displaying the operation result.
Optionally, after the trained deep learning model is obtained, the method further includes:
performing storage and structural optimization on the trained deep learning model to obtain an optimized deep learning model.
In a second aspect, a development platform is provided, and the development platform includes:
a data preparation unit, configured to graphically display the deep learning development platform, prompt the user to upload the data set to be trained, and prompt the user to annotate the data set; and to receive the data set input by the user, receive the user's annotations on the data set to obtain the annotated data set, and convert the annotated data set into data in a set format;
an algorithm development unit, configured to determine the deep learning model structure and training parameters selected by the user, and to input the data in the set format into the deep learning model structure to train the deep learning model and obtain a trained deep learning model;
a hardware optimization unit, configured to parse the trained deep learning model to obtain a data flow graph, determine the compilation parameter file of the chip according to the data flow graph, and perform hardware compilation according to the data flow graph and the compilation parameter file to obtain a configuration file optimized and accelerated for the target chip;
an adaptation unit, configured to input the trained deep learning model into the chip, perform hardware configuration on the chip according to the chip's hardware configuration file to obtain the adapted chip, and start the runtime file to complete the solidification and deployment of the deep learning model on the chip.
Optionally, the model structure includes: n layers, the parameters corresponding to the n layers, and the connection relationships between the n layers; the training parameters include: the optimization algorithm, training parameters, and loss function, where n is an integer greater than or equal to 2.
Optionally, the adaptation unit is further configured to obtain test data, input the test data into the adapted chip to perform an operation to obtain an operation result, and display the operation result.
Optionally, the algorithm development unit is further configured to perform storage and structural optimization on the trained deep learning model to obtain an optimized deep learning model.
In a third aspect, a computer-readable storage medium is provided, which stores a computer program for electronic data exchange, wherein the computer program causes a computer to execute the method provided in the first aspect.
In a fourth aspect, a computer program product is provided. The computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute the method provided in the first aspect.
Implementing the embodiments of the present application has the following beneficial effects:
It can be seen that the present application provides a fully graphical and automated chip algorithm development capability, which greatly reduces the difficulty for users to develop chip-level deep learning algorithms and improves development efficiency. In addition, for algorithm developers, the present application provides rapid deployment and testing of algorithms on chips, facilitating the migration of existing server-side algorithms to front-end nodes.
Brief Description of the Drawings
In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic structural diagram of a platform system.
FIG. 1a is a schematic diagram of the data preparation process.
FIG. 1b is a schematic diagram of the algorithm development process.
FIG. 1c is a flowchart of data flow graph optimization.
FIG. 1d is a flowchart of the chip-side deployment test.
FIG. 2 is a schematic flowchart of a method for determining the architecture of a deep learning algorithm.
FIG. 3 is a structural diagram of a development platform provided by the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.
The terms "including" and "having" in the specification, claims, and drawings of the present application, and any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to such a process, method, product, or device.
Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to independent or alternative embodiments that are mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
Server-side algorithm applications require the front-end nodes of the Internet of Things to collect data and transmit all of it back to the server for processing, and the server then returns the results. With the popularization of Internet of Things technology, the number of IoT nodes and the amount of data they collect have grown explosively. Such a large amount of data greatly increases the load on the server side, resulting in slow processing responses, and also occupies a large amount of network bandwidth for transmission, so the existing algorithm processing architecture can no longer meet the needs of the Internet of Things.
To solve this problem, it is hoped that computing capability can be moved forward to the IoT front-end nodes. Specifically, intelligent chips are embedded to give IoT nodes the ability to process data, thereby reducing dependence on bandwidth and servers. Existing IoT front-end applications mainly rely on ARM, but because ARM chips use an instruction-set architecture, resource utilization is low and they cannot support real-time processing of complex deep learning algorithms. Customized chips introduce the concept of custom computing and use a data flow graph (Streaming Graph) to optimize specific algorithms, thereby improving resource utilization, that is, improving the effective performance of the chip. The high performance and low power consumption of customized chips fit the requirements of deep learning algorithms well, so deep learning algorithms relying on customized chips are better suited to current IoT scenarios.
There are three major difficulties in algorithm development for customized chips: first, many enterprises in traditional fields do not have algorithm development capabilities; second, adapting and optimizing algorithms for customized chips requires strong hardware expertise; and third, customized chips need to be specially optimized for each application, so their flexibility is poor. For these three reasons, algorithm application development based on customized chips has a long development cycle, is difficult, and is hard to put into practice.
The present application proposes a graphical deep learning algorithm development framework, so that users can conveniently and quickly develop AI algorithms that can be applied at the front end without any algorithm or customized-chip expertise. In addition, the framework introduces a compiler function: for different deep learning algorithms, it automatically extracts the data flow graph in the algorithm and maps it to the configuration of the customized chip, achieving automatic adaptation and optimization for the chip architecture and thereby solving the problem of poor flexibility.
Referring to FIG. 1, FIG. 1 is a schematic structural diagram of a system. As shown in FIG. 1, the system may include four parts: data preparation, algorithm development, data flow graph optimization, and chip-side deployment testing.
Referring to FIG. 1a, FIG. 1a shows the data preparation process, which may specifically include:
S1-1: The user uploads the data set to be trained. The data set can be images, videos, text, or speech.
S1-2: The user annotates the uploaded data.
S1-3: After all the data has been annotated, the platform automatically converts the data and the corresponding annotations into a structure file that the algorithm can read, converting one format into another.
Referring to FIG. 1b, FIG. 1b shows the algorithm development process, which may specifically include:
It is divided into two steps: model definition and model training.
S2-1: Define the model structure of the deep learning network, including the specific parameters of each layer (the number of hidden nodes, the size of the convolution kernel, etc.) and the connection relationships between layers.
S2-2: Define the algorithm training parameters, including the optimization algorithm and training parameters, the loss function, etc.
S2-3: Model training is performed on the server side, completing the learning and optimization of all parameters of the deep learning network.
S2-4: Optimize the storage and structure of the trained model to reduce the model's disk and memory footprint and improve its computational efficiency.
Referring to FIG. 1c, FIG. 1c shows the data flow graph optimization process, which may specifically include:
This step mainly completes the compilation of the server-trained model onto the customized chip.
S3-1: Parse the obtained deep learning network model and convert it into a data flow graph.
S3-2: Optimize the data flow graph for the target chip architecture and generate a configuration file.
S3-3: Perform hardware compilation based on the configuration file via the instruction flow and generate the runtime file.
Referring to FIG. 1d, FIG. 1d shows the chip-side deployment test process, which may specifically include:
S4-1: Configure the chip, perform resource scheduling and allocation, and start the runtime file.
S4-2: The user uploads test data, and the platform transmits the test data to the chip.
S4-3: The chip runs the test and returns the results.
S4-4: The platform displays the results.
The present application provides a fully graphical and automated chip algorithm development capability, which greatly reduces the difficulty for users to develop chip-level deep learning algorithms and improves development efficiency. In addition, for algorithm developers, the present application provides rapid deployment and testing of algorithms on chips, facilitating the migration of existing server-side algorithms to front-end nodes.
Referring to FIG. 2, FIG. 2 provides a chip adaptation method. The method may be executed by a terminal, and the terminal may specifically be a computer, a server, a cloud platform, or another device or platform with computing capability. Referring to FIG. 2, the method includes the following steps:
Step S201: Graphically display the deep learning development platform, prompt the user to upload the data set to be trained, and prompt the user to annotate the data set.
The above data set includes, but is not limited to, one of images, video, text, and speech. Of course, in practical applications the data set may also take other forms; the present application does not limit the specific form of the above data set.
Step S202: Receive the data set input by the user, receive the user's annotations on the data set to obtain the annotated data set, and convert the annotated data set into data in a set format.
The above data in a set format may be in the format specified by different deep learning frameworks for training data. The specific format may be determined by the algorithm developer and is not limited here, for example a binary format, a txt format, and so on.
Step S203: Determine the deep learning model structure and training parameters selected by the user. The model structure includes, but is not limited to: n layers, the parameters corresponding to the n layers, and the connection relationships between the n layers. The training parameters include, but are not limited to: the optimization algorithm, training parameters, the loss function, etc. Here, n is an integer greater than or equal to 2.
Step S204: Input the data in the set format into the deep learning model structure to train the deep learning model and obtain a trained deep learning model.
Step S205: Parse the trained deep learning model to obtain a data flow graph, determine the compilation parameter file of the chip according to the data flow graph, and perform hardware compilation according to the data flow graph and the compilation parameter file to obtain a configuration file optimized and accelerated for the target chip.
The above data flow graph specifically defines the operation structure and control logic of the chip. Specifically, for a chip containing a large number of compute units, it can define how data is stored and transmitted within the chip and how many compute units each operation module is allowed to use.
Step S206: Input the trained deep learning model into the chip, and perform hardware configuration on the chip according to the chip's hardware configuration file to obtain the adapted chip.
The present application provides a fully graphical and automated chip algorithm development capability, which greatly reduces the difficulty for users to develop chip-level deep learning algorithms and improves development efficiency. In addition, for algorithm developers, the present application provides rapid deployment and testing of algorithms on chips, facilitating the migration of existing server-side algorithms to front-end nodes.
Optionally, after step S206 the above method may further include:
obtaining test data, inputting the test data into the adapted chip to perform an operation to obtain an operation result, and displaying the operation result.
This technical solution realizes testing of the adapted chip and can detect whether the adaptation was successful. The above operation results may specifically be graphical annotations, object localization in an image, and so on; of course, they may also be other classification results, such as face recognition determinations and the like.
Referring to FIG. 3, FIG. 3 provides a development platform. The development platform includes:
a data preparation unit 301, configured to graphically display the deep learning development platform, prompt the user to upload the data set to be trained, and prompt the user to annotate the data set; and to receive the data set input by the user, receive the user's annotations on the data set to obtain the annotated data set, and convert the annotated data set into data in a set format;
an algorithm development unit 302, configured to determine the deep learning model structure and training parameters selected by the user, and to input the data in the set format into the deep learning model structure to train the deep learning model and obtain a trained deep learning model;
a hardware optimization unit 303, configured to parse the trained deep learning model to obtain a data flow graph, determine the compilation parameter file of the chip according to the data flow graph, and perform hardware compilation according to the data flow graph and the compilation parameter file to obtain a configuration file optimized and accelerated for the target chip;
an adaptation unit 304, configured to input the trained deep learning model into the chip, perform hardware configuration on the chip according to the chip's hardware configuration file to obtain the adapted chip, and start the runtime file to complete the solidification and deployment of the deep learning model on the chip.
Optionally, the model structure includes: n layers, the parameters corresponding to the n layers, and the connection relationships between the n layers; the training parameters include: the optimization algorithm, training parameters, and loss function, where n is an integer greater than or equal to 2.
Optionally, the adaptation unit is further configured to obtain test data, input the test data into the adapted chip to perform an operation to obtain an operation result, and display the operation result.
Optionally, the algorithm development unit is further configured to perform storage and structural optimization on the trained deep learning model to obtain an optimized deep learning model.
An embodiment of the present application further provides a computer storage medium, wherein the computer storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute some or all of the steps of any chip adaptation method described in the foregoing method embodiments.
An embodiment of the present application further provides a computer program product. The computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute some or all of the steps of any chip adaptation method described in the foregoing method embodiments.
It should be noted that, for the sake of simple description, the foregoing method embodiments are all expressed as a series of action combinations, but those skilled in the art should know that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all optional embodiments, and the actions and modules involved are not necessarily required by the present application.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, reference may be made to the relevant descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative.
In addition, the processors and chips in the various embodiments of the present application may be integrated in one processing unit, may exist separately physically, or two or more pieces of hardware may be integrated in one unit. The computer-readable storage medium or computer-readable program may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes: USB flash drives, read-only memory (ROM), random access memory (RAM), removable hard disks, magnetic disks, optical disks, and other media that can store program code.
Those of ordinary skill in the art can understand that all or part of the steps in the various methods of the above embodiments can be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable memory, and the memory may include a flash disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, or the like.
The embodiments of the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method of the present application and its core idea. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementation and application scope according to the idea of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (10)

  1. A chip adaptation method, characterized in that the method includes the following steps:
    graphically displaying a deep learning development platform, prompting the user to upload the data set to be trained, and prompting the user to annotate the data set;
    receiving the data set input by the user, receiving the user's annotations on the data set to obtain the annotated data set, and converting the annotated data set into data in a set format;
    determining the deep learning model structure and training parameters selected by the user;
    inputting the data in the set format into the deep learning model structure to train the deep learning model and obtain a trained deep learning model;
    parsing the trained deep learning model to obtain a data flow graph, determining the compilation parameter file of the chip according to the data flow graph, and performing hardware compilation according to the data flow graph and the compilation parameter file to obtain a configuration file optimized and accelerated for the target chip;
    inputting the trained deep learning model into the chip, performing hardware configuration on the chip according to the chip's hardware configuration file to obtain the adapted chip, and starting the runtime file to complete the solidification and deployment of the deep learning model on the chip.
  2. The method according to claim 1, characterized in that
    the model structure includes: n layers, the parameters corresponding to the n layers, and the connection relationships between the n layers, and the training parameters include: the optimization algorithm, training parameters, and loss function, where n is an integer greater than or equal to 2.
  3. The method according to claim 1, characterized in that the method further includes:
    obtaining test data, inputting the test data into the adapted chip to perform an operation to obtain an operation result, and displaying the operation result.
  4. The method according to claim 1, characterized in that after the trained deep learning model is obtained, the method further includes:
    performing storage and structural optimization on the trained deep learning model to obtain an optimized deep learning model.
  5. A development platform, characterized in that the development platform includes:
    a data preparation unit, configured to graphically display the deep learning development platform, prompt the user to upload the data set to be trained, and prompt the user to annotate the data set; and to receive the data set input by the user, receive the user's annotations on the data set to obtain the annotated data set, and convert the annotated data set into data in a set format;
    an algorithm development unit, configured to determine the deep learning model structure and training parameters selected by the user, and to input the data in the set format into the deep learning model structure to train the deep learning model and obtain a trained deep learning model;
    a hardware optimization unit, configured to parse the trained deep learning model to obtain a data flow graph, determine the compilation parameter file of the chip according to the data flow graph, and perform hardware compilation according to the data flow graph and the compilation parameter file to obtain a configuration file optimized and accelerated for the target chip;
    an adaptation unit, configured to input the trained deep learning model into the chip, perform hardware configuration on the chip according to the chip's hardware configuration file to obtain the adapted chip, and start the runtime file to complete the solidification and deployment of the deep learning model on the chip.
  6. The development platform according to claim 5, characterized in that
    the model structure includes: n layers, the parameters corresponding to the n layers, and the connection relationships between the n layers, and the training parameters include: the optimization algorithm, training parameters, and loss function, where n is an integer greater than or equal to 2.
  7. The development platform according to claim 5, characterized in that
    the adaptation unit is further configured to obtain test data, input the test data into the adapted chip to perform an operation to obtain an operation result, and display the operation result.
  8. The development platform according to claim 5, characterized in that
    the algorithm development unit is further configured to perform storage and structural optimization on the trained deep learning model to obtain an optimized deep learning model.
  9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for electronic data exchange, wherein the computer program causes a computer to execute the method according to any one of claims 1 to 4.
  10. A computer program product, characterized in that the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute the method according to any one of claims 1 to 4.
PCT/CN2018/102633 2018-08-28 2018-08-28 Chip adaptation determination method and related products WO2020041960A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2018/102633 WO2020041960A1 (zh) 2018-08-28 2018-08-28 Chip adaptation determination method and related products
CN201880083334.6A CN111527501B (zh) 2018-08-28 2018-08-28 Chip adaptation determination method and related products

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/102633 WO2020041960A1 (zh) 2018-08-28 2018-08-28 Chip adaptation determination method and related products

Publications (1)

Publication Number Publication Date
WO2020041960A1 true WO2020041960A1 (zh) 2020-03-05

Family

ID=69642712

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/102633 WO2020041960A1 (zh) 2018-08-28 2018-08-28 Chip adaptation determination method and related products

Country Status (2)

Country Link
CN (1) CN111527501B (zh)
WO (1) WO2020041960A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111737134A (zh) * 2020-06-23 2020-10-02 湖南国科微电子股份有限公司 一种芯片测试方法、装置、电子设备以及存储介质
CN112764390A (zh) * 2021-02-01 2021-05-07 浙江一木智能科技有限公司 一种基于深度学习算法的零配件压配系统及方法
CN114282079A (zh) * 2021-11-25 2022-04-05 中国科学院深圳先进技术研究院 一种数据标注系统、方法、终端以及存储介质
CN116757260A (zh) * 2023-08-14 2023-09-15 北京向量栈科技有限公司 一种大型预训练模型的训练方法和系统
CN117370809A (zh) * 2023-11-02 2024-01-09 快朵儿(广州)云科技有限公司 一种基于深度学习的人工智能模型构建方法、系统及存储介质

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112130896B (zh) * 2020-08-17 2022-03-25 深圳云天励飞技术股份有限公司 神经网络模型迁移方法、装置、电子设备及存储介质
CN112947932B (zh) * 2021-02-24 2024-06-07 上海商汤智能科技有限公司 对编译过程中的向量化进行优化的方法、装置及电子设备
CN115081628B (zh) * 2022-08-15 2022-12-09 浙江大华技术股份有限公司 一种深度学习模型适配度的确定方法及装置
CN115437642B (zh) * 2022-11-07 2024-05-14 深圳鲲云信息科技有限公司 一种模型编译方法、装置、电子设备和存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105956660A (zh) * 2016-05-16 2016-09-21 浪潮集团有限公司 一种用于实时图像识别的神经元网络芯片实现方法
CN106650922A (zh) * 2016-09-29 2017-05-10 清华大学 硬件神经网络转换方法、计算装置、编译方法和神经网络软硬件协作系统
US20180075339A1 (en) * 2016-09-09 2018-03-15 SK Hynix Inc. Neural network hardware accelerator architectures and operating method thereof
CN108319456A (zh) * 2018-01-29 2018-07-24 徐磊 一种免编程深度学习应用的开发方法

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2952576C (en) * 2014-06-20 2022-07-26 Miovision Technologies Incorporated Machine learning platform for performing large scale data analytics
US11055063B2 (en) * 2016-05-02 2021-07-06 Marvell Asia Pte, Ltd. Systems and methods for deep learning processor
CN107067365A (zh) * 2017-04-25 2017-08-18 中国石油大学(华东) 基于深度学习的分布嵌入式实时视频流处理系统及方法
CN107480725A (zh) * 2017-08-23 2017-12-15 京东方科技集团股份有限公司 基于深度学习的图像识别方法、装置和计算机设备

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105956660A (zh) * 2016-05-16 2016-09-21 浪潮集团有限公司 一种用于实时图像识别的神经元网络芯片实现方法
US20180075339A1 (en) * 2016-09-09 2018-03-15 SK Hynix Inc. Neural network hardware accelerator architectures and operating method thereof
CN106650922A (zh) * 2016-09-29 2017-05-10 清华大学 硬件神经网络转换方法、计算装置、编译方法和神经网络软硬件协作系统
CN108319456A (zh) * 2018-01-29 2018-07-24 徐磊 一种免编程深度学习应用的开发方法

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111737134A (zh) * 2020-06-23 2020-10-02 湖南国科微电子股份有限公司 一种芯片测试方法、装置、电子设备以及存储介质
CN111737134B (zh) * 2020-06-23 2023-09-26 湖南国科微电子股份有限公司 一种芯片测试方法、装置、电子设备以及存储介质
CN112764390A (zh) * 2021-02-01 2021-05-07 浙江一木智能科技有限公司 一种基于深度学习算法的零配件压配系统及方法
CN114282079A (zh) * 2021-11-25 2022-04-05 中国科学院深圳先进技术研究院 一种数据标注系统、方法、终端以及存储介质
CN116757260A (zh) * 2023-08-14 2023-09-15 北京向量栈科技有限公司 一种大型预训练模型的训练方法和系统
CN116757260B (zh) * 2023-08-14 2024-03-19 北京向量栈科技有限公司 一种大型预训练模型的训练方法和系统
CN117370809A (zh) * 2023-11-02 2024-01-09 快朵儿(广州)云科技有限公司 一种基于深度学习的人工智能模型构建方法、系统及存储介质

Also Published As

Publication number Publication date
CN111527501A (zh) 2020-08-11
CN111527501B (zh) 2023-08-01

Similar Documents

Publication Publication Date Title
WO2020041960A1 (zh) 芯片适配确定方法及相关产品
US11176448B2 (en) Enhancing processing performance of a DNN module by bandwidth control of fabric interface
CN105931638B (zh) 面向智能机器人的对话系统数据处理方法及装置
EP3731161A1 (en) Model application method and system, and model management method and server
Xu et al. The case for FPGA-based edge computing
US10671147B2 (en) Dynamic power management for artificial intelligence hardware accelerators
Jiang et al. Accelerating mobile applications at the network edge with software-programmable FPGAs
WO2017177661A1 (zh) 基于卷积神经网络的视频检索方法及系统
US20150178292A1 (en) Methods and systems for data serialization and deserialization
Ogden et al. {MODI}: Mobile deep inference made efficient by edge computing
JP2022553252A (ja) 画像処理方法、画像処理装置、サーバ、及びコンピュータプログラム
US20180293987A1 (en) Speech recognition method, device and system based on artificial intelligence
Zhang et al. A locally distributed mobile computing framework for DNN based android applications
Kaushi et al. A computation offloading framework to optimize energy utilisation in mobile cloud computing environment
US20200286012A1 (en) Model application method, management method, system and server
Mirkovic et al. {DEW}: Distributed Experiment Workflows
He et al. AI Chinese sign language recognition interactive system based on audio-visual integration
Gong et al. Wwof: an energy efficient offloading framework for mobile webpage
Zhou et al. Design and implementation of Python teaching platform based on container and jupyter
KR20110083243A (ko) 태스크 이동 시스템 및 그 방법
Jiang et al. To offload selective search: improving performance of fast R-CNN based on a mobile cloud offloading framework
WO2020156212A1 (zh) 一种数据处理的方法、装置及电子设备
CN116700703B (zh) 一种业务处理方法、装置、设备及存储介质
Arya et al. Energy-Efficient Cloud Computing for Smart Phones
Yang et al. Research and practice of swoole asynchronous multithreading design method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18931527

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 21/04/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18931527

Country of ref document: EP

Kind code of ref document: A1