WO2021031137A1 - Artificial intelligence application development system, computer device and storage medium - Google Patents

Artificial intelligence application development system, computer device and storage medium

Info

Publication number
WO2021031137A1
WO2021031137A1 (PCT/CN2019/101684)
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
subsystem
model
artificial intelligence
module
Prior art date
Application number
PCT/CN2019/101684
Other languages
English (en)
French (fr)
Inventor
朱焱
汤鉴
姜浩
蔡权雄
牛昕宇
Original Assignee
深圳鲲云信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳鲲云信息科技有限公司
Priority to CN201980066985.9A (CN113168552A)
Priority to PCT/CN2019/101684 (WO2021031137A1)
Publication of WO2021031137A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to an artificial intelligence application development system, a computer device, and a storage medium.
  • The purpose of the embodiments of the present application is to propose an artificial intelligence application development system, a computer device, and a storage medium, so as to lower the threshold of artificial intelligence application development and improve development efficiency.
  • To that end, an embodiment of the present application provides an artificial intelligence application development system that adopts the following technical solution:
  • the artificial intelligence application development system includes:
  • a neural network generation subsystem, used to construct, train, and validate neural network models;
  • a neural network hardware execution subsystem, used to receive data input to the neural network model and output a result after computation by the neural network model;
  • a deployment subsystem, used to compile the neural network model generated by the neural network generation subsystem and deploy it to the neural network hardware execution subsystem. (A purely illustrative sketch of how the three subsystems cooperate follows this list.)
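The patent specifies roles rather than an API. To make the division of labor concrete, here is a minimal, purely illustrative Python skeleton of how the three subsystems could hand off to one another; every class, method, and value below is invented for illustration and is not taken from the patent:

```python
# Hypothetical skeleton of the three subsystems' hand-off. All names and
# bodies are invented; the patent defines responsibilities, not an API.
class NeuralNetworkGenerationSubsystem:
    def build_train_validate(self, annotated_data):
        # Construct a model, train it on annotated data, then validate it.
        return {"graph": "layers...", "params": "weights..."}

class DeploymentSubsystem:
    def compile(self, model):
        # Parse the model into a structure file and a data file.
        return {"structure_file": "model.json", "data_file": "model.params"}

    def deploy(self, compiled_model, target):
        # Allocate hardware resources on the target and load the model.
        target.loaded_model = compiled_model

class NeuralNetworkHardwareExecutionSubsystem:
    loaded_model = None

    def run(self, input_data):
        # Feed input through the deployed model and return the result.
        return f"prediction for {input_data!r}"

generation = NeuralNetworkGenerationSubsystem()       # subsystem 101
hardware = NeuralNetworkHardwareExecutionSubsystem()  # subsystem 102
deployment = DeploymentSubsystem()                    # subsystem 103

model = generation.build_train_validate(annotated_data="labeled images")
deployment.deploy(deployment.compile(model), target=hardware)
print(hardware.run("camera_frame.jpg"))
```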
  • Further, the neural network generation subsystem is also used to provide training data for the neural network model and to annotate the training data.
  • Further, the neural network hardware execution subsystem is implemented based on an FPGA.
  • Further, the deployment subsystem includes:
  • a compilation module, used to parse the neural network model and generate a structure file and a data file of the model;
  • a running module, used to allocate hardware computing resources according to the structure file and data file of the model;
  • a driver module, configured to call the corresponding hardware computing resources according to the allocation result of the running module, the hardware computing resources including the FPGA-based neural network hardware execution subsystem.
  • Further, the allocation of hardware computing resources by the running module according to the structure file and data file of the model includes: obtaining the information of each computing node from the structure file and data file of the model, and allocating hardware computing resources to each computing node based on that information.
  • Further, the FPGA-based neural network hardware execution subsystem includes an FPGA core module and an expansion module.
  • Further, the FPGA core module includes a core chip, a memory chip, a SAMTEC interface, and a JTAG interface.
  • Further, the expansion module includes a network interface, a UART port, a GPIO port, and a SAMTEC interface; the FPGA core module and the expansion module are connected and communicate through the SAMTEC interfaces.
  • The embodiments of the present application also provide a computer device that adopts the following technical solution:
  • the computer device includes a memory and a processor, with a computer program stored in the memory;
  • when the processor executes the computer program, the functions of the artificial intelligence application development system described in any one of the embodiments of the present application are realized.
  • The embodiments of the present application also provide a computer-readable storage medium that adopts the following technical solution:
  • a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the functions of the artificial intelligence application development system described in any one of the embodiments of the present application are realized.
  • Compared with the prior art, the embodiments of the present application mainly have the following beneficial effects: an artificial intelligence application development system includes a neural network generation subsystem for constructing, training, and validating a neural network model; a neural network hardware execution subsystem for receiving data input to the neural network model and outputting a result after computation by the neural network model; and a deployment subsystem for compiling the neural network model generated by the neural network generation subsystem and deploying it to the neural network hardware execution subsystem. A neural network model is constructed and trained through the visual neural network generation subsystem, and the trained model is automatically deployed by the deployment subsystem to the neural network hardware execution subsystem for execution, which can lower the threshold of artificial intelligence application development and improve development efficiency.
  • Fig. 1 is a schematic structural diagram of an embodiment of an artificial intelligence application development system 100 according to the present application;
  • Fig. 2 is a schematic structural diagram of an embodiment of the deployment subsystem 103 of the artificial intelligence application development system according to the present application;
  • Fig. 3 is a schematic structural diagram of an embodiment of the neural network hardware execution subsystem 102 of the artificial intelligence application development system according to the present application;
  • Fig. 4 is a schematic structural diagram of an embodiment of a computer device according to the present application.
  • Fig. 1 shows a schematic structural diagram of an embodiment of an artificial intelligence application development system according to the present application.
  • The artificial intelligence application development system 100 includes:
  • the neural network generation subsystem 101, used to construct, train, and validate a neural network model.
  • There are two methods for constructing a neural network model: a deep learning neural network algorithm generated automatically from annotated data, and a neural network algorithm model customized by the user according to requirements. Neural network model training iteratively trains the constructed neural network algorithm model with the annotated data so that the loss of the model converges to a minimum. Neural network model validation refers to verifying the effect of the trained neural network model with validation data: the user can upload image data, speech data, and the like as model input, and the output after model detection and recognition verifies the effect of the model as well as the recognition accuracy and speed.
  • In this embodiment, the neural network generation subsystem 101 can provide a visual interface through WEB (web page) technology to help developers quickly develop a neural network model and train and validate it; that is, developers access the interfaces provided by the neural network generation subsystem 101 through a web page to obtain corresponding services, such as the construction of neural network models.
  • Providing visual construction, training, and validation of neural network models through web pages can improve developers' development efficiency.
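The patent does not disclose the model-building interface behind these web pages. Purely as an illustration of the construct/train/validate cycle such a subsystem could automate, here is a hedged sketch using PyTorch; the framework choice, network shape, and hyperparameters are all assumptions, not the patent's implementation:

```python
# Hypothetical sketch: the construct/train/validate cycle the web-based
# generation subsystem automates. PyTorch is an illustrative choice only.
import torch
import torch.nn as nn

def build_model(num_classes: int = 10) -> nn.Module:
    # Construction: layers are the computing units (conv, pooling, ReLU, FC).
    return nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(2),          # 3x32x32 input -> 16x16x16 feature map
        nn.Flatten(),
        nn.Linear(16 * 16 * 16, num_classes),
    )

def train(model: nn.Module, loader, epochs: int = 5) -> None:
    # Training: iterate over annotated data until the loss converges.
    optim = torch.optim.Adam(model.parameters())
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:
            optim.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optim.step()

def validate(model: nn.Module, loader) -> float:
    # Validation: measure recognition accuracy on held-out data.
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            pred = model(images).argmax(dim=1)
            correct += (pred == labels).sum().item()
            total += labels.numel()
    return correct / total
```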
  • The neural network hardware execution subsystem 102 is configured to receive data input to the neural network model and to output a result after computation by the neural network model.
  • The neural network hardware execution subsystem 102 may be a general-purpose processor (such as a CPU) that stores and can execute the above neural network model, or a dedicated processor (such as an FPGA) in which the above neural network model is solidified.
  • The neural network hardware execution subsystem 102 can also provide a network interface or other interfaces to receive and store externally input data, which is then fed into the neural network model for computation (feature extraction, classification or clustering, regression or prediction, etc.) to obtain a prediction or recognition result.
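As a hedged illustration of this data path (network interface in, model computation, result out), the following sketch implements a toy server; the 4-byte length-prefix protocol, port, and function names are invented, not taken from the patent:

```python
# Hypothetical sketch of the execution subsystem's data path: accept input
# over a network interface, run the model, return the prediction. The wire
# protocol (4-byte length prefix + raw payload) is invented for illustration.
import socket
import struct

def serve(model_fn, host: str = "0.0.0.0", port: int = 9000) -> None:
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, port))
    srv.listen(1)
    while True:
        conn, _ = srv.accept()
        with conn:
            # Simplified: assumes the 4 length bytes arrive together.
            size = struct.unpack("!I", conn.recv(4))[0]
            payload = b""
            while len(payload) < size:
                payload += conn.recv(size - len(payload))
            result = model_fn(payload)  # computation by the deployed model
            conn.sendall(struct.pack("!I", len(result)) + result)
```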
  • The deployment subsystem 103 is configured to compile the neural network model generated by the neural network generation subsystem 101 and deploy it to the neural network hardware execution subsystem 102.
  • The neural network model includes the neural network graph (the neural network structure) and the parameters corresponding to that structure.
  • The structure of the neural network takes the layer as the computing unit, including but not limited to convolutional layers, pooling layers, ReLU (activation function), fully connected layers, and so on.
  • Each layer in the neural network structure also has a large number of parameters, including but not limited to weights and biases.
  • The above neural network model is compiled by a compiler (such as TVM) into a model file (including the structure file and data file of the model), and the hardware resources required by the model, for example computing units, cache units, and pipeline units capable of timing optimization, are automatically allocated according to the model file; that is, these hardware resources are called from the aforementioned neural network hardware execution subsystem 102 and then executed.
  • In the embodiment of the present invention, an artificial intelligence application development system is provided, including a neural network generation subsystem for constructing, training, and validating a neural network model; a neural network hardware execution subsystem for receiving data input to the neural network model and outputting a result after computation by the neural network model; and a deployment subsystem for compiling the neural network model generated by the neural network generation subsystem and deploying it to the neural network hardware execution subsystem. A neural network model is constructed and trained through the visual neural network generation subsystem, and the trained model is automatically deployed by the deployment subsystem to the neural network hardware execution subsystem for execution, which can lower the threshold of artificial intelligence application development and improve development efficiency.
  • Further, the neural network generation subsystem 101 is also used to provide training data for the neural network model and to annotate the training data.
  • In this embodiment, the neural network generation subsystem 101 can also provide developers with functional modules such as creating a new database, uploading data, and data annotation, preparing data for subsequent neural network model training; well-annotated data allows the model to be trained more quickly.
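The patent does not define a storage format for annotated data. As an invented example only, an annotation module of this kind might emit a manifest like the following, pairing uploaded files with their labels; the schema and field names are assumptions:

```python
# Hypothetical annotation manifest the data-annotation module might produce;
# the schema is invented for illustration.
import json

manifest = {
    "database": "defect_detection_v1",
    "samples": [
        {"file": "img_0001.jpg", "label": "scratch", "bbox": [34, 50, 120, 96]},
        {"file": "img_0002.jpg", "label": "ok"},
    ],
}

with open("annotations.json", "w") as f:
    json.dump(manifest, f, indent=2)
```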
  • Further, the neural network hardware execution subsystem 102 is implemented based on an FPGA.
  • Unlike the fixed hardware structures of GPUs and ASICs, an FPGA is programmable: developers can connect the logic blocks inside the FPGA through programming according to their own needs and realize the corresponding functions freely and flexibly.
  • GPU-accelerated design makes the algorithm model adapt to the hardware structure,
  • whereas FPGA-accelerated design makes the hardware structure adapt to the algorithm model; that is, the corresponding hardware structure is designed (or called) according to the algorithm model, and this acceleration approach can accelerate deep learning neural network algorithm models more quickly.
  • Moreover, compared with GPUs, FPGAs have a better energy efficiency ratio.
  • Although an ASIC is superior to an FPGA in performance and power consumption, it requires extensive verification and physical design during design and manufacturing, which leads to a long development cycle.
  • An ASIC is also dedicated hardware designed for a certain class of applications, and its hardware structure cannot be changed after fabrication.
  • Deep learning neural network algorithms are currently in a stage of rapid development; for application scenarios that are widely used but whose algorithms are not yet mature, it is very difficult to design a high-performance general-purpose ASIC that suits all scenarios.
  • FPGAs are therefore better suited to accelerating deep learning neural network algorithm models at their current stage of rapid development, and the neural network hardware execution subsystem 102 in this embodiment uses an FPGA to accelerate the execution of deep learning neural networks.
  • Fig. 2 shows a schematic structural diagram of an embodiment of the deployment subsystem 103 of the artificial intelligence application development system according to the present application.
  • The deployment subsystem 103 includes:
  • the compilation module 1031, used to parse the neural network model and generate the structure file and data file of the model;
  • the running module 1032, used to allocate hardware computing resources according to the structure file and data file of the model;
  • the driver module 1033, configured to call the corresponding hardware computing resources according to the allocation result of the running module, the hardware computing resources including the FPGA-based neural network hardware execution subsystem.
  • In this embodiment, according to the structure of the neural network model generated by the neural network generation subsystem 101, the compilation module 1031 can call a neural network compiler (such as TVM) to parse the model, extract its network structure and weight data, and save them to files, obtaining the structure file and data file of the model; the file format can be json, xml, or the like.
  • The running module 1032 can automatically allocate hardware computing resources, including computing units, cache units, and pipeline units capable of timing optimization, according to the structure file and data file of the neural network model; the driver module 1033 then calls the corresponding hardware computing resources provided by the FPGA-based neural network hardware execution subsystem 102 to perform the computation and output the results.
  • The result output by the neural network is a feature value, which can be understood as an abstract representation of the input picture or data; some computation then converts this abstract representation, i.e., the feature value, into meaningful output, such as the picture category and corresponding probability in a classification problem, or the target categories, probabilities, and coordinates contained in the picture in a detection problem.
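TVM is named only as an example compiler, and the patent fixes no API. As a hedged sketch under that assumption, the following shows how a model exported to ONNX could be parsed with TVM's Relay front end into a json structure file and a binary data file; the calls follow TVM's circa-0.6 API (which differs in newer versions), and the file names and input shape are invented:

```python
# Illustrative sketch only: parsing a model with TVM Relay and saving a
# structure file (json graph) and a data file (binary params), roughly as
# the compilation module 1031 is described. API details are assumptions
# tied to TVM ~0.6 and vary across versions.
import onnx
import tvm
from tvm import relay

onnx_model = onnx.load("model.onnx")        # model exported by subsystem 101
shape_dict = {"input": (1, 3, 224, 224)}    # assumed input tensor shape
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

with relay.build_config(opt_level=3):
    graph_json, lib, params = relay.build(mod, target="llvm", params=params)

with open("model_structure.json", "w") as f:   # structure file
    f.write(graph_json)
with open("model_data.params", "wb") as f:     # data file (weights, biases)
    f.write(relay.save_param_dict(params))
```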
  • Further, the allocation of hardware computing resources by the running module according to the structure file and data file of the model includes: obtaining the information of each computing node from the structure file and data file, and allocating hardware computing resources to each computing node based on that information.
  • The structure of the neural network model takes the layer as the computing unit, including but not limited to input layers, convolutional layers, pooling layers, ReLU (activation function), fully connected layers, and so on.
  • Different neural networks combine layers of different types and numbers to form neural network structures with different functions; besides receiving the data stream output by the previous layer, each layer in the neural network structure has a large number of parameters, including but not limited to weights and biases.
  • The network structure and parameter data of the model can be stored in files and read out as node information when each node of each layer is computed. According to this node information, the hardware resources required by the corresponding node can be dynamically allocated: for example, according to the computing function and data type of the node, the corresponding computing unit and storage unit are allocated for the computation, and the result is stored in a register cache unit so the next layer can read it quickly, saving data copy time and accelerating neural network computation; timing optimization of the computation can also be performed through pipeline units, improving the efficiency of neural network computation.
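The patent leaves the allocation policy abstract. Purely as an illustration of reading node information from the structure file and binding each node to a compute unit and cache unit, here is a sketch in which the resource names, pool size, and op-to-unit mapping are all invented:

```python
# Hypothetical sketch of a running module: read node info from the structure
# file and allocate hardware resources per node. Resource names are invented.
import json
from dataclasses import dataclass

@dataclass
class Allocation:
    node: str
    compute_unit: str   # e.g. a convolution engine or a vector ALU
    cache_unit: str     # cache bank holding the node's output

# Assumed mapping from op type to the kind of compute unit it needs.
OP_TO_UNIT = {
    "conv2d": "conv_engine",
    "pool": "pool_engine",
    "relu": "vector_alu",
    "dense": "mac_array",
}

def allocate(structure_file: str) -> list[Allocation]:
    with open(structure_file) as f:
        graph = json.load(f)
    plan = []
    for i, node in enumerate(graph["nodes"]):
        unit = OP_TO_UNIT.get(node["op"], "vector_alu")
        # The output is parked in a cache bank so the next layer reads it
        # directly, saving copy time (as the patent describes).
        plan.append(Allocation(node["name"], unit, f"cache_bank_{i % 4}"))
    return plan
```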
  • Fig. 3 shows a schematic structural diagram of an embodiment of the neural network hardware execution subsystem 102 of the artificial intelligence application development system according to the present application.
  • The FPGA-based neural network hardware execution subsystem 102 includes an FPGA core module 1021 and an expansion module 1022.
  • The FPGA core module 1021 includes a core chip 10211, a memory chip 10212, a SAMTEC interface 10214, and a 6-pin JTAG interface 10213;
  • the expansion module 1022 includes a network interface 10222, a 3-pin UART port 10223, a 40-pin GPIO port 10224, and a SAMTEC interface 10221.
  • The FPGA core module 1021 and the expansion module 1022 are connected and communicate through the SAMTEC interface 10214 of the core module 1021 and the SAMTEC interface 10221 of the expansion module 1022.
  • The above core chip is used to provide computing resources and realize the computation of the neural network;
  • an Intel Arria 10 SoC FPGA can be used as the core chip.
  • The memory chip is used to store parameter data such as the weights of the neural network, as well as intermediate computation data.
  • The JTAG interface can be used for data transmission between the core module 1021 and other devices, for example to download the initial program of the FPGA.
  • The network interface of the expansion module 1022 is used for communication with a host computer, program download, data transmission, and the like; for example, it can be used to obtain the data input to the above neural network model over the network.
  • The network interface can be an RJ45 Ethernet interface (a USB-C or USB port can replace RJ45 to extend the universality of the interface).
  • The UART port is used for debugging the expansion module 1022 and printing related debugging information; the GPIO port can provide additional I/O interfaces for remote serial communication or control, for example controlling a camera or a microphone through the GPIO port. The core module 1021 and the expansion module 1022 are connected and communicate through the SAMTEC interfaces, so that the core module 1021 can call the resources of the expansion module 1022 to realize corresponding functions.
  • The implementation of all or part of the subsystems in the above embodiments can be accomplished by instructing the relevant hardware through a computer program.
  • The computer program can be stored in a computer-readable storage medium.
  • The aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM), etc.
  • Fig. 4 is a block diagram of the basic structure of the computer device in this embodiment.
  • The computer device 2 includes a memory 21, a processor 22, and a network interface 23 that communicate with each other through a system bus. It should be pointed out that the figure only shows the computer device 2 with components 21-23, but it should be understood that not all of the illustrated components need be implemented; more or fewer components may be implemented instead. Those skilled in the art will understand that the computer device here is a device capable of automatically performing numerical computation and/or information processing according to preset or stored instructions, and that its hardware includes but is not limited to microprocessors, application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), digital signal processors (DSP), embedded devices, etc.
  • The computer device may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server.
  • The computer device can interact with the user through a keyboard, a mouse, a remote control, a touch panel, or a voice control device.
  • The memory 21 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical discs, etc.
  • The memory 21 may be an internal storage unit of the computer device 2, such as a hard disk or memory of the computer device 2.
  • The memory 21 may also be an external storage device of the computer device 2, for example a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the computer device 2.
  • The memory 21 may also include both the internal storage unit of the computer device 2 and its external storage device.
  • The memory 21 is generally used to store the operating system and various application software installed on the computer device 2, such as the program code of the artificial intelligence application development system.
  • The memory 21 can also be used to temporarily store various types of data that have been output or are to be output.
  • The processor 22 may in some embodiments be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or another data processing chip.
  • The processor 22 is generally used to control the overall operation of the computer device 2.
  • The processor 22 is used to run the program code stored in the memory 21 or to process data, for example to run the program code of the artificial intelligence application development system.
  • The network interface 23 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the computer device 2 and other electronic devices.
  • The present application also provides another embodiment, namely a computer-readable storage medium storing a program of the artificial intelligence application development system, the program being executable by at least one processor so that the at least one processor executes the steps of the program of the artificial intelligence application development system described above and realizes the corresponding functions.
  • The methods of the above embodiments can be implemented by means of software plus the necessary general-purpose hardware platform; they can of course also be implemented by hardware, but in many cases the former is the better implementation.
  • The technical solution of this application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and including several instructions that enable a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the methods described in the embodiments of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Stored Programmes (AREA)

Abstract

An artificial intelligence application development system (100), a computer device, and a storage medium, belonging to the field of artificial intelligence. The system (100) comprises: a neural network generation subsystem (101) for constructing, training, and validating a neural network model; a neural network hardware execution subsystem (102) for receiving data input to the neural network model and outputting a result after computation by the neural network model; and a deployment subsystem (103) for compiling the neural network model generated by the neural network generation subsystem (101) and deploying it to the neural network hardware execution subsystem (102). A neural network model is constructed and trained through the visual neural network generation subsystem (101), and the trained model is automatically deployed by the deployment subsystem (103) to the neural network hardware execution subsystem (102) for execution, which can lower the threshold of artificial intelligence application development and improve development efficiency.

Description

Artificial intelligence application development system, computer device and storage medium

Technical Field
This application relates to the field of artificial intelligence technology, and in particular to an artificial intelligence application development system, a computer device, and a storage medium.
Background
At present, with the advent of the big data era, data is growing explosively. Faced with massive data, artificial intelligence deep learning (neural network) technology, which improves the completeness of extracted features, is increasingly preferred over the earlier manual feature extraction, since it effectively avoids the complexity and inefficiency of extracting features by hand. As deep learning plays an increasingly important role in fields such as image recognition, speech recognition, and intelligent management, the application scenarios of many fields impose ever stricter requirements on data annotation, algorithm model construction, model training, algorithm deployment, and the performance and power consumption of hardware devices. This demands advanced development skills from application developers and deters many of them; for newcomers to the field in particular, the cost is high while the development efficiency is low.
Summary
The purpose of the embodiments of the present application is to propose an artificial intelligence application development system, a computer device, and a storage medium, so as to lower the threshold of artificial intelligence application development and improve development efficiency.
To solve the above technical problem, an embodiment of the present application provides an artificial intelligence application development system that adopts the following technical solution:
The artificial intelligence application development system includes:
a neural network generation subsystem, used to construct, train, and validate a neural network model;
a neural network hardware execution subsystem, used to receive data input to the neural network model and output a result after computation by the neural network model;
a deployment subsystem, used to compile the neural network model generated by the neural network generation subsystem and deploy it to the neural network hardware execution subsystem.
Further, the neural network generation subsystem is also used to provide training data for the neural network model and to annotate the training data.
Further, the neural network hardware execution subsystem is implemented based on an FPGA.
Further, the deployment subsystem includes:
a compilation module, used to parse the neural network model and generate a structure file and a data file of the model;
a running module, used to allocate hardware computing resources according to the structure file and data file of the model;
a driver module, used to call the corresponding hardware computing resources according to the allocation result of the running module, the hardware computing resources including the FPGA-based neural network hardware execution subsystem.
Further, the allocation of hardware computing resources by the running module according to the structure file and data file of the model includes:
obtaining the information of each computing node according to the structure file and data file of the model;
allocating hardware computing resources to each computing node based on the information of that node.
Further, the FPGA-based neural network hardware execution subsystem includes an FPGA core module and an expansion module.
Further, the FPGA core module includes a core chip, a memory chip, a SAMTEC interface, and a JTAG interface.
Further, the expansion module includes a network interface, a UART port, a GPIO port, and a SAMTEC interface, and the FPGA core module and the expansion module are connected and communicate through the SAMTEC interface.
To solve the above technical problem, an embodiment of the present application further provides a computer device that adopts the following technical solution:
The computer device includes a memory and a processor, a computer program is stored in the memory, and when the processor executes the computer program, the functions of the artificial intelligence application development system described in any one of the embodiments of the present application are realized.
To solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium that adopts the following technical solution:
A computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the functions of the artificial intelligence application development system described in any one of the embodiments of the present application are realized.
Compared with the prior art, the embodiments of the present application mainly have the following beneficial effects: an artificial intelligence application development system is provided, including a neural network generation subsystem for constructing, training, and validating a neural network model; a neural network hardware execution subsystem for receiving data input to the neural network model and outputting a result after computation by the neural network model; and a deployment subsystem for compiling the neural network model generated by the neural network generation subsystem and deploying it to the neural network hardware execution subsystem. A neural network model is constructed and trained through the visual neural network generation subsystem, and the trained model is automatically deployed by the deployment subsystem to the neural network hardware execution subsystem for execution, which can lower the threshold of artificial intelligence application development and improve development efficiency.
Brief Description of the Drawings
To explain the solutions in the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below illustrate only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of an embodiment of an artificial intelligence application development system 100 according to the present application;
Fig. 2 is a schematic structural diagram of an embodiment of the deployment subsystem 103 of the artificial intelligence application development system according to the present application;
Fig. 3 is a schematic structural diagram of an embodiment of the neural network hardware execution subsystem 102 of the artificial intelligence application development system according to the present application;
Fig. 4 is a schematic structural diagram of an embodiment of a computer device according to the present application.
Detailed Description
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the present application. The terms used in the specification are only for describing specific embodiments and are not intended to limit the application. The terms "including" and "having" and any variations thereof in the specification, claims, and description of the drawings are intended to cover a non-exclusive inclusion. The terms "first", "second", and the like in the specification, claims, or drawings are used to distinguish different objects rather than to describe a particular order.
Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
To enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings.
As shown in Fig. 1, Fig. 1 is a schematic structural diagram of an embodiment of an artificial intelligence application development system according to the present application. The artificial intelligence application development system 100 includes:
the neural network generation subsystem 101, used to construct, train, and validate a neural network model.
There are two methods for constructing a neural network model: a deep learning neural network algorithm generated automatically on the basis of annotated data, and a neural network algorithm model customized by the user according to requirements. Neural network model training iteratively trains the constructed neural network algorithm model with annotated data so that the loss value of the model converges to a minimum. Neural network model validation refers to verifying the effect of the trained neural network model with validation data: the user can upload image data, speech data, and the like as model input, and the output after model detection and recognition verifies the effect of the model as well as the recognition accuracy and speed. In this embodiment, the neural network generation subsystem 101 can provide a visual interface through WEB (web page) technology to help developers quickly develop a neural network model and train and validate it; that is, developers access the interfaces provided by the neural network generation subsystem 101 through a web page to obtain corresponding services, such as the construction of neural network models. Providing visual construction, training, and validation of neural network models through web pages can improve developers' development efficiency.
the neural network hardware execution subsystem 102, used to receive data input to the neural network model and output a result after computation by the neural network model.
The neural network hardware execution subsystem 102 may be a general-purpose processor (such as a CPU) that stores and can execute the above neural network model, or a dedicated processor (such as an FPGA) in which the above neural network model is solidified. In addition to providing hardware computing resources, the neural network hardware execution subsystem 102 can also provide a network interface or other interfaces to receive and store externally input data, which is then fed into the neural network model for computation (feature extraction, classification or clustering, regression or prediction, etc.) to obtain a prediction or recognition result.
the deployment subsystem 103, used to compile the neural network model generated by the neural network generation subsystem 101 and deploy it to the neural network hardware execution subsystem 102.
A neural network model comprises a neural network graph (the neural network structure) and the parameters corresponding to that structure. The neural network structure takes the layer as the computing unit, including but not limited to convolutional layers, pooling layers, ReLU (activation function), fully connected layers, and so on. Besides receiving the data stream output by the previous layer, each layer in the neural network structure has a large number of parameters, including but not limited to weights and biases. In this embodiment, the above neural network model is compiled by a compiler (such as TVM) into a model file (including the structure file and data file of the model), and the hardware resources required by the corresponding model (for example, computing units, cache units, and pipeline units capable of timing optimization) are automatically allocated according to the model file; that is, these hardware resources are called from the neural network hardware execution subsystem 102 and then executed.
In the embodiment of the present invention, an artificial intelligence application development system is provided, including a neural network generation subsystem for constructing, training, and validating a neural network model; a neural network hardware execution subsystem for receiving data input to the neural network model and outputting a result after computation by the neural network model; and a deployment subsystem for compiling the neural network model generated by the neural network generation subsystem and deploying it to the neural network hardware execution subsystem. A neural network model is constructed and trained through the visual neural network generation subsystem, and the trained model is automatically deployed by the deployment subsystem to the neural network hardware execution subsystem for execution, which can lower the threshold of artificial intelligence application development and improve development efficiency.
Further, the neural network generation subsystem 101 is also used to provide training data for the neural network model and to annotate the training data.
In this embodiment, the neural network generation subsystem 101 can also provide developers with functional modules such as creating a new database, uploading data, and data annotation, preparing data for subsequent neural network model training; well-annotated data allows the model to be trained more quickly.
Further, the neural network hardware execution subsystem 102 is implemented based on an FPGA.
Unlike the fixed hardware structures of GPUs and ASICs, an FPGA is programmable: developers can connect the logic blocks inside the FPGA through programming according to their own needs and realize the corresponding functions freely and flexibly. In addition, GPU-accelerated design makes the algorithm model adapt to the hardware structure, whereas FPGA-accelerated design makes the hardware structure adapt to the algorithm model; that is, the corresponding hardware structure is designed (or called) according to the algorithm model, and this acceleration approach can accelerate deep learning neural network algorithm models more quickly. Moreover, compared with GPUs, FPGAs have a better energy efficiency ratio. Although ASICs are superior to FPGAs in performance and power consumption, they require extensive verification and physical design during design and manufacturing, leading to long development cycles; an ASIC is also dedicated hardware designed for a certain class of applications whose hardware structure cannot be changed after fabrication. Deep learning neural network algorithms, however, are currently in a stage of rapid development, and for application scenarios that are widely used but whose algorithms are not yet mature, it is very difficult to design a high-performance general-purpose ASIC that suits all scenarios. FPGAs are better suited to accelerating the deep learning neural network algorithm models that are currently developing rapidly. Therefore, the neural network hardware execution subsystem 102 in this embodiment uses an FPGA, which can accelerate the execution of deep learning neural networks.
Further, as shown in Fig. 2, Fig. 2 is a schematic structural diagram of an embodiment of the deployment subsystem 103 of the artificial intelligence application development system according to the present application. The deployment subsystem 103 includes:
the compilation module 1031, used to parse the neural network model and generate the structure file and data file of the model;
the running module 1032, used to allocate hardware computing resources according to the structure file and data file of the model;
the driver module 1033, used to call the corresponding hardware computing resources according to the allocation result of the running module, the hardware computing resources including the FPGA-based neural network hardware execution subsystem.
In this embodiment, according to the structure of the neural network model generated by the neural network generation subsystem 101, the compilation module 1031 can call a neural network compiler (such as TVM) to parse the neural network model, extract the network structure and weight data of the model, and save them to files, obtaining the structure file and data file of the model; the file format can be json, xml, or the like. The running module 1032 can automatically allocate hardware computing resources, including computing units, cache units, and pipeline units capable of timing optimization, according to the structure file and data file of the neural network model. The driver module 1033 then calls the corresponding hardware computing resources provided by the FPGA-based neural network hardware execution subsystem 102 to perform the computation and output the results. The result output by the neural network is a feature value, which can be understood as an abstract representation of the input picture or data; some computation then converts this abstract representation, i.e., the feature value, into meaningful output, such as the picture category and corresponding probability in a classification problem, or the target categories, probabilities, and coordinates contained in the picture in a detection problem. Through the three modules of the deployment subsystem 103, automatic compilation of the neural network model, flexible scheduling of hardware computing resources, and performance optimization can be achieved.
Further, the allocation of hardware computing resources by the running module according to the structure file and data file of the model includes:
obtaining the information of each computing node according to the structure file and data file of the model;
allocating hardware computing resources to each computing node based on the information of that node.
In this embodiment, the structure of the neural network model takes the layer as the computing unit, including but not limited to input layers, convolutional layers, pooling layers, ReLU (activation function), fully connected layers, and so on; different neural networks combine layers of different types and numbers to form neural network structures with different functions. Besides receiving the data stream output by the previous layer, each layer in the neural network structure has a large number of parameters, including but not limited to weights and biases. The network structure and parameter data of the model can be stored in files and read out as node information when each node of each layer is computed; according to this node information, the hardware resources required by the corresponding node can be dynamically allocated. For example, according to the computing function and data type of the node, the corresponding computing unit and storage unit are allocated for the computation, and the result is stored in a register cache unit so that the next layer can read it quickly, saving data copy time and accelerating neural network computation; timing optimization of the neural network computation can also be performed through pipeline units, thereby improving the efficiency of neural network computation.
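Purely as an illustration of the pipelined timing optimization described above, the following software analogy overlaps two "stages" so that stage 2 consumes the cached output of stage 1 while stage 1 processes the next sample; the patent's pipeline units do this in hardware, and this Python rendering is only an assumption:

```python
# Software analogy of hardware pipelining: while one "unit" computes stage 1
# for sample i, another computes stage 2 for sample i-1, reading the earlier
# result straight from a cache slot instead of re-copying it.
from collections import deque

def pipeline(samples, stages):
    cache = deque()                      # stands in for register cache units
    results = []
    for sample in samples + [None]:      # one extra tick to drain the pipe
        if cache:
            partial = cache.popleft()
            results.append(stages[1](partial))   # stage 2 on older sample
        if sample is not None:
            cache.append(stages[0](sample))      # stage 1 on newer sample
    return results

out = pipeline([1, 2, 3], [lambda x: x * 2, lambda x: x + 1])
# out == [3, 5, 7]: the two stages overlap across consecutive samples
```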
Further, as shown in Fig. 3, Fig. 3 is a schematic structural diagram of an embodiment of the neural network hardware execution subsystem 102 of the artificial intelligence application development system according to the present application. The FPGA-based neural network hardware execution subsystem 102 includes an FPGA core module 1021 and an expansion module 1022. The FPGA core module 1021 includes a core chip 10211, a memory chip 10212, a SAMTEC interface 10214, and a 6-pin JTAG interface 10213; the expansion module 1022 includes a network interface 10222, a 3-pin UART port 10223, a 40-pin GPIO port 10224, and a SAMTEC interface 10221. The FPGA core module 1021 and the expansion module 1022 are connected and communicate through the SAMTEC interface 10214 of the core module 1021 and the SAMTEC interface 10221 of the expansion module 1022.
In this embodiment, the core chip is used to provide computing resources and realize the computation of the neural network; an Intel Arria 10 SoC FPGA can be used as the core chip. The memory chip is used to store parameter data such as the weights of the neural network as well as intermediate computation data. The JTAG interface can be used for data transmission between the core module 1021 and other devices, for example to download the initial program of the FPGA. The network interface of the expansion module 1022 is used for communication with a host computer, program download, data transmission, and the like; for example, it can be used to obtain the data input to the above neural network model over the network. The network interface can be an RJ45 Ethernet interface (a USB-C or USB port can replace RJ45 to extend the universality of the interface). The UART port is used for debugging the expansion module 1022 and printing related debugging information. The GPIO port can provide additional I/O interfaces for remote serial communication or control; for example, a camera or a microphone can be controlled through the GPIO port. The core module 1021 and the expansion module 1022 are connected and communicate through the SAMTEC interfaces, so that the core module 1021 can call the resources of the expansion module 1022 to realize corresponding functions.
Those of ordinary skill in the art can understand that all or part of the subsystems in the above embodiments can be implemented by instructing relevant hardware through a computer program; the computer program can be stored in a computer-readable storage medium and, when executed, can realize the functions of the embodiments of the subsystems described above. The aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM), etc.
It should be understood that although the subsystems in the structural diagrams of the drawings are displayed in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict ordering restriction on their execution, and they can be executed in other orders. Moreover, at least some of the subsystems in the structural diagrams may include multiple sub-steps or stages, which are not necessarily completed at the same moment but can be executed at different times; their execution order is not necessarily sequential, and they can be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
To solve the above technical problem, an embodiment of the present application also provides a computer device. Refer to Fig. 4, which is a block diagram of the basic structure of the computer device in this embodiment.
The computer device 2 includes a memory 21, a processor 22, and a network interface 23 that communicate with each other through a system bus. It should be pointed out that the figure only shows the computer device 2 with components 21-23, but it should be understood that not all of the illustrated components need be implemented, and more or fewer components may be implemented instead. Those skilled in the art will understand that the computer device here is a device capable of automatically performing numerical computation and/or information processing according to preset or stored instructions, and that its hardware includes but is not limited to microprocessors, application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), digital signal processors (DSP), embedded devices, etc.
The computer device may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The computer device can interact with the user through a keyboard, a mouse, a remote control, a touch panel, or a voice control device.
The memory 21 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical discs, etc. In some embodiments, the memory 21 may be an internal storage unit of the computer device 2, such as a hard disk or memory of the computer device 2. In other embodiments, the memory 21 may also be an external storage device of the computer device 2, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the computer device 2. Of course, the memory 21 may also include both the internal storage unit of the computer device 2 and its external storage device. In this embodiment, the memory 21 is generally used to store the operating system and various application software installed on the computer device 2, such as the program code of the artificial intelligence application development system. In addition, the memory 21 can also be used to temporarily store various types of data that have been output or are to be output.
The processor 22 may, in some embodiments, be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 22 is generally used to control the overall operation of the computer device 2. In this embodiment, the processor 22 is used to run the program code stored in the memory 21 or to process data, for example to run the program code of the artificial intelligence application development system.
The network interface 23 may include a wireless network interface or a wired network interface and is generally used to establish a communication connection between the computer device 2 and other electronic devices.
The present application also provides another embodiment, namely a computer-readable storage medium storing a program of the artificial intelligence application development system, the program being executable by at least one processor so that the at least one processor executes the steps of the program of the artificial intelligence application development system described above and realizes the corresponding functions.
Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus the necessary general-purpose hardware platform; of course, they can also be implemented by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and including several instructions that enable a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the methods described in the embodiments of the present application.
Obviously, the embodiments described above are only some of the embodiments of the present application rather than all of them. The drawings show preferred embodiments of the present application, but they do not limit the scope of the patent. The present application can be implemented in many different forms; on the contrary, these embodiments are provided so that the disclosure of the present application will be understood more thoroughly and comprehensively. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing specific embodiments or make equivalent replacements of some of the technical features. Any equivalent structure made using the contents of the specification and drawings of the present application, applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of the present application.

Claims (10)

  1. An artificial intelligence application development system, characterized by comprising:
    a neural network generation subsystem, used to construct, train, and validate a neural network model;
    a neural network hardware execution subsystem, used to receive data input to the neural network model and output a result after computation by the neural network model;
    a deployment subsystem, used to compile the neural network model generated by the neural network generation subsystem and deploy it to the neural network hardware execution subsystem.
  2. The system according to claim 1, characterized in that the neural network generation subsystem is further used to provide training data for the neural network model and to annotate the training data.
  3. The system according to claim 1, characterized in that the neural network hardware execution subsystem is implemented based on an FPGA.
  4. The system according to claim 3, characterized in that the deployment subsystem comprises:
    a compilation module, used to parse the neural network model and generate a structure file and a data file of the model;
    a running module, used to allocate hardware computing resources according to the structure file and data file of the model;
    a driver module, used to call the corresponding hardware computing resources according to the allocation result of the running module, the hardware computing resources comprising the FPGA-based neural network hardware execution subsystem.
  5. The system according to claim 4, characterized in that the allocation of hardware computing resources by the running module according to the structure file and data file of the model comprises:
    obtaining the information of each computing node according to the structure file and data file of the model;
    allocating hardware computing resources to each computing node based on the information of that node.
  6. The system according to claim 5, characterized in that the FPGA-based neural network hardware execution subsystem comprises an FPGA core module and an expansion module.
  7. The system according to claim 6, characterized in that the FPGA core module comprises a core chip, a memory chip, a SAMTEC interface, and a JTAG interface.
  8. The system according to claim 7, characterized in that the expansion module comprises a network interface, a UART port, a GPIO port, and a SAMTEC interface, and the FPGA core module and the expansion module are connected and communicate through the SAMTEC interface.
  9. A computer device, characterized by comprising a memory and a processor, wherein a computer program is stored in the memory, and when the processor executes the computer program, the functions of the artificial intelligence application development system according to any one of claims 1 to 8 are realized.
  10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the functions of the artificial intelligence application development system according to any one of claims 1 to 8 are realized.
PCT/CN2019/101684 2019-08-21 2019-08-21 Artificial intelligence application development system, computer device and storage medium WO2021031137A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980066985.9A 2019-08-21 2019-08-21 Artificial intelligence application development system, computer device and storage medium
PCT/CN2019/101684 2019-08-21 2019-08-21 Artificial intelligence application development system, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/101684 WO2021031137A1 (zh) 2019-08-21 2019-08-21 人工智能应用开发系统、计算机设备及存储介质

Publications (1)

Publication Number Publication Date
WO2021031137A1 (zh)

Family

ID=74659981

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/101684 WO2021031137A1 (zh) 2019-08-21 2019-08-21 Artificial intelligence application development system, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN113168552A (zh)
WO (1) WO2021031137A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113900734B (zh) * 2021-10-11 2023-09-22 北京百度网讯科技有限公司 Application file configuration method, apparatus, device and storage medium
CN114449295A (zh) * 2022-01-30 2022-05-06 京东方科技集团股份有限公司 Video processing method and apparatus, electronic device and storage medium
CN114282641B (zh) * 2022-03-07 2022-07-05 麒麟软件有限公司 Method for constructing a general heterogeneous acceleration framework
CN117709400A (zh) * 2022-09-02 2024-03-15 深圳忆海原识科技有限公司 Hierarchical system, computing method, computing device, electronic device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106022472A (zh) * 2016-05-23 2016-10-12 复旦大学 Embedded deep learning processor
CN107992299A (zh) * 2017-11-27 2018-05-04 郑州云海信息技术有限公司 Neural network hyperparameter extraction and conversion method, system, device and storage medium
CN108881446A (zh) * 2018-06-22 2018-11-23 深源恒际科技有限公司 Deep-learning-based artificial intelligence platform system
CN108920177A (zh) * 2018-06-28 2018-11-30 郑州云海信息技术有限公司 Method for mapping deep learning model configuration files to FPGA configuration files
CN108921289A (zh) * 2018-06-20 2018-11-30 郑州云海信息技术有限公司 FPGA heterogeneous acceleration method, device and system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10664766B2 (en) * 2016-01-27 2020-05-26 Bonsai AI, Inc. Graphical user interface to an artificial intelligence engine utilized to generate one or more trained artificial intelligence models
CN108762768B (zh) * 2018-05-17 2021-05-18 烽火通信科技股份有限公司 Intelligent network service deployment method and system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106022472A (zh) * 2016-05-23 2016-10-12 复旦大学 Embedded deep learning processor
CN107992299A (zh) * 2017-11-27 2018-05-04 郑州云海信息技术有限公司 Neural network hyperparameter extraction and conversion method, system, device and storage medium
CN108921289A (zh) * 2018-06-20 2018-11-30 郑州云海信息技术有限公司 FPGA heterogeneous acceleration method, device and system
CN108881446A (zh) * 2018-06-22 2018-11-23 深源恒际科技有限公司 Deep-learning-based artificial intelligence platform system
CN108920177A (zh) * 2018-06-28 2018-11-30 郑州云海信息技术有限公司 Method for mapping deep learning model configuration files to FPGA configuration files

Also Published As

Publication number Publication date
CN113168552A (zh) 2021-07-23

Similar Documents

Publication Publication Date Title
WO2021031137A1 (zh) Artificial intelligence application development system, computer device and storage medium
Kuchaiev et al. Nemo: a toolkit for building ai applications using neural modules
US11900113B2 (en) Data flow processing method and related device
US10025566B1 (en) Scheduling technique to transform dataflow graph into efficient schedule
CN101231589B (zh) 用于原位开发嵌入式软件的系统和方法
CN112748914B (zh) 一种应用程序开发方法、装置、电子设备和存储介质
US11568232B2 (en) Deep learning FPGA converter
WO2020062086A1 (zh) 选择处理器的方法和装置
JPH11513512A (ja) ディジタル信号プロセッサの製造方法
JP2014504767A (ja) 正規表現をコンパイルするための方法および装置
US10761822B1 (en) Synchronization of computation engines with non-blocking instructions
US8725486B2 (en) Apparatus and method for simulating a reconfigurable processor
CN114548384A (zh) 具有抽象资源约束的脉冲神经网络模型构建方法和装置
US10970449B2 (en) Learning framework for software-hardware model generation and verification
Hao et al. The implementation of a deep recurrent neural network language model on a Xilinx FPGA
US20230120227A1 (en) Method and apparatus having a scalable architecture for neural networks
US20210247997A1 (en) Method for data center storage evaluation framework simulation
CN113435582B (zh) 基于句向量预训练模型的文本处理方法及相关设备
CN115269204B (zh) 一种用于神经网络编译的内存优化方法及装置
CN112148276A (zh) 用于深度学习的可视化编程
TW202111610A (zh) 定義和執行用於指定神經網路架構的程式碼之技術
US7984416B2 (en) System and method for providing class definitions in a dynamically typed array-based language
CN118014022A (zh) 面向深度学习的fpga通用异构加速方法及设备
CN116796678A (zh) 一种基于解析式技术的fpga布局方法
US11061654B1 (en) Synchronization of concurrent computation engines

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19941813

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19941813

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 090822)

122 Ep: pct application non-entry in european phase

Ref document number: 19941813

Country of ref document: EP

Kind code of ref document: A1