WO2019136755A1

WO2019136755A1 - Method and system for optimizing design model of artificial intelligence processing device, storage medium, and terminal

Info

Publication number: WO2019136755A1
Application number: PCT/CN2018/072668
Authority: WO
Inventors: 肖梦秋
Original assignee: 深圳鲲云信息科技有限公司
Priority date: 2018-01-15
Filing date: 2018-01-15
Publication date: 2019-07-18

Abstract

A method and system for optimizing a design model of an artificial intelligence processing device, a storage medium, and a terminal. The method comprises the following steps: solidifying deep learning network model data on the basis of a recognition accuracy of an artificial intelligence processing device (S1); quantizing the solidified deep learning network model data on the basis of the recognition accuracy of the artificial intelligence processing device (S2); and generating a deep learning data graph according to the solidified deep learning network model data and the quantized deep learning network model data (S3). With the method and system for optimizing the design model of the artificial intelligence processing device, the storage medium, and the terminal, a deep learning algorithm is optimized so that it can run on the artificial intelligence processing device.

Description

Artificial intelligence processing device design model optimization method, system, storage medium, terminal

Technical field

The present invention relates to the technical field of software processing, and in particular, to an artificial intelligence processing device design model optimization method, system, storage medium, and terminal.

Background art

The concept of deep learning stems from the study of artificial neural networks. A multilayer perceptron with multiple hidden layers is a deep learning structure. Deep learning combines low-level features to form more abstract high-level representation attribute categories or features to discover distributed feature representations of data.

Deep learning is a machine learning method based on learning representations of data. An observation (e.g., an image) can be represented in a variety of ways, such as a vector of per-pixel intensity values, or more abstractly as a series of edges, regions of a particular shape, and the like. Some representations make it easier to learn tasks from examples (e.g., face recognition or facial expression recognition). The advantage of deep learning is that it replaces manually engineered features with efficient algorithms for unsupervised or semi-supervised feature learning and hierarchical feature extraction.

Like other machine learning methods, deep machine learning methods are divided into supervised learning and unsupervised learning. The learning models established under different learning frameworks are very different. For example, a convolutional neural network (CNN) is a deep machine learning model under supervised learning, while a Deep Belief Net (DBN) is a machine learning model under unsupervised learning.

At present, the CNN has become a research hotspot in many scientific fields, especially in pattern classification. Because the network avoids complicated image pre-processing and can take the original image directly as input, it has been widely used. Generally, the basic structure of a CNN includes two kinds of layers. The first is the feature extraction layer: the input of each neuron is connected to the local receptive field of the previous layer, and the local feature is extracted. Once a local feature is extracted, its positional relationship to other features is also determined. The second is the feature mapping layer: each computing layer of the network is composed of multiple feature maps, each feature map is a plane, and all neurons on the plane share the same weights. The feature mapping structure uses a sigmoid function with a small influence-function kernel as the activation function of the convolutional network, so that the feature maps are shift invariant. In addition, because the neurons on one mapping plane share weights, the number of free parameters of the network is reduced. Each convolutional layer in the convolutional neural network is followed by a computing layer for local averaging and secondary extraction. This distinctive two-stage feature extraction structure reduces the feature resolution.
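As a rough worked example of how weight sharing reduces the number of free parameters, the following sketch compares a convolutional layer with a hypothetical layer that has the same local connectivity but a separate set of weights at every output position. The layer sizes (a 5x5 kernel, 1 input channel, 6 feature maps, a 28x28 output) and the function names are illustrative assumptions, not values taken from this disclosure.

```python
def shared_weight_params(kernel_h, kernel_w, in_channels, feature_maps):
    """Free parameters when every neuron on a feature map shares one kernel and one bias."""
    return (kernel_h * kernel_w * in_channels + 1) * feature_maps


def unshared_weight_params(kernel_h, kernel_w, in_channels, feature_maps, out_h, out_w):
    """Free parameters when each output position owns its own kernel and bias."""
    per_position = (kernel_h * kernel_w * in_channels + 1) * feature_maps
    return per_position * out_h * out_w


# Assumed example sizes: 5x5 kernel, 1 input channel, 6 feature maps, 28x28 output.
shared = shared_weight_params(5, 5, 1, 6)
unshared = unshared_weight_params(5, 5, 1, 6, 28, 28)
reduction_factor = unshared // shared

print(shared, unshared, reduction_factor)
```

Under these assumed sizes, sharing weights reduces the parameter count by a factor equal to the number of output positions on each feature map.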

The CNN is mainly used to recognize two-dimensional patterns that are invariant to displacement, scaling, and other forms of distortion. Because the feature detection layer of the CNN learns from the training data, explicit feature extraction is avoided when the CNN is used, and features are learned implicitly from the training data. Moreover, because the neurons on the same feature mapping plane share the same weights, the network can learn in parallel, which is a major advantage of convolutional networks over networks in which the neurons are connected to each other. With its special structure of local weight sharing, the convolutional neural network has unique advantages in speech recognition and image processing. Its layout is closer to that of an actual biological neural network, and weight sharing reduces the complexity of the network. In particular, an image with a multidimensional input vector can be fed directly into the network, which avoids the complexity of data reconstruction during feature extraction and classification.

Therefore, how to optimize deep learning algorithms so that they can be implemented in hardware has become one of the current hot research topics.

Summary of the invention

In view of the above disadvantages of the prior art, an object of the present invention is to provide an artificial intelligence processing device design model optimization method, system, storage medium, and terminal, which optimize the deep learning algorithm so that it can run on the artificial intelligence processing device.

To achieve the above and other related objects, the present invention provides an artificial intelligence processing device design model optimization method, which includes the following steps: solidifying the deep learning network model data based on the recognition accuracy of the artificial intelligence processing device; quantizing the solidified deep learning network model data based on the recognition accuracy of the artificial intelligence processing device; and generating a deep learning data graph according to the solidified deep learning network model data and the quantized deep learning network model data.

In an embodiment of the invention, the method further includes evaluating the deep learning network model according to the solidified deep learning network model data and the quantized deep learning network model data.

In an embodiment of the invention, the deep learning network model data is 32-bit fixed-point data or 32-bit floating-point data.

In an embodiment of the invention, the deep learning network model adopts a TensorFlow training model.

Correspondingly, the present invention provides an artificial intelligence processing device design model optimization system, including a curing module, a quantization module, and a generation module;

The curing module is configured to solidify the deep learning network model data based on the recognition accuracy of the artificial intelligence processing device;

The quantization module is configured to quantize the solidified deep learning network model data based on the recognition accuracy of the artificial intelligence processing device;

The generating module is configured to generate a deep learning data graph according to the solidified deep learning network model data and the quantized deep learning network model data.

In an embodiment of the invention, the system further includes an evaluation module, configured to evaluate the deep learning network model according to the solidified deep learning network model data and the quantized deep learning network model data.

The present invention provides a storage medium having stored thereon a computer program that, when executed by a processor, implements the above-described artificial intelligence processing device design model optimization method.

Finally, the present invention provides a terminal, including: a processor and a memory;

The memory is for storing a computer program;

The processor is configured to execute the computer program stored in the memory to cause the terminal to execute the artificial intelligence processing device design model optimization method.

As described above, the artificial intelligence processing device design model optimization method, system, storage medium, and terminal of the present invention have the following beneficial effects:

(1) By optimizing the deep learning algorithm, it can be run on the artificial intelligence processing device;

(2) The optimization efficiency is high and the practicability is strong.

Brief description of the drawings

FIG. 1 is a flowchart of an artificial intelligence processing device design model optimization method according to an embodiment of the present invention;

FIG. 2 is a schematic structural view of an artificial intelligence processing device design model optimization system according to an embodiment of the present invention;

FIG. 3 is a schematic structural view of a terminal according to an embodiment of the present invention.

Component label description

21 curing module

22 quantization module

23 generation module

31 processor

32 memory

Detailed description

The embodiments of the present invention are described below by way of specific examples, and those skilled in the art can readily understand other advantages and effects of the present invention from the contents disclosed in this specification. The present invention may also be implemented or applied in other specific embodiments, and various modifications and changes can be made to the details of this specification without departing from the spirit and scope of the invention.

It should be noted that the illustrations provided in the present embodiment merely illustrate the basic concept of the present invention in a schematic manner. The drawings show only the components related to the present invention and are not drawn according to the number, shape, and size of the components in an actual implementation. In an actual implementation, the type, number, and proportion of each component may be changed arbitrarily, and the component layout may be more complicated.

The artificial intelligence processing device design model optimization method, system, storage medium, and terminal of the invention optimize the deep learning algorithm so that it can run on the artificial intelligence processing device, with high optimization efficiency and strong practicability.

As shown in FIG. 1 , in an embodiment, the artificial intelligence processing device design model optimization method of the present invention comprises the following steps:

Step S1: solidifying the deep learning network model data based on the recognition accuracy of the artificial intelligence processing device.

In an embodiment of the invention, the deep learning network model data is 32-bit fixed-point data or 32-bit floating-point data. Of course, the data width can also be extended to suit different needs.

In an embodiment of the invention, the artificial intelligence processing device comprises a programmable logic device such as an FPGA.

Specifically, in order to adapt the deep learning network model data to the recognition accuracy of the artificial intelligence processing device, the precision of the deep learning network model data needs to be compressed. To that end, the deep learning network model data is first subjected to a solidification operation.

Specifically, solidification (also called curing or freezing) means that the graph structure of the deep learning network model and the weights of the model are fixed together.
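The idea of fixing the graph structure and the weights together can be illustrated with a minimal sketch. It assumes a toy graph representation in which trained weights live in "Variable" nodes whose values are stored separately; freezing replaces each such node with a "Const" node that carries its trained value. The `Node` class, the op names, and the `checkpoint` dictionary are hypothetical and are not part of this disclosure or of any particular framework.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    op: str                                   # e.g. "Placeholder", "Variable", "Const", "MatMul"
    inputs: list = field(default_factory=list)
    value: object = None                      # only populated for "Const" nodes


def freeze_graph(nodes, checkpoint):
    """Return a new node list in which every Variable becomes a Const holding its trained value."""
    frozen = []
    for node in nodes:
        if node.op == "Variable":
            if node.name not in checkpoint:
                raise KeyError(f"no trained value for variable {node.name!r}")
            frozen.append(Node(node.name, "Const", [], checkpoint[node.name]))
        else:
            # Non-variable nodes are copied unchanged, so the graph structure is preserved.
            frozen.append(Node(node.name, node.op, list(node.inputs), node.value))
    return frozen


# Hypothetical one-layer model y = x * w + b with separately stored trained weights.
graph = [
    Node("x", "Placeholder"),
    Node("w", "Variable"),
    Node("b", "Variable"),
    Node("xw", "MatMul", ["x", "w"]),
    Node("y", "Add", ["xw", "b"]),
]
checkpoint = {"w": [[0.5, -1.25], [2.0, 0.75]], "b": [0.1, -0.2]}

frozen_graph = freeze_graph(graph, checkpoint)
```

After this step the structure and the weights travel as a single self-contained object, which is the property the later quantization and graph generation steps rely on.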

Step S2: Quantize the solidified deep learning network model data based on the recognition accuracy of the artificial intelligence processing device.

Specifically, for the deep learning network model data after curing, further quantization processing is required.

In the field of digital signal processing, quantization refers to the process of approximating the continuous values of a signal (or a large number of possible discrete values) by a finite number (or a smaller number) of discrete values. Quantization is mainly used in the conversion from continuous signals to digital signals: the continuous signal is sampled to obtain a discrete signal, and the discrete signal is quantized to obtain a digital signal. Note that a discrete signal does not usually need to be quantized, but it may not be discrete in its value range, in which case quantization is still required.

Specifically, the present invention quantizes the solidified deep learning network model data using a given quantization algorithm. For those skilled in the art, quantization is mature prior art and is therefore not described further herein.
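The disclosure leaves the choice of quantization algorithm open. Purely as an illustration, the sketch below uses uniform affine quantization, one common choice, to map floating-point weights onto integer codes of a given bit width and back. The function names, the example weights, and the 8-bit width are assumptions made for this sketch.

```python
def quantize(values, num_bits=8):
    """Map floats onto integer codes in [0, 2**num_bits - 1] using a uniform step size."""
    lo, hi = min(values), max(values)
    levels = 2 ** num_bits - 1
    # Guard against a constant tensor, where the value range is zero.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize(codes, scale, offset):
    """Recover approximate float values from integer codes."""
    return [c * scale + offset for c in codes]


# Assumed example weights, quantized to 8 bits.
weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, scale, offset = quantize(weights, num_bits=8)
restored = dequantize(codes, scale, offset)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(codes, max_error)
```

With rounding to the nearest code, the reconstruction error of each value is at most half a quantization step, so a larger bit width trades storage for precision.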

Step S3: Generate a deep learning data graph according to the solidified deep learning network model data and the quantized deep learning network model data.

Specifically, in order to generate data adapted to the artificial intelligence processing device, the deep learning data graph is further generated according to the solidified deep learning network model data and the quantized deep learning network model data. For example, when the deep learning network model adopts the TensorFlow training model, a TensorFlow graph is generated. TensorFlow is Google's second-generation artificial intelligence learning system, developed on the basis of DistBelief. Its name is derived from its operating principle: Tensor means an N-dimensional array, and Flow means computation based on a data flow graph, so TensorFlow describes the computation in which tensors flow from one end of the flow graph to the other. TensorFlow is a system that transmits complex data structures to an artificial intelligence neural network for analysis and processing.
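One way to picture this step is as merging the frozen graph structure with the quantized constants into a single data flow description whose nodes are listed in execution order. The sketch below is a minimal illustration under that assumption; the dictionary layout, the `build_data_graph` helper, and the JSON serialization are hypothetical and do not represent the TensorFlow graph format.

```python
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_data_graph(structure, quantized_consts):
    """Combine graph structure and quantized constants into one ordered data flow graph.

    structure: {node_name: (op, [input_names])}
    quantized_consts: {const_name: {"codes": [...], "scale": float, "offset": float}}
    """
    # Sort nodes so that every node appears after all of its inputs.
    dependencies = {name: inputs for name, (_, inputs) in structure.items()}
    order = TopologicalSorter(dependencies).static_order()

    nodes = []
    for name in order:
        op, inputs = structure[name]
        entry = {"name": name, "op": op, "inputs": inputs}
        if op == "Const":
            entry["data"] = quantized_consts[name]  # attach the quantized weights
        nodes.append(entry)
    return {"nodes": nodes}


# Hypothetical frozen structure and quantized constants for y = x * w + b.
structure = {
    "y": ("Add", ["xw", "b"]),
    "xw": ("MatMul", ["x", "w"]),
    "x": ("Placeholder", []),
    "w": ("Const", []),
    "b": ("Const", []),
}
quantized_consts = {
    "w": {"codes": [0, 128, 255], "scale": 0.01, "offset": -1.0},
    "b": {"codes": [17, 200], "scale": 0.002, "offset": -0.2},
}

data_graph = build_data_graph(structure, quantized_consts)
serialized = json.dumps(data_graph)
```

Listing the nodes in dependency order means a consumer of the graph can process them in a single pass, with every input already available.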

In an embodiment of the present invention, the artificial intelligence processing device design model optimization method of the present invention further includes evaluating the deep learning network model according to the solidified deep learning network model data and the quantized deep learning network model data. The deep learning network is evaluated to determine whether it is adapted to the artificial intelligence processing device; when the two are not adapted, this can be corrected by adjusting the solidification and/or quantization algorithm.
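The evaluate-and-adjust loop described above can be sketched as follows. The sketch assumes that "adjusting the quantization algorithm" means trying progressively wider bit widths until the quantized model reaches a target accuracy on some evaluation samples. The toy linear classifier, the sample data, the candidate bit widths, and all function names are hypothetical and serve only to make the loop concrete.

```python
def quantize_dequantize(values, num_bits):
    """Round floats to a uniform grid with 2**num_bits levels and map them back to floats."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (2 ** num_bits - 1) if hi > lo else 1.0
    return [round((v - lo) / scale) * scale + lo for v in values]


def predict(weights, sample):
    """Toy linear classifier: class 1 if the weighted sum is positive, else class 0."""
    return int(sum(w * x for w, x in zip(weights, sample)) > 0)


def accuracy(weights, samples, labels):
    hits = sum(predict(weights, s) == y for s, y in zip(samples, labels))
    return hits / len(samples)


def choose_bit_width(weights, samples, labels, target, candidates=(2, 4, 8, 16)):
    """Return the first candidate bit width whose quantized weights reach the target accuracy."""
    for bits in candidates:
        acc = accuracy(quantize_dequantize(weights, bits), samples, labels)
        if acc >= target:
            return bits, acc
    return None, None  # no candidate was adequate; a different algorithm would be needed


# Assumed evaluation data, labeled by the unquantized model itself.
weights = [1.0, -0.2, 0.1]
samples = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 1.5]]
labels = [predict(weights, s) for s in samples]

bits, acc = choose_bit_width(weights, samples, labels, target=1.0)
print(bits, acc)
```

Trying the narrowest width first keeps the model as compact as the accuracy target allows; returning `None` makes the "not adapted" case explicit so that the caller can change the approach.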

As shown in FIG. 2, in an embodiment, the artificial intelligence processing device design model optimization system of the present invention includes a curing module 21, a quantization module 22, and a generation module 23.

The curing module 21 is configured to solidify the deep learning network model data based on the recognition accuracy of the artificial intelligence processing device.

The quantization module 22 is connected to the curing module 21 and is configured to quantize the solidified deep learning network model data based on the recognition accuracy of the artificial intelligence processing device.

The generating module 23 is connected to the curing module 21 and the quantization module 22, and is configured to generate a deep learning data graph according to the solidified deep learning network model data and the quantized deep learning network model data.

In an embodiment of the present invention, the artificial intelligence processing device design model optimization system of the present invention further includes an evaluation module, configured to evaluate the deep learning network model according to the solidified deep learning network model data and the quantized deep learning network model data. The deep learning network is evaluated to determine whether it is adapted to the artificial intelligence processing device; when the two are not adapted, this can be corrected by adjusting the solidification and/or quantization algorithm.

It should be noted that the division of the above system into modules is only a division of logical functions; in an actual implementation, the modules may be integrated in whole or in part into one physical entity, or may be physically separated. These modules may all be implemented as software invoked by a processing element, or may all be implemented in hardware; alternatively, some modules may be implemented as software invoked by a processing element and the others in hardware. For example, the x module may be a separately provided processing element, or may be integrated in one of the above-mentioned devices, or may be stored in the memory of the above device in the form of program code, with a processing element of the above device calling and executing the functions of the x module. The implementation of the other modules is similar. In addition, all or part of these modules can be integrated together or implemented independently. The processing element described herein can be an integrated circuit with signal processing capabilities. In the implementation process, each step of the above method or each of the above modules may be completed by an integrated logic circuit of hardware in the processing element or by instructions in the form of software.

For example, the above modules may be one or more integrated circuits configured to implement the above method, such as one or more application-specific integrated circuits (ASICs), one or more microprocessors (digital signal processors, DSPs for short), or one or more field-programmable gate arrays (FPGAs). As another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can call the program code. As a further example, these modules can be integrated and implemented in the form of a system-on-a-chip (SoC).

The storage medium of the present invention stores a computer program, and when the program is executed by the processor, the artificial intelligence processing device design model optimization method is implemented. Preferably, the storage medium includes various media that can store program codes, such as a ROM, a RAM, a magnetic disk, or an optical disk.

As shown in FIG. 3, in an embodiment, the terminal of the present invention includes a processor 31 and a memory 32.

The memory 32 is used to store a computer program.

Preferably, the memory 32 includes various media that can store program codes, such as a ROM, a RAM, a magnetic disk, or an optical disk.

The processor 31 is coupled to the memory 32 for executing a computer program stored by the memory 32 to cause the terminal to execute the artificial intelligence processing device design model optimization method.

Preferably, the processor 31 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; or a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

In summary, the artificial intelligence processing device design model optimization method, system, storage medium, and terminal of the present invention optimize the deep learning algorithm so that it can run on the artificial intelligence processing device, with high optimization efficiency and strong practicability. Therefore, the present invention effectively overcomes various shortcomings in the prior art and has high industrial utilization value.

The above-described embodiments are merely illustrative of the principles of the invention and its effects, and are not intended to limit the invention. Modifications or variations of the above-described embodiments may be made by those skilled in the art without departing from the spirit and scope of the invention. Therefore, all equivalent modifications or changes made by those skilled in the art without departing from the spirit and scope of the invention will be covered by the appended claims.

Claims

1. An artificial intelligence processing device design model optimization method, comprising the following steps:

solidifying the deep learning network model data based on the recognition accuracy of the artificial intelligence processing device;

quantizing the solidified deep learning network model data based on the recognition accuracy of the artificial intelligence processing device;

generating a deep learning data graph according to the solidified deep learning network model data and the quantized deep learning network model data.

2. The artificial intelligence processing device design model optimization method according to claim 1, further comprising: evaluating the deep learning network model according to the solidified deep learning network model data and the quantized deep learning network model data.

3. The artificial intelligence processing device design model optimization method according to claim 1, wherein the deep learning network model data is 32-bit fixed-point data or 32-bit floating-point data.

4. The artificial intelligence processing device design model optimization method according to claim 1, wherein the deep learning network model adopts a TensorFlow training model.

5. An artificial intelligence processing device design model optimization system, comprising: a curing module, a quantization module, and a generating module;

the curing module is configured to solidify the deep learning network model data based on the recognition accuracy of the artificial intelligence processing device;

the quantization module is configured to quantize the solidified deep learning network model data based on the recognition accuracy of the artificial intelligence processing device;

the generating module is configured to generate a deep learning data graph according to the solidified deep learning network model data and the quantized deep learning network model data.

6. The artificial intelligence processing device design model optimization system according to claim 5, further comprising: an evaluation module, configured to evaluate the deep learning network model according to the solidified deep learning network model data and the quantized deep learning network model data.

7. The artificial intelligence processing device design model optimization system according to claim 5, wherein the deep learning network model data is 32-bit fixed-point data or 32-bit floating-point data.

8. The artificial intelligence processing device design model optimization system according to claim 5, wherein the deep learning network model adopts a TensorFlow training model.

9. A storage medium having stored thereon a computer program, wherein the program, when executed by a processor, implements the artificial intelligence processing device design model optimization method according to any one of claims 1 to 4.

10. A terminal, comprising: a processor and a memory;

the memory is configured to store a computer program;

the processor is configured to execute the computer program stored in the memory to cause the terminal to perform the artificial intelligence processing device design model optimization method according to any one of claims 1 to 4.