CN108734288B - Operation method and device - Google Patents


Info

Publication number
CN108734288B
Authority
CN
China
Prior art keywords
data
model
neural network
processed
input
Prior art date
Legal status
Active
Application number
CN201710269049.0A
Other languages
Chinese (zh)
Other versions
CN108734288A (en)
Inventor
Inventor not disclosed
Current Assignee
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Cambricon Information Technology Co Ltd
Publication of CN108734288A
Application granted
Publication of CN108734288B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00: Arrangements for software engineering
    • G06F 8/30: Creation or generation of source code
    • G06F 8/35: Creation or generation of source code model driven
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)
  • Machine Translation (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

An operation method and device. The operation device comprises an input module for inputting data; a model generation module for constructing a model from the input data; a neural network operation module for generating operation instructions based on the model, caching the operation instructions, and operating on the data to be processed according to the operation instructions to obtain an operation result; and an output module for outputting the operation result. The device and method avoid the extra overhead that the traditional approach incurs by running the whole software architecture.

Description

Operation method and device
Technical Field
The present disclosure relates to the field of computer architecture, deep learning and neural networks, and more particularly, to an operation method and apparatus.
Background
Deep learning is a branch of machine learning that attempts to model high-level abstractions in data using multiple processing layers that contain complex structures or are composed of multiple nonlinear transformations.
Deep learning is a representation-learning method for data in machine learning. An observation (e.g., an image) can be represented in many ways, such as a vector of per-pixel intensity values, or more abstractly as a set of edges, regions of particular shapes, and so on. Certain representations make it easier to learn tasks (e.g., face recognition or facial-expression recognition) from examples.
Several deep learning architectures, such as deep neural networks, convolutional neural networks, deep belief networks, and recurrent neural networks, have been applied in computer vision, speech recognition, natural language processing, audio recognition, and bioinformatics, with excellent results. Indeed, "deep learning" has become a near-synonym for, or a rebranding of, neural networks.
With the rise of deep learning (neural networks), neural network accelerators have also emerged. Through purpose-built memory and operation modules, a neural network accelerator can achieve a speedup of tens or even hundreds of times over a general-purpose processor on deep learning operations, with smaller area and lower power consumption.
To make it easier to apply neural network accelerators to various network architectures, programming libraries and programming frameworks built on them have been, and are still being, developed. In a conventional application scenario, the programming framework of a neural network accelerator sits at the top of the stack; commonly used frameworks include Caffe, TensorFlow, and Torch. As shown in fig. 1, from the bottom layer upward the stack consists of the neural network accelerator (dedicated hardware for neural network operations), the hardware driver (which lets software invoke the accelerator), the accelerator's programming library (which provides the interface for calling the accelerator), the accelerator's programming framework, and the high-level application that needs the neural network operation. In application scenarios with little memory and strong real-time requirements, running this whole software stack consumes excessive computing resources. How to streamline the operation process for such scenarios is therefore one of the problems to be solved.
Disclosure of Invention
In view of the above problems, the present disclosure provides an operation method and device intended to solve at least one of them.
To achieve this object, as one aspect of the present disclosure, an operation method is proposed, comprising the steps of:
when the input data comprises data to be processed, network structure and weight data, executing the following steps:
step 11, inputting and reading input data;
step 12, constructing an offline model according to the network structure and the weight data;
step 13, analyzing the off-line model to obtain an operation instruction and caching the operation instruction for subsequent calculation and calling;
step 14, according to the operation instruction, operating the data to be processed to obtain an operation result for outputting;
when the input data comprises the data to be processed and the off-line model, executing the following steps:
step 21, inputting and reading input data;
step 22, analyzing the offline model to obtain an operation instruction and caching the operation instruction for subsequent calculation and calling;
step 23, according to the operation instruction, operating the data to be processed to obtain an operation result for outputting;
when the input data only comprises the data to be processed, the following steps are executed:
step 31, inputting and reading input data;
and step 32, calling the cached operation instruction, and operating the data to be processed to obtain an operation result for outputting.
Further, the step of obtaining the operation result by operating the data to be processed according to the operation instruction is realized by the neural network processing unit.
Further, the neural network processing unit has an instruction cache unit for caching the operation instruction for subsequent calculation call.
Further, the offline model may be any of various neural network models, including Cambricon_model, AlexNet_model, GoogleNet_model, VGG_model, R-CNN_model, GAN_model, LSTM_model, RNN_model, ResNet_model, and the like.
Further, the data to be processed is input which can be processed by a neural network.
Further, the data to be processed includes continuous single pictures, voice, or video streams.
Further, the network structure may be any of various neural network structures, including AlexNet, GoogleNet, ResNet, VGG, R-CNN, GAN, LSTM, and RNN.
To achieve the above object, as another aspect of the present disclosure, an operation device is proposed, comprising:
the input module is used for inputting data, and the data comprises data to be processed, network structure and weight data and/or offline model data;
the model generation module is used for constructing an offline model according to the input network structure and the weight data;
the neural network operation module is used for generating an operation instruction based on the offline model, caching the operation instruction, and operating the data to be processed based on the operation instruction to obtain an operation result;
the output module is used for outputting the operation result;
the control module is used for detecting the type of input data and executing the following operations:
when the input data comprises data to be processed, a network structure, and weight data, the control module controls the input module to pass the network structure and weight data to the model generation module to construct an offline model, and controls the neural network operation module to operate on the data to be processed supplied by the input module, based on the offline model provided by the model generation module;
when the input data comprises data to be processed and an offline model, the control module controls the input module to pass the data to be processed and the offline model to the neural network operation module, and controls the neural network operation module to generate and cache operation instructions based on the offline model and to operate on the data to be processed according to those instructions;
when the input data comprises only the data to be processed, the control module controls the input module to pass the data to be processed to the neural network operation module, and controls the neural network operation module to invoke the cached operation instructions to operate on the data to be processed.
Further, the neural network operation module comprises a model analysis unit and a neural network processing unit, wherein:
the model analysis unit is used for generating an operation instruction based on the offline model;
the neural network processing unit is used for caching the operation instruction for subsequent calculation and calling; or calling the cached operation instruction when the input data only comprises the data to be processed, and operating the data to be processed based on the operation instruction to obtain an operation result.
Further, the neural network processing unit has an instruction cache unit for caching the operation instruction for subsequent calculation call.
The operation method and the device provided by the disclosure have the following beneficial effects:
1. With the method and device of the present disclosure, once the offline model has been generated, operations can be performed directly from the offline model, avoiding the extra overhead of running the whole software stack, including the deep learning framework;
2. The device and method of the present disclosure enable more efficient repurposing of the neural network processor, so that it can deliver its full performance in application environments with little memory and strong real-time requirements, with a simpler and faster operation flow.
Drawings
FIG. 1 is a schematic diagram of a prior-art programming framework stack;
fig. 2 is a flowchart illustrating an operation method according to an embodiment of the disclosure;
fig. 3 is a structural frame diagram of a computing device according to another embodiment of the disclosure.
Detailed Description
For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
In this specification, the various embodiments described below are meant to be illustrative only and should not be construed in any way to limit the scope of the disclosure. The following description with reference to the accompanying drawings is provided to assist in a comprehensive understanding of exemplary embodiments of the present disclosure as defined by the claims and their equivalents. The following description includes various specific details to aid understanding, but such details are to be regarded as illustrative only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Moreover, descriptions of well-known functions and constructions are omitted for clarity and conciseness. Moreover, throughout the drawings, the same reference numerals are used for similar functions and operations.
The present disclosure discloses an operation method, comprising the steps of:
when the input data comprises data to be processed, network structure and weight data, executing the following steps:
step 11, inputting and reading input data;
step 12, constructing an offline model according to the network structure and the weight data;
step 13, analyzing the off-line model to obtain an operation instruction and caching the operation instruction for subsequent calculation and calling;
step 14, according to the operation instruction, operating the data to be processed to obtain an operation result for outputting;
when the input data comprises the data to be processed and the off-line model, executing the following steps:
step 21, inputting and reading input data;
step 22, analyzing the offline model to obtain an operation instruction and caching the operation instruction for subsequent calculation and calling;
step 23, according to the operation instruction, operating the data to be processed to obtain an operation result for outputting;
when the input data only comprises the data to be processed, the following steps are executed:
step 31, inputting and reading input data;
and step 32, calling the cached operation instruction, and operating the data to be processed to obtain an operation result for outputting.
In some embodiments of the present disclosure, a neural network processing unit operates on the data to be processed according to the operation instructions to obtain the operation result. Preferably, the neural network processing unit has an instruction cache unit for caching received operation instructions; an operation instruction cached in advance is an instruction from a previous operation retained by the instruction cache unit.
In some embodiments of the present disclosure, the neural network processing unit further includes a data caching unit, configured to cache the data to be processed.
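To make the three branches of the method concrete, below is a minimal Python sketch of the dispatch logic described above. All of the names (NPUState, build_offline_model, parse_offline_model, execute) are hypothetical placeholders for the model generation, model analysis, and neural network processing stages of this disclosure, not an actual accelerator API:

```python
from dataclasses import dataclass

@dataclass
class NPUState:
    """Stands in for the neural network processing unit's on-chip caches."""
    instruction_cache: list | None = None  # instruction cache unit
    data_cache: object | None = None       # data cache unit

def build_offline_model(network_structure, weights):
    # Placeholder for the model generation step: bundle the network
    # structure and weight data into an offline model.
    return {"structure": network_structure, "weights": weights}

def parse_offline_model(offline_model):
    # Placeholder for model analysis: translate the offline model into
    # operation instructions the processing unit can recognize.
    return [("load_weights", offline_model["weights"]),
            ("run_network", offline_model["structure"])]

def execute(instructions, data):
    # Placeholder for the processing unit running cached instructions.
    return {"instructions": instructions, "input": data}

def operate(state, data, network_structure=None, weights=None, offline_model=None):
    """Dispatch on what the input data contains, mirroring the three cases above."""
    if network_structure is not None and weights is not None:
        offline_model = build_offline_model(network_structure, weights)
    if offline_model is not None:
        state.instruction_cache = parse_offline_model(offline_model)
    if state.instruction_cache is None:
        # Data-only input is valid only once instructions have been cached.
        raise RuntimeError("no cached operation instructions; supply a network "
                           "structure and weights, or an offline model, first")
    state.data_cache = data
    return execute(state.instruction_cache, data)
```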
Based on the above operation method, the present disclosure also discloses an operation device, including:
the input module is used for inputting data, and the data comprises data to be processed, network structure and weight data and/or offline model data;
the model generation module is used for constructing an offline model according to the input network structure and the weight data;
the neural network operation module is used for generating an operation instruction based on the offline model, caching the operation instruction, and operating the data to be processed based on the operation instruction to obtain an operation result;
the output module is used for outputting the operation result;
the control module is used for detecting the type of input data and executing the following operations:
when the input data comprises data to be processed, a network structure, and weight data, the control module controls the input module to pass the network structure and weight data to the model generation module to construct an offline model, and controls the neural network operation module to operate on the data to be processed supplied by the input module, based on the offline model provided by the model generation module;
when the input data comprises data to be processed and an offline model, the control module controls the input module to pass the data to be processed and the offline model to the neural network operation module, and controls the neural network operation module to generate and cache operation instructions based on the offline model and to operate on the data to be processed according to those instructions;
when the input data comprises only the data to be processed, the control module controls the input module to pass the data to be processed to the neural network operation module, and controls the neural network operation module to invoke the cached operation instructions to operate on the data to be processed.
The neural network operation module comprises a model analysis unit and a neural network processing unit, wherein:
the model analysis unit is used for generating an operation instruction based on the offline model;
the neural network processing unit is used for caching the operation instruction for subsequent calculation and calling; or calling the cached operation instruction when the input data only comprises the data to be processed, and operating the data to be processed based on the operation instruction to obtain an operation result.
In some embodiments of the present disclosure, the neural network processing unit has an instruction cache unit, configured to cache the operation instruction for a subsequent computation call.
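The module boundaries of the device can be sketched in the same hedged way. The classes below reuse the placeholder helpers from the method sketch above and only illustrate how the control module routes each input combination between the modules; they are an assumption-laden illustration, not the disclosed implementation:

```python
class NeuralNetworkOperationModule:
    """Model analysis unit plus neural network processing unit."""
    def __init__(self):
        self.instruction_cache = None  # instruction cache unit on the processing unit

    def load(self, offline_model):
        # Model analysis unit: offline model -> recognizable operation instructions.
        self.instruction_cache = parse_offline_model(offline_model)

    def run(self, data):
        if self.instruction_cache is None:
            raise RuntimeError("no cached operation instructions")
        return execute(self.instruction_cache, data)

class OperationDevice:
    """Control module dispatching among the input, model generation,
    neural network operation, and output modules by input data type."""
    def __init__(self):
        self.nn_module = NeuralNetworkOperationModule()

    def run(self, data, network_structure=None, weights=None, offline_model=None):
        if network_structure is not None and weights is not None:
            # Model generation module builds the offline model.
            offline_model = build_offline_model(network_structure, weights)
        if offline_model is not None:
            self.nn_module.load(offline_model)
        # The output module here simply returns the operation result.
        return self.nn_module.run(data)
```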
In some embodiments of the disclosure, the offline model is a text file defined according to a specific structure and can be any of various neural network models, such as Cambricon_model, AlexNet_model, GoogleNet_model, VGG_model, R-CNN_model, GAN_model, LSTM_model, RNN_model, ResNet_model, etc., but is not limited to the models listed in this embodiment.
In some embodiments of the present disclosure, the data to be processed is any input a neural network can process, such as continuous single pictures, voice, or a video stream.
In some embodiments of the present disclosure, the network structure may be any of various neural network structures, such as AlexNet, GoogleNet, ResNet, VGG, R-CNN, GAN, LSTM, RNN, etc., but is not limited to the structures listed in this embodiment.
Specifically, depending on the data supplied to the input module, the operation device of the present disclosure works in one of the following three ways:
1. When the data input by the input module are a network structure, weight data, and data to be processed, the control module controls the input module to transmit the network structure and weight data to the model generation module, and the data to be processed to the model analysis unit. The control module controls the model generation module to generate an offline model from the network structure and weight data, and to pass the generated offline model to the model analysis unit. The control module then controls the model analysis unit to parse the received offline model into operation instructions the neural network processing unit can recognize, and to transmit the operation instructions and the data to be processed to the neural network processing unit. The neural network processing unit operates on the data to be processed according to the received operation instructions to obtain the determined operation result, which it transmits to the output module for output.
2. When the data input by the input module are an offline model and data to be processed, the control module controls the input module to transmit the offline model and the data to be processed directly to the model analysis unit; the subsequent working principle is the same as in the first case.
3. When the data input by the input module contain only data to be processed, the control module controls the input module to transmit the data to be processed, via the model analysis unit, directly to the neural network processing unit, which operates on the data according to the cached operation instructions to obtain the operation result. This case normally does not arise on the first use of the neural network processor, since it requires that operation instructions already reside in the instruction cache.
Thus, when the current network operation uses a different offline model from the previous network operation, the data input by the input module comprise a network structure, weight data, and data to be processed, and the model generation module generates a new offline model before the subsequent network operation proceeds. When the current network operation is the first one and a corresponding offline model has been obtained in advance, the input comprises the offline model and the data to be processed. When the current network operation is not the first and uses the same offline model as the previous one, the input comprises only the data to be processed.
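In terms of the hypothetical OperationDevice sketched earlier, these three input patterns might be exercised as follows (a usage illustration only; the picture variables and file names are stand-ins):

```python
# Stand-ins for three frames of a continuous picture stream.
picture_0, picture_1, picture_2 = "img0", "img1", "img2"

device = OperationDevice()

# New or changed network: structure + weights + data; an offline model is
# generated and its operation instructions are cached.
r0 = device.run(data=picture_0,
                network_structure="AlexNet",
                weights="bvlc_alexnet.caffemodel")

# First run with an offline model obtained in advance: model + data.
saved_model = build_offline_model("AlexNet", "bvlc_alexnet.caffemodel")
r1 = device.run(data=picture_1, offline_model=saved_model)

# Same offline model as the previous run: data only; cached instructions reused.
r2 = device.run(data=picture_2)
```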
In some embodiments of the present disclosure, the operation device described here is integrated as a sub-module into the central processor module of a complete computer system. Under the control of the central processor, the data to be processed and the offline model are transmitted to the operation device. The model analysis unit parses the incoming neural network offline model and generates operation instructions. The operation instructions and the data to be processed are then transmitted to the neural network processing unit, which computes the operation result and returns it to the main memory unit. In subsequent computation the network structure no longer changes, so completing the neural network computation only requires streaming in data to be processed, yielding operation results.
The operation device and method proposed in the present disclosure are described in detail below through specific embodiments.
Example 1
As shown in fig. 2, the present embodiment provides an operation method, including the following steps:
when the input data comprises data to be processed, network structure and weight data, executing the following steps:
step 11, inputting and reading input data;
step 12, constructing an offline model according to the network structure and the weight data;
step 13, analyzing the off-line model to obtain an operation instruction and caching the operation instruction for subsequent calculation and calling;
step 14, according to the operation instruction, operating the data to be processed to obtain a neural network operation result for outputting;
when the input data comprises the data to be processed and the off-line model, executing the following steps:
step 21, inputting and reading input data;
step 22, analyzing the offline model to obtain an operation instruction and caching the operation instruction for subsequent calculation and calling;
step 23, according to the operation instruction, operating the data to be processed to obtain a neural network operation result for outputting;
when the input data only comprises the data to be processed, the following steps are executed:
step 31, inputting and reading input data;
and step 32, calling the cached operation instruction, and operating the data to be processed to obtain a neural network operation result for outputting.
The data to be processed are operated on by a neural network processing unit according to the operation instructions to obtain the operation result. The neural network processing unit is provided with an instruction cache unit and a data cache unit, which cache the received operation instructions and the data to be processed, respectively.
The input network structure provided in this embodiment is AlexNet, the weight data is bvlc_alexnet.caffemodel, the data to be processed is a stream of continuous single pictures, and the offline model is Cambricon_model.
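In terms of the method sketch given earlier, the flow of this embodiment might look as follows; this is a hedged illustration in which the string arguments merely name the embodiment's inputs, since neither the Cambricon_model format nor the accelerator API is spelled out here:

```python
state = NPUState()

# Steps 11-14: read the AlexNet structure and bvlc_alexnet.caffemodel weights,
# build and parse the Cambricon_model offline model, cache the resulting
# operation instructions, and run the first picture.
first_result = operate(state, data="picture_0",
                       network_structure="AlexNet",
                       weights="bvlc_alexnet.caffemodel")

# Steps 31-32: each subsequent picture of the continuous stream reuses the
# cached instructions, skipping model construction and parsing entirely.
stream_results = [operate(state, data=p) for p in ("picture_1", "picture_2")]
```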
In summary, the method provided by this embodiment greatly simplifies the flow of using a neural network processor and avoids the extra memory and I/O overhead of invoking a whole conventional programming framework. With this method, a neural network accelerator can deliver its full computing performance in environments with little memory and strong real-time requirements.
Example 2
As shown in fig. 3, the present embodiment provides an operation device, including: an input module 101, a model generation module 102, a neural network operation module 103, an output module 104, and a control module 105, wherein the neural network operation module 103 comprises a model analysis unit 106 and a neural network processor 107.
The defining feature of the device is that it executes in an offline mode: it first generates an offline model, then generates the related operation instructions directly from the offline model, transmits the weight data, and processes the data to be processed. More specifically:
the input module 101 is configured to input a combination of a network structure, weight data, and to-be-processed data, or a combination of an offline model and to-be-processed data. When the input is the network structure, the weight data and the data to be processed, the network structure and the weight data are transmitted to the model generation module 102 to generate an offline model for executing the following operations. When the input is the offline model and the data to be processed, the offline model and the data to be processed are directly transmitted to the model analysis unit 106 to perform the following operations.
The output module 104 is configured to output the determined operation result generated from the specific network structure and the set of data to be processed; the output data are computed by the neural network processor 107.
The model generating module 102 is configured to generate an offline model for use by a lower layer according to the input network structure parameter and the weight data.
The model analyzing unit 106 is configured to analyze the incoming offline model, generate an operation instruction that can be directly sent to the neural network processor 107, and send the data to be processed, which is sent from the input module 101, to the neural network processor 107.
The neural network processor 107, which has an instruction cache unit and a data cache unit, is configured to operate according to the incoming operation instructions and data to be processed, obtain the determined operation result, and transmit the result to the output module 104.
The control module 105 is configured to detect an input data type and perform the following operations:
when the input data comprise data to be processed, a network structure, and weight data, the control module controls the input module 101 to pass the network structure and weight data to the model generation module 102 to construct an offline model, and controls the neural network operation module 103 to perform the neural network operation on the data to be processed supplied by the input module 101, based on the offline model provided by the model generation module 102;
when the input data comprise data to be processed and an offline model, the control module controls the input module 101 to pass the data to be processed and the offline model to the neural network operation module 103, and controls the neural network operation module 103 to generate and cache operation instructions based on the offline model and to perform the neural network operation on the data to be processed according to those instructions;
when the input data comprise only data to be processed, the control module controls the input module 101 to pass the data to be processed to the neural network operation module 103, and controls the neural network operation module 103 to invoke the cached operation instructions to perform the neural network operation on the data to be processed.
The input network structure provided in this embodiment is AlexNet and the weight data is bvlc_alexnet.caffemodel, as in Example 1. The model generation module 102 generates a new offline model Cambricon_model from the input network structure and weight data; the generated offline model Cambricon_model can also be saved and used on its own as the input of a later run. The model analysis unit 106 parses the offline model Cambricon_model to generate a series of operation instructions, transmits the generated operation instructions to the instruction cache unit on the neural network processor 107, and transmits the input image supplied by the input module 101 to the data cache unit on the neural network processor 107.
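The "series of operation instructions" produced by the model analysis unit 106 can be pictured as a per-layer instruction sequence. The sketch below expands the parse_offline_model placeholder used earlier, under the assumption that the offline model records its layers as a list; the instruction names are illustrative, not the processor's actual instruction set:

```python
def parse_offline_model(offline_model):
    """Hypothetical model analysis unit: walk the layers recorded in the
    offline model and emit instructions for the neural network processor."""
    instructions = []
    for layer in offline_model["structure"]:
        # e.g. AlexNet's convolution, pooling, and fully connected layers
        instructions.append(("CONFIG", layer))   # set layer parameters/weights
        instructions.append(("COMPUTE", layer))  # run the layer's operation
    instructions.append(("OUTPUT",))             # hand the result to the output module
    return instructions
```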
The processes or methods depicted in the preceding figures may be performed by processing logic that comprises hardware, software, or a combination of both. Although the processes or methods are described above in terms of some sequential operations, it should be understood that some of the operations described may be performed in a different order. Further, some operations may be performed in parallel rather than sequentially.
The above-mentioned embodiments are intended to illustrate the objects, aspects and advantages of the present disclosure in further detail, and it should be understood that the above-mentioned embodiments are only illustrative of the present disclosure and are not intended to limit the present disclosure, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims (12)

1. An operation method comprising the steps of:
when the input data comprises data to be processed, network structure and weight data, executing the following steps:
inputting and reading input data;
constructing an offline model according to the network structure and the weight data;
analyzing the offline model to obtain an operation instruction which can be identified by the neural network processing unit and caching the operation instruction for subsequent calculation and calling;
the neural network processing unit operates the data to be processed according to the operation instruction to obtain an operation result for outputting;
when the input data comprises the data to be processed and the off-line model, executing the following steps:
inputting and reading input data;
analyzing the offline model to obtain an operation instruction which can be identified by the neural network processing unit and caching the operation instruction for subsequent calculation and calling;
the neural network processing unit operates the data to be processed according to the operation instruction to obtain an operation result for outputting;
when the input data only comprises the data to be processed, the following steps are executed:
inputting and reading input data;
and calling the cached operation instruction which can be identified by the neural network processing unit, and operating the data to be processed to obtain an operation result for outputting.
2. The method of operation of claim 1, wherein the neural network processing unit has an instruction cache unit to cache the operation instruction for subsequent computation calls.
3. The method of operation of any of claims 1-2, wherein the offline model is a neural network model; the neural network models include Cambricon_model, AlexNet_model, GoogleNet_model, VGG_model, R-CNN_model, GAN_model, LSTM_model, RNN_model, ResNet_model.
4. The method of operation of claim 1, wherein the data to be processed is an input that can be processed with a neural network.
5. The method of claim 4, wherein the data to be processed comprises a continuous single picture, voice, or video stream.
6. The operational method of claim 1, wherein the network structure is a neural network structure; the neural network structure comprises AlexNet, GoogleNet, ResNet, VGG, R-CNN, GAN, LSTM, RNN.
7. An operation device comprising:
the input module is used for inputting data, wherein the data comprises data to be processed, network structure and weight data and/or offline model data;
the model generation module is used for constructing an offline model according to the input network structure and the weight data;
a neural network operation module comprising:
the model analysis unit is used for generating an operation instruction which can be identified by the neural network processing unit based on the offline model; and
the neural network processing unit is used for caching the operation instruction for subsequent calculation and calling, and performing operation on the data to be processed based on the operation instruction to obtain an operation result;
the output module is used for outputting the operation result;
the control module is used for detecting the type of input data and executing the following operations:
when the input data comprises data to be processed, a network structure and weight data, the control input module inputs the network structure and the weight data into the model generation module to construct an offline model, and controls the neural network operation module to operate the data to be processed input by the input module based on the offline model constructed by the model generation module;
when input data comprise to-be-processed data and an offline model, controlling an input module to input the to-be-processed data and the offline model into a neural network operation module, controlling the neural network operation module to generate and cache an operation instruction based on the offline model, and operating the to-be-processed data based on the operation instruction;
when the input data only comprises the data to be processed, the control input module inputs the data to be processed into the neural network operation module, and controls the neural network operation module to call the cached operation instruction to operate the data to be processed.
8. The operation device of claim 7, wherein the neural network processing unit has an instruction cache unit to cache the operation instruction for subsequent computation calls.
9. The operation device of claim 7, wherein the offline model is a neural network model; the neural network models include Cambricon_model, AlexNet_model, GoogleNet_model, VGG_model, R-CNN_model, GAN_model, LSTM_model, RNN_model, ResNet_model.
10. The operation device of claim 7, wherein the data to be processed is an input that can be processed with a neural network.
11. The operation device of claim 10, wherein the data to be processed comprises a continuous single picture, voice, or video stream.
12. The operation device of claim 7, wherein the network structure is a neural network structure; the neural network structure comprises AlexNet, GoogleNet, ResNet, VGG, R-CNN, GAN, LSTM, RNN.
CN201710269049.0A 2017-04-19 2017-04-21 Operation method and device Active CN108734288B (en)

Priority Applications (16)

Application Number Priority Date Filing Date Title
CN201710269049.0A CN108734288B (en) 2017-04-21 2017-04-21 Operation method and device
PCT/CN2018/083415 WO2018192500A1 (en) 2017-04-19 2018-04-17 Processing apparatus and processing method
EP18788355.8A EP3614259A4 (en) 2017-04-19 2018-04-17 Processing apparatus and processing method
EP19214371.7A EP3786786B1 (en) 2017-04-19 2018-04-17 Processing device, processing method, chip, and electronic apparatus
CN201880000923.3A CN109121435A (en) 2017-04-19 2018-04-17 Processing unit and processing method
KR1020197038135A KR102258414B1 (en) 2017-04-19 2018-04-17 Processing apparatus and processing method
JP2019549467A JP6865847B2 (en) 2017-04-19 2018-04-17 Processing equipment, chips, electronic equipment and methods
CN201811097653.0A CN109376852B (en) 2017-04-21 2018-04-17 Arithmetic device and arithmetic method
EP19214320.4A EP3654172A1 (en) 2017-04-19 2018-04-17 Fused vector multiplier and method using the same
KR1020197025307A KR102292349B1 (en) 2017-04-19 2018-04-17 Processing device and processing method
US16/476,262 US11531540B2 (en) 2017-04-19 2018-04-17 Processing apparatus and processing method with dynamically configurable operation bit width
US16/697,687 US11734002B2 (en) 2017-04-19 2019-11-27 Counting elements in neural network input data
US16/697,727 US11698786B2 (en) 2017-04-19 2019-11-27 Processing apparatus and processing method
US16/697,637 US11720353B2 (en) 2017-04-19 2019-11-27 Processing apparatus and processing method
US16/697,533 US11531541B2 (en) 2017-04-19 2019-11-27 Processing apparatus and processing method
JP2019228383A JP6821002B2 (en) 2017-04-19 2019-12-18 Processing equipment and processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710269049.0A CN108734288B (en) 2017-04-21 2017-04-21 Operation method and device

Publications (2)

Publication Number Publication Date
CN108734288A CN108734288A (en) 2018-11-02
CN108734288B true CN108734288B (en) 2021-01-29

Family

ID=63934137

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201710269049.0A Active CN108734288B (en) 2017-04-19 2017-04-21 Operation method and device
CN201811097653.0A Active CN109376852B (en) 2017-04-19 2018-04-17 Arithmetic device and arithmetic method

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201811097653.0A Active CN109376852B (en) 2017-04-19 2018-04-17 Arithmetic device and arithmetic method

Country Status (1)

Country Link
CN (2) CN108734288B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109685203B (en) * 2018-12-21 2020-01-17 中科寒武纪科技股份有限公司 Data processing method, device, computer system and storage medium
CN109726797B (en) * 2018-12-21 2019-11-19 北京中科寒武纪科技有限公司 Data processing method, device, computer system and storage medium
CN109697500B (en) * 2018-12-29 2020-06-09 中科寒武纪科技股份有限公司 Data processing method and device, electronic equipment and storage medium
CN110070176A (en) * 2019-04-18 2019-07-30 北京中科寒武纪科技有限公司 The processing method of off-line model, the processing unit of off-line model and Related product
US11983535B2 (en) 2019-03-22 2024-05-14 Cambricon Technologies Corporation Limited Artificial intelligence computing device and related product
CN111832739B (en) * 2019-04-18 2024-01-09 中科寒武纪科技股份有限公司 Data processing method and related product
CN110309917B (en) * 2019-07-05 2020-12-18 安徽寒武纪信息科技有限公司 Verification method of off-line model and related device
CN116167422A (en) * 2019-07-31 2023-05-26 华为技术有限公司 Integrated chip and method for processing sensor data
CN111582459B (en) * 2020-05-18 2023-10-20 Oppo广东移动通信有限公司 Method for executing operation, electronic equipment, device and storage medium
CN112613597B (en) * 2020-11-30 2023-06-30 河南汇祥通信设备有限公司 Comprehensive pipe rack risk automatic identification convolutional neural network model and construction method
CN112947935B (en) * 2021-02-26 2024-08-13 上海商汤智能科技有限公司 Operation method and device, electronic equipment and storage medium

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20130090147A (en) * 2012-02-03 2013-08-13 안병익 Neural network computing apparatus and system, and method thereof
US9378455B2 (en) * 2012-05-10 2016-06-28 Yan M. Yufik Systems and methods for a computer understanding multi modal data streams
US20160162779A1 (en) * 2014-12-05 2016-06-09 RealMatch, Inc. Device, system and method for generating a predictive model by machine learning
EP3035249B1 (en) * 2014-12-19 2019-11-27 Intel Corporation Method and apparatus for distributed and cooperative computation in artificial neural networks
CN105005911B (en) * 2015-06-26 2017-09-19 深圳市腾讯计算机系统有限公司 The arithmetic system and operation method of deep neural network
CN106447035B (en) * 2015-10-08 2019-02-26 上海兆芯集成电路有限公司 Processor with variable rate execution unit
CN107506828B (en) * 2016-01-20 2020-11-03 中科寒武纪科技股份有限公司 Artificial neural network computing device and method for sparse connection
CN105930902B (en) * 2016-04-18 2018-08-10 中国科学院计算技术研究所 A kind of processing method of neural network, system
CN106228238B (en) * 2016-07-27 2019-03-22 中国科学技术大学苏州研究院 Accelerate the method and system of deep learning algorithm on field programmable gate array platform
CN106529670B (en) * 2016-10-27 2019-01-25 中国科学院计算技术研究所 It is a kind of based on weight compression neural network processor, design method, chip
CN106557332A (en) * 2016-11-30 2017-04-05 上海寒武纪信息科技有限公司 A kind of multiplexing method and device of instruction generating process

Also Published As

Publication number Publication date
CN109376852A (en) 2019-02-22
CN108734288A (en) 2018-11-02
CN109376852B (en) 2021-01-29


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant