CN106951926B - Deep learning method and device of hybrid architecture
- Publication number: CN106951926B
- Application number: CN201710196532.0A
- Authority: CN (China)
- Prior art keywords: deep learning, module, training, server, capi
- Legal status: Active
Classifications
- G06F18/214 — Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/241 — Pattern recognition: classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/08 — Neural networks: learning methods
- G06N5/04 — Knowledge-based models: inference or reasoning models
- G06V10/955 — Image or video recognition: hardware or software architectures using specific electronic processors
Abstract
The invention discloses a deep learning method and device with a hybrid architecture, comprising the following steps: when the training data set is updated, the training module retrains the deep learning network model and saves the weight and bias parameters; the server-side monitoring process detects the change to the parameter file, packages the parameter information into a preset data structure, and notifies the inference module; the inference module suspends the inference service, reads the weight and bias file contents from the server, and updates the network model; the server-side monitoring process concurrently processes the input files awaiting inference and notifies the inference module. The device comprises a server, a training module, an inference module, and a bus interface. This CPU + GPU + CAPI heterogeneous deep learning system, which combines training and inference, makes full use of resources, achieves a higher energy-efficiency ratio, enables CAPI to access server memory directly, and supports real-time online iterative updating of parameters such as the inference model weights.
Description
Technical field:
The present invention relates to the technical fields of circuit design and machine learning, and in particular to a deep learning method and apparatus with a hybrid architecture.
Background art:
The rapid development of the information technology industry in the 21st century has brought great benefit and convenience to people. Deep learning applications divide into two parts, training and inference. Taking ImageNet evaluation as an example, training the AlexNet model requires 800,000 images across 1,000 categories: features are extracted through the AlexNet model and a loss is computed, then the weight parameters are updated through back-propagation methods such as SGD so that the model continuously converges, finally yielding a usable network model. The inference process performs a forward pass on the input through the network model to obtain the final classification accuracy (Top-5 is generally used). The training process of a deep learning application requires a large amount of computing resources and training data, and current training platforms generally use high-performance NVIDIA GPUs such as the Tesla P100, Titan X, and GTX 1080 to accelerate training. After a usable model is obtained, it is deployed to another platform to perform inference and provide external services. Since inference requires only a single forward pass, its computational demand is lower and the requirements center instead on latency; current inference platforms include CPU-based cloud service platforms, low-power GPU server clusters, and FPGA or dedicated-ASIC clusters. FPGAs and dedicated ASICs are more advantageous in terms of low latency and high performance, and compared with ASICs, FPGAs offer greater architectural flexibility and are receiving more and more attention. CAPI, the Coherent Accelerator Processor Interface, is a high-speed bus interface protocol developed by IBM for POWER processors; its physical interface is PCI-E or IBM's BlueLink. The PSL layer implemented inside CAPI guarantees memory-access coherency between the accelerator and the server, meaning the CPU memory can be accessed directly through virtual addresses, greatly reducing access latency. Moreover, the SNAP Framework programming environment provided by IBM allows algorithm models to be written conveniently in C/C++.
Consequently, various deep learning methods and devices have been developed. For example, Chinese patent publication CN106022472A discloses an embedded deep learning processor; that invention belongs to the technical field of integrated circuits and in particular relates to an embedded deep learning processor based on an FPGA (field-programmable gate array). The processor includes a central processing unit (CPU) that performs the necessary logic, control, and storage operations during learning and operation, and a deep learning unit, a hardware implementation of a deep learning algorithm and the core component of deep learning processing. It combines a traditional CPU with a deep learning combination unit in which any number of deep learning units can be combined; it is expandable and can serve as a core processor for artificial intelligence applications at different computing scales. Chinese patent publication CN106156851A, shown in fig. 5, discloses an acceleration apparatus and method for deep learning services, used to perform deep learning computation on data to be processed in a server. It comprises a network card disposed at the server end, a computation control module connected to the server through a bus, a first memory, and a second memory. The computation control module is a programmable logic device comprising a control unit, a data storage unit, a logic storage unit, a bus interface, and first and second communication interfaces, which communicate with the network card, the first memory, and the second memory respectively; the logic storage unit stores the deep learning control logic, and the first memory stores the weight and bias data of each network layer. The apparatus and method effectively improve computational efficiency and the performance-to-power ratio.
The prior art has the following defects: 1) training and inference are generally separated, so two sets of platform environments must be maintained and resources cannot be fully utilized; 2) deep learning computation implemented entirely on FPGA/CPLD devices lacks sufficient computing power and is currently unsuitable for large-scale training scenarios; 3) communication between the FPGA/CPLD and the server is generally implemented via DMA, so the latency of data interaction with the CPU server is large. A new deep learning system method and apparatus is therefore needed.
Summary of the invention:
To overcome the defects of the prior art, the present invention provides a deep learning method and device with a hybrid architecture that exploit the strengths of each module, achieve a higher energy-efficiency ratio, and make full use of resources; CAPI provides direct access to server memory, reducing latency and programming complexity. The technical solution of the invention is as follows:
A deep learning method for a hybrid architecture, used to implement deep learning training and inference, comprises the following steps:
S1, when the training data set is updated, the training module retrains the deep learning network model and, upon completion, saves the weight and bias parameters of the network model to a preset file;
S2, the server-side monitoring process detects the change to the parameter file, packages the virtual address and length of the weight and bias parameter storage space into a preset data structure, and notifies the inference module;
S3, the inference module suspends the inference service, reads the weight and bias file contents from the server through the bus interface, and updates the network model;
S4, the server-side monitoring process concurrently processes the input files awaiting inference and notifies the inference module, which returns the results to the server-side monitoring process upon completion.
Step S1 specifically comprises the following sub-steps:
S11, when the training data set is updated but the network model is unchanged, retraining is required to obtain updated network weight and bias parameters;
S12, after training, the weight and bias parameters of each network layer are saved to a preset file in a format agreed upon with the inference module.
Step S2 specifically comprises the following sub-steps (a sketch of this monitoring flow follows the sub-steps below):
S21, the server runs a monitoring process, which controls the running, stopping, and parameter updating of the inference module by calling the inference module's function interfaces and drivers in the server kernel library;
S22, the server constantly monitors whether the weight and bias parameters need updating and obtains the latest parameter information;
S23, when an update occurs, a stop command and the updated parameter file information are sent to the inference module.
Step S3 specifically comprises the following sub-steps:
S31, the inference module reads the corresponding weight and bias information from the server directly into its internal RAM via the virtual address;
S32, after reading completes, the inference module notifies the monitoring process, which then sends a run command to the inference module;
S33, the inference module updates the network model parameters and resumes the inference service.
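As an illustration of the S2 flow, the following minimal Python sketch polls the parameter file and packages the virtual address and length of the weight and bias buffers into a data structure. It assumes the training module writes a NumPy `.npz` file, and it uses a stand-in `notify_inference_module` callback in place of the CAPI kernel-library call; all field and file names here are hypothetical, not taken from the patent.

```python
import ctypes
import os
import time

import numpy as np

# Hypothetical layout of the "preset data structure" of step S2: virtual
# addresses and byte lengths of the weight and bias storage spaces.
class ParamDescriptor(ctypes.Structure):
    _fields_ = [
        ("weight_addr", ctypes.c_uint64),  # virtual address of weight buffer
        ("weight_len", ctypes.c_uint64),   # length of weight buffer in bytes
        ("bias_addr", ctypes.c_uint64),    # virtual address of bias buffer
        ("bias_len", ctypes.c_uint64),     # length of bias buffer in bytes
    ]

PARAM_FILE = "params.npz"  # preset file written by the training module (S12)

def load_params(path):
    """Load the trained parameters into host memory and describe them."""
    data = np.load(path)
    weights, biases = data["weights"], data["biases"]
    desc = ParamDescriptor(
        weight_addr=weights.ctypes.data, weight_len=weights.nbytes,
        bias_addr=biases.ctypes.data, bias_len=biases.nbytes,
    )
    # The arrays must be kept alive so their virtual addresses stay valid.
    return desc, (weights, biases)

def monitor_loop(notify_inference_module):
    """S21-S23: watch the parameter file and notify the inference module."""
    last_mtime = 0.0
    keepalive = None
    while True:
        mtime = os.stat(PARAM_FILE).st_mtime
        if mtime != last_mtime:  # S2: the parameter file has changed
            last_mtime = mtime
            desc, keepalive = load_params(PARAM_FILE)
            notify_inference_module("stop")  # S23: send stop command
            notify_inference_module(desc)    # then the updated parameter info
        time.sleep(1.0)
```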
The deep learning network model of the hybrid architecture is a deep learning network model for image classification.
A hybrid-architecture deep learning device, used to implement parallel operation of deep learning training and inference, comprises a server, a training module, an inference module, and a bus interface; the server comprises a CPU processor, DDR memory, and a network; the training module and the inference module are connected to the server through bus interfaces and communicate over them.
The server provides the control, data processing, network interaction, and parameter storage functions for deep learning.
The CPU processor is a POWER processor; the training module is a GPU-accelerated training module used to accelerate the deep learning model training process; the inference module is a CAPI inference module that can be preloaded with a preset deep learning network model and is used for the deep learning inference process.
Compared with the prior art, the invention has the following beneficial effects. The disclosed deep learning method of a hybrid architecture comprises: when the training data set is updated, the training module retrains the deep learning network model and, upon completion, saves the weight and bias parameters of the network model to a preset file; the server-side monitoring process detects the change to the parameter file, packages the virtual address and length of the weight and bias parameter storage space into a preset data structure, and notifies the inference module; the inference module suspends the inference service, reads the weight and bias file contents from the server through the bus interface, and updates the network model; the server-side monitoring process concurrently processes the input files awaiting inference and notifies the inference module, which returns the results to the monitoring process upon completion. The hybrid-architecture deep learning device comprises a server, a training module, an inference module, and a bus interface; the server comprises a CPU processor, DDR memory, and a network; the training module and the inference module are connected to the server through bus interfaces and communicate over them. The invention adopts a single CPU + GPU + CAPI heterogeneous deep learning system combining training and inference, which exploits the strengths of each module, achieves a higher energy-efficiency ratio, and makes full use of resources; CAPI provides direct access to server memory, reducing latency and programming complexity; parameters such as the inference model weights can be updated online and iteratively in real time.
Drawings
Fig. 1 is a flowchart illustrating a deep learning method of a hybrid architecture according to the present invention.
FIG. 2 is an architecture diagram of a hybrid architecture deep learning device according to the present invention.
Fig. 3 is an architecture diagram of a hybrid architecture deep learning device according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of the working principle of the present invention, using the AlexNet deep learning network model as an example.
Fig. 5 is a block diagram illustrating a structure of an acceleration apparatus for deep learning service according to an embodiment of the prior art.
Detailed Description
The present invention is described below in further detail with reference to figs. 1 to 5 so that its implementation can be better understood. Specific embodiments of the invention are as follows:
As shown in fig. 1, the deep learning method of the hybrid architecture according to the present invention, used to implement deep learning training and inference, comprises the following steps:
S1, when the training data set is updated, the training module retrains the deep learning network model and, upon completion, saves the weight and bias parameters of the network model to a preset file;
S2, the server-side monitoring process detects the change to the parameter file, packages the virtual address and length of the weight and bias parameter storage space into a preset data structure, and notifies the inference module;
S3, the inference module suspends the inference service, reads the weight and bias file contents from the server through the bus interface, and updates the network model;
S4, the server-side monitoring process concurrently processes the input files awaiting inference and notifies the inference module, which returns the results to the server-side monitoring process upon completion.
Step S1 specifically comprises the following sub-steps:
S11, when the training data set is updated but the network model is unchanged, retraining is required to obtain updated network weight and bias parameters;
S12, after training, the weight and bias parameters of each network layer are saved to a preset file in a format agreed upon with the inference module.
Step S2 specifically comprises the following sub-steps:
S21, the server runs a monitoring process, which controls the running, stopping, and parameter updating of the inference module by calling the inference module's function interfaces and drivers in the server kernel library;
S22, the server constantly monitors whether the weight and bias parameters need updating and obtains the latest parameter information;
S23, when an update occurs, a stop command and the updated parameter file information are sent to the inference module.
Step S3 specifically comprises the following sub-steps (a sketch of the inference-side update flow follows the sub-steps below):
S31, the inference module reads the corresponding weight and bias information from the server directly into its internal RAM via the virtual address;
S32, after reading completes, the inference module notifies the monitoring process, which then sends a run command to the inference module;
S33, the inference module updates the network model parameters and resumes the inference service.
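The S3 side can be sketched in the same spirit. On the real device the CAPI card reads the buffers coherently through the PSL; here a host-side `ctypes` read through the virtual address stands in for that access, and `ParamDescriptor` is the hypothetical structure from the monitoring sketch above — an illustration of the control flow, not the FPGA implementation.

```python
import ctypes

import numpy as np

def read_params(desc):
    """S31 in miniature: fetch weights and biases directly via virtual address.

    desc is the hypothetical ParamDescriptor packaged by the monitoring
    process; the data is copied out, mimicking the transfer to internal RAM.
    """
    n_w = desc.weight_len // ctypes.sizeof(ctypes.c_float)
    n_b = desc.bias_len // ctypes.sizeof(ctypes.c_float)
    weights = np.ctypeslib.as_array(
        (ctypes.c_float * n_w).from_address(desc.weight_addr)).copy()
    biases = np.ctypeslib.as_array(
        (ctypes.c_float * n_b).from_address(desc.bias_addr)).copy()
    return weights, biases

class InferenceModule:
    """Hypothetical host-side model of the S3 stop/read/notify/resume cycle."""

    def __init__(self):
        self.running = True
        self.weights = None
        self.biases = None

    def on_update(self, desc, notify_monitor):
        self.running = False  # interrupt the inference service
        self.weights, self.biases = read_params(desc)  # S31
        notify_monitor("read-complete")  # S32: notify the monitoring process
        # ... the monitoring process sends the run command, after which:
        self.running = True  # S33: resume inference with updated parameters
```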
As shown in fig. 2, the hybrid-architecture deep learning apparatus is configured to implement parallel operation of deep learning training and inference. The apparatus comprises a server, a training module, an inference module, and a bus interface; the server comprises a CPU processor, DDR memory, and a network; the training module and the inference module are connected to the server through bus interfaces and communicate over them. The server provides the deep learning control, data processing, network interaction, and parameter storage functions. The CPU processor is a POWER processor; the training module is a GPU-accelerated training module used to accelerate the deep learning model training process; the inference module is a CAPI inference module that can be preloaded with a preset deep learning network model and is used for the deep learning inference process. The bus interface between the server and the training module is a PCI-E or NVLink bus; the hardware interface between the server and the inference module is PCI-E or BlueLink, with CAPI as the bus protocol.
Preferably, as shown in fig. 4, the deep learning system network model of the hybrid architecture adopts the AlexNet deep learning network model for picture classification. To facilitate understanding of the scheme, the working principle of the invention is briefly described using the AlexNet model as an example (a model sketch follows this paragraph): the AlexNet network consists of 5 convolutional layers and 3 fully connected layers; ReLU, pooling, and normalization operations are applied after some of the convolutional layers, and the final fully connected layer feeds a 1000-class Softmax output. The AlexNet model is applicable to a wide range of picture classification tasks and can be trained for different situations with different training data sets to provide a picture classification service.
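For reference, a minimal `tf.keras` sketch of this 5-convolution + 3-fully-connected layout is shown below. It is not the patent's SNAP/C implementation; `BatchNormalization` stands in for the original local response normalization, and the hyperparameters follow the commonly published AlexNet configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def alexnet(num_classes=1000):
    """AlexNet-style model: 5 convolutional + 3 fully connected layers."""
    return tf.keras.Sequential([
        layers.Input(shape=(227, 227, 3)),
        layers.Conv2D(96, 11, strides=4, activation="relu"),        # conv1
        layers.BatchNormalization(),                                # norm
        layers.MaxPooling2D(3, strides=2),                          # pool
        layers.Conv2D(256, 5, padding="same", activation="relu"),   # conv2
        layers.BatchNormalization(),
        layers.MaxPooling2D(3, strides=2),
        layers.Conv2D(384, 3, padding="same", activation="relu"),   # conv3
        layers.Conv2D(384, 3, padding="same", activation="relu"),   # conv4
        layers.Conv2D(256, 3, padding="same", activation="relu"),   # conv5
        layers.MaxPooling2D(3, strides=2),
        layers.Flatten(),
        layers.Dense(4096, activation="relu"),                      # fc6
        layers.Dropout(0.5),
        layers.Dense(4096, activation="relu"),                      # fc7
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),            # fc8
    ])
```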
Example 1
As shown in fig. 3, a preferred embodiment implements an AlexNet picture classification task:
The hybrid-architecture deep learning device, used to implement parallel operation of deep learning training and inference, comprises a server including a POWER8 processor, DDR memory, and a network; a GPU-accelerated training module, a GTX 1080, connected to the server through a bus; and a CAPI inference module, an ADM-PCIE-KU3 accelerator card, connected to the server through a bus. The GPU training module accelerates the training process of the deep learning model; the inference module is preloaded with the AlexNet network model and used for the deep learning inference process; the server handles deep learning control, data processing, network interaction, and parameter storage. The bus interface between the server and the training module is a PCI-E or NVLink bus; the hardware interface between the server and the inference module is PCI-E or BlueLink, with CAPI as the bus protocol.
The deep learning method of the hybrid architecture on this device comprises the following implementation steps (a pb-file parsing sketch follows these steps):
SS1, implement the 8-layer AlexNet network model using the SNAP Framework tool (an environment for running C/C++ algorithm models on the CAPI card) and write it to the CAPI inference module;
SS2, based on the TensorFlow deep learning framework, obtain a TFRecords picture set, for example 300 million pictures of 300 labeled bird species, and provide it as the training data set to two GTX 1080 GPUs for distributed training;
SS3, the monitoring process obtains the latest training-result pb file, parses the weight and bias parameters from the pb file into a file A, and obtains the virtual address and length of the parameter storage;
SS4, the monitoring program calls the CAPI kernel library function interface and driver to send the data structure packaging the parameter information to the ADM-PCIE-KU3 CAPI module;
SS5, the CAPI card parses the parameter addresses from the structure, obtains the parameter information, and correspondingly updates the stored network model weight and bias parameter variables;
SS6, the CAPI card receives picture inference requests sent by the monitoring program and returns the Top-5 results output by the network, providing an external picture recognition service for these categories;
SS7, while the CAPI card provides this service, new categories can be trained continuously and the trained parameters synchronously updated into the CAPI card, realizing synchronized updating and iteration of training and inference.
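The parsing step of SS3 can be illustrated with the following sketch, which assumes the training result is a frozen TensorFlow GraphDef whose parameters are stored as Const nodes; the output file name (`file_A.npz`) and the assumption that every constant is a parameter are illustrative.

```python
import numpy as np
import tensorflow as tf
from tensorflow.python.framework import tensor_util

def extract_params(pb_path, out_path="file_A.npz"):
    """SS3 sketch: parse weight and bias tensors out of a frozen pb file."""
    graph_def = tf.compat.v1.GraphDef()
    with open(pb_path, "rb") as f:
        graph_def.ParseFromString(f.read())

    # Collect every constant tensor; in a frozen graph the trained weights
    # and biases appear as Const nodes.
    params = {
        node.name: tensor_util.MakeNdarray(node.attr["value"].tensor)
        for node in graph_def.node
        if node.op == "Const"
    }

    # Persist the parameters to "file A"; the monitoring process would then
    # record the virtual address and length of each in-memory buffer.
    np.savez(out_path, **{k.replace("/", "_"): v for k, v in params.items()})
    return params
```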
The above description covers only preferred embodiments of the present invention, and the scope of the invention is not limited to these specific embodiments; any modification, equivalent replacement, or improvement within the spirit, principles, and disclosed technical scope of the present invention shall fall within the scope of the invention.
Claims (9)
1. A deep learning method of a hybrid architecture, implementing deep learning training and inference based on a deep learning system, characterized in that the deep learning system is a CPU + GPU + CAPI heterogeneous deep learning system combining training and inference, and the method comprises the following steps:
S1, when the training data set is updated, the training module retrains the deep learning network model and, upon completion, saves the weight and bias parameters of the network model to a preset file;
S2, the server-side monitoring process detects the change to the parameter file, packages the virtual address and length of the weight and bias parameter storage space into a preset data structure, and notifies the inference module;
S3, the inference module suspends the inference service, reads the weight and bias file contents from the server through the bus interface, and updates the network model;
S4, the server-side monitoring process concurrently processes the input files awaiting inference and notifies the inference module, which returns the results to the server-side monitoring process upon completion;
the CPU + GPU + CAPI heterogeneous deep learning system comprises:
a server including a POWER8 processor, DDR memory, and a network; a GPU-accelerated training module, a GTX 1080, connected to the server through a bus; and a CAPI inference module, an ADM-PCIE-KU3 accelerator card, connected to the server through a bus; the GPU-accelerated training module GTX 1080 is used to accelerate the deep learning model training process; the inference module is preloaded with the AlexNet network model and used for the deep learning inference process; the server is used for deep learning control, data processing, network interaction, and parameter storage; the bus interface between the server and the training module is a PCI-E or NVLink bus; the hardware interface between the server and the inference module is PCI-E or BlueLink, with CAPI as the bus protocol; the deep learning method of the hybrid architecture comprises the following implementation steps:
SS1, implementing the 8-layer AlexNet network model using the SNAP Framework tool and writing it to the CAPI inference module;
SS2, acquiring picture data based on the TensorFlow deep learning framework and providing it as the training data set to two GTX 1080 GPUs for distributed training;
SS3, the monitoring process obtaining the latest training-result pb file, parsing the weight and bias parameters from the pb file into a file A, and obtaining the virtual address and length of the parameter storage;
SS4, the monitoring program calling the CAPI kernel library function interface and driver to send the data structure packaging the parameter information to the ADM-PCIE-KU3 CAPI module;
SS5, the CAPI card parsing the parameter addresses from the structure, thereby obtaining the parameter information and correspondingly updating the stored network model weight and bias parameter variables;
SS6, the CAPI card receiving picture inference requests sent by the monitoring program and returning the Top-5 results output by the network, so that a picture recognition service for the corresponding categories can be provided externally;
SS7, while the CAPI card provides this service, continuing to train new categories and synchronously updating the trained parameters into the CAPI card.
2. The method of claim 1, wherein step S1 specifically comprises the following sub-steps:
S11, when the training data set is updated but the network model is unchanged, retraining is required to obtain updated network weight and bias parameters;
S12, after training, the weight and bias parameters of each network layer are saved to a preset file in a format agreed upon with the inference module.
3. The method of claim 1, wherein step S2 specifically comprises the following sub-steps:
S21, the server runs a monitoring process, which controls the running, stopping, and parameter updating of the inference module by calling the inference module's function interfaces and drivers in the server kernel library;
S22, the server constantly monitors whether the weight and bias parameters need updating and obtains the latest parameter information;
S23, when an update occurs, a stop command and the updated parameter file information are sent to the inference module.
4. The method of claim 1, wherein step S3 specifically comprises the following sub-steps:
S31, the inference module reads the corresponding weight and bias information from the server directly into its internal RAM via the virtual address;
S32, after reading completes, the inference module notifies the monitoring process, which then sends a run command to the inference module;
S33, the inference module updates the network model parameters and resumes the inference service.
5. The method of claim 1, wherein the network model is a deep learning model for image classification.
6. A hybrid-architecture deep learning apparatus using the method of any one of claims 1-5 to implement parallel operation of deep learning training and inference, characterized in that: the apparatus comprises a server, a training module, an inference module, and a bus interface; the server comprises a CPU processor, DDR memory, and a network; the training module and the inference module are connected to the server through bus interfaces and communicate over them; the CAPI directly accesses the memory of the server.
7. The apparatus of claim 6, wherein the server provides the control, data processing, network interaction, and parameter storage functions for deep learning.
8. The apparatus of claim 6, wherein the CPU processor is a POWER processor and the training module is a GPU-accelerated training module used to accelerate the deep learning model training process.
9. The apparatus of claim 6, wherein the inference module is a CAPI inference module that can be preloaded with a deep learning network model and is used for the deep learning inference process.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710196532.0A CN106951926B (en) | 2017-03-29 | 2017-03-29 | Deep learning method and device of hybrid architecture |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710196532.0A CN106951926B (en) | 2017-03-29 | 2017-03-29 | Deep learning method and device of hybrid architecture |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106951926A CN106951926A (en) | 2017-07-14 |
CN106951926B true CN106951926B (en) | 2020-11-24 |
Family
ID=59474087
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710196532.0A Active CN106951926B (en) | 2017-03-29 | 2017-03-29 | Deep learning method and device of hybrid architecture |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106951926B (en) |
Families Citing this family (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107563512B (en) * | 2017-08-24 | 2023-10-17 | 腾讯科技(上海)有限公司 | Data processing method, device and storage medium |
WO2019042571A1 (en) * | 2017-09-04 | 2019-03-07 | Huawei Technologies Co., Ltd. | Asynchronous gradient averaging distributed stochastic gradient descent |
CN107729268B (en) * | 2017-09-20 | 2019-11-12 | 山东英特力数据技术有限公司 | A kind of memory expansion apparatus and method based on CAPI interface |
TWI658365B (en) * | 2017-10-30 | 2019-05-01 | 緯創資通股份有限公司 | Connecting module |
CN109064382B (en) * | 2018-06-21 | 2023-06-23 | 北京陌上花科技有限公司 | Image information processing method and server |
CN109460826A (en) * | 2018-10-31 | 2019-03-12 | 北京字节跳动网络技术有限公司 | For distributing the method, apparatus and model modification system of data |
CN109726170A (en) * | 2018-12-26 | 2019-05-07 | 上海新储集成电路有限公司 | A kind of on-chip system chip of artificial intelligence |
CN109886408A (en) * | 2019-02-28 | 2019-06-14 | 北京百度网讯科技有限公司 | A kind of deep learning method and device |
CN109947682B (en) * | 2019-03-21 | 2021-03-09 | 浪潮商用机器有限公司 | Server mainboard and server |
US11176493B2 (en) | 2019-04-29 | 2021-11-16 | Google Llc | Virtualizing external memory as local to a machine learning accelerator |
CN112148470B (en) * | 2019-06-28 | 2022-11-04 | 富联精密电子(天津)有限公司 | Parameter synchronization method, computer device and readable storage medium |
CN110399234A (en) * | 2019-07-10 | 2019-11-01 | 苏州浪潮智能科技有限公司 | A kind of task accelerated processing method, device, equipment and readable storage medium storing program for executing |
CN110533181B (en) * | 2019-07-25 | 2023-07-18 | 南方电网数字平台科技(广东)有限公司 | Rapid training method and system for deep learning model |
CN112541513B (en) * | 2019-09-20 | 2023-06-27 | 百度在线网络技术(北京)有限公司 | Model training method, device, equipment and storage medium |
CN110598855B (en) * | 2019-09-23 | 2023-06-09 | Oppo广东移动通信有限公司 | Deep learning model generation method, device, equipment and storage medium |
CN111147603A (en) * | 2019-09-30 | 2020-05-12 | 华为技术有限公司 | Method and device for networking reasoning service |
TWI780382B (en) * | 2019-12-05 | 2022-10-11 | 新唐科技股份有限公司 | Microcontroller updating system and method |
CN113298222A (en) * | 2020-02-21 | 2021-08-24 | 深圳致星科技有限公司 | Parameter updating method based on neural network and distributed training platform system |
CN111860260B (en) * | 2020-07-10 | 2024-01-26 | 逢亿科技(上海)有限公司 | High-precision low-calculation target detection network system based on FPGA |
CN112465112B (en) * | 2020-11-19 | 2022-06-07 | 苏州浪潮智能科技有限公司 | nGraph-based GPU (graphics processing Unit) rear-end distributed training method and system |
CN112581353A (en) * | 2020-12-29 | 2021-03-30 | 浪潮云信息技术股份公司 | End-to-end picture reasoning system facing deep learning model |
CN112949427A (en) * | 2021-02-09 | 2021-06-11 | 北京奇艺世纪科技有限公司 | Person identification method, electronic device, storage medium, and apparatus |
CN113537284B (en) * | 2021-06-04 | 2023-01-24 | 中国人民解放军战略支援部队信息工程大学 | Deep learning implementation method and system based on mimicry mechanism |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160098633A1 (en) * | 2014-10-02 | 2016-04-07 | Nec Laboratories America, Inc. | Deep learning model for structured outputs with high-order interaction |
CN104463324A (en) * | 2014-11-21 | 2015-03-25 | 长沙马沙电子科技有限公司 | Convolution neural network parallel processing method based on large-scale high-performance cluster |
US20160267380A1 (en) * | 2015-03-13 | 2016-09-15 | Nuance Communications, Inc. | Method and System for Training a Neural Network |
CN104714852B (en) * | 2015-03-17 | 2018-05-22 | 华中科技大学 | A kind of parameter synchronization optimization method and its system suitable for distributed machines study |
CN105825235B (en) * | 2016-03-16 | 2018-12-25 | 新智认知数据服务有限公司 | A kind of image-recognizing method based on multi-characteristic deep learning |
- 2017-03-29: Application CN201710196532.0A filed; patent CN106951926B granted (status: Active)
Also Published As
Publication number | Publication date |
---|---|
CN106951926A (en) | 2017-07-14 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |
2024-09-19 | TR01 | Transfer of patent right | Patentee changed from SHANDONG ITL DATA TECHNIQUE CO.,LTD. (Yingteli Industrial Park, 431 Chongwen Avenue, High-tech Zone, Jining City, Shandong Province, 272000, China) to SHANDONG INTELLIGENT OPTICAL COMMUNICATION DEVELOPMENT Co.,Ltd. (No. 431, Chongwen Avenue, High-tech Zone, Jining City, Shandong Province, 272000, China)