CN106951926B - Deep learning method and device of hybrid architecture - Google Patents

Deep learning method and device of hybrid architecture

Info

Publication number
CN106951926B
Authority
CN
China
Prior art keywords
deep learning
module
training
server
capi
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710196532.0A
Other languages
Chinese (zh)
Other versions
CN106951926A (en)
Inventor
程归鹏
卢飞
江涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Intelligent Optical Communication Development Co ltd
Original Assignee
Shandong Itl Data Technique Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Itl Data Technique Co ltd filed Critical Shandong Itl Data Technique Co ltd
Priority to CN201710196532.0A
Publication of CN106951926A
Application granted
Publication of CN106951926B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/94Hardware or software architectures specially adapted for image or video understanding
    • G06V10/955Hardware or software architectures specially adapted for image or video understanding using specific electronic processors

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Debugging And Monitoring (AREA)
  • Stored Programmes (AREA)

Abstract

The invention discloses a deep learning method and device of a hybrid architecture. The method comprises the following steps: when the training data set is updated, the training module retrains the deep learning network model and stores the weight and bias parameters; a server-side monitoring process detects the change of the parameter file, packages the parameter information into a preset data structure and notifies the inference module; the inference module interrupts the inference service, reads the weight and bias file contents from the server side and updates the network model; the server-side monitoring process meanwhile dispatches the input files to be inferred and notifies the inference module. The device comprises a server, a training module, an inference module and a bus interface. This CPU+GPU+CAPI heterogeneous deep learning system, which mixes training and inference, can fully utilize resources, obtain a higher energy-efficiency ratio, let the CAPI card access server memory directly, and update parameters such as the inference model weights iteratively online in real time.

Description

Deep learning method and device of hybrid architecture
The technical field is as follows:
the present invention relates to the technical field of circuit design and machine learning, and in particular, to a deep learning method and apparatus for a hybrid architecture.
Background art:
the rapid development of the information technology industry in the 21st century has brought great benefits and convenience to people. A deep learning application is divided into two parts, training and inference. Taking ImageNet evaluation as an example, training an AlexNet model requires 800,000 pictures in 1,000 categories: features are extracted through the AlexNet model, a loss is calculated, and the weight parameters are then updated through back propagation (e.g., SGD) so that the model continuously converges, finally yielding a usable network model. Inference is the process of running a forward pass of the network model on an input to obtain the final classification accuracy (Top-5 accuracy is generally used). The training process of a deep learning application requires a large amount of computing resources and training data; current training platforms generally use high-performance NVIDIA GPUs such as the Tesla P100, Titan X and GTX 1080 to accelerate training. Once a usable model is obtained, it is deployed to another platform for inference, providing services externally. Because inference needs only a single forward pass, its computational demand is lower and the requirements are mostly on latency; current inference platforms include CPU-based cloud service platforms, low-power GPU server clusters, and FPGA or dedicated-ASIC clusters. FPGAs and dedicated ASICs are particularly advantageous for low latency and high performance, and compared with ASICs, FPGAs offer more architectural flexibility and are attracting more and more attention. CAPI, the Coherent Accelerator Processor Interface, is a high-speed bus interface protocol developed by IBM for POWER processors; its physical interface is PCI-E or IBM's BlueLink. A PSL layer implemented on the CAPI side guarantees memory-access coherence with the server, i.e., the CPU memory can be accessed directly through virtual addresses, greatly reducing access latency. In addition, the SNAP Framework programming environment provided by IBM allows algorithm models to be written conveniently in C/C++.
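For reference, the Top-5 criterion mentioned above counts a prediction as correct when the true class appears among the five highest forward-pass scores. The following minimal NumPy sketch (with synthetic logits and hypothetical shapes; it is not part of the patent) shows how such a score is computed:

```python
# Minimal Top-5 scoring sketch in NumPy; shapes and data are synthetic.
import numpy as np

def top5_accuracy(logits: np.ndarray, labels: np.ndarray) -> float:
    """logits: (batch, num_classes) forward-pass outputs; labels: (batch,)."""
    # Indices of the five highest-scoring classes for each sample.
    top5 = np.argsort(logits, axis=1)[:, -5:]
    hits = np.any(top5 == labels[:, None], axis=1)
    return float(hits.mean())

rng = np.random.default_rng(0)
logits = rng.standard_normal((4, 1000))   # 4 samples, 1000 ImageNet-style classes
labels = rng.integers(0, 1000, size=4)
print(top5_accuracy(logits, labels))
```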
Accordingly, various deep learning methods and devices have been developed. For example, Chinese patent publication CN106022472A discloses an embedded deep learning processor; that invention belongs to the technical field of integrated circuits and in particular relates to an embedded deep learning processor based on an FPGA (field programmable gate array). The deep learning processor includes: a central processing unit (CPU) for performing the logic, control and storage operations required during learning and operation; and a deep learning unit, a hardware implementation of the deep learning algorithm and the core component of deep learning processing. The processor combines a traditional CPU with a deep learning combination unit, in which any number of deep learning units can be combined; it is thus scalable and can serve as a core processor for artificial intelligence applications at different computational scales. As shown in fig. 5, Chinese patent publication CN106156851A discloses an acceleration apparatus and method for deep learning services, used to perform deep learning computation on data to be processed in a server. It comprises a network card disposed at the server end, and a computation control module, a first memory and a second memory connected to the server through a bus. The computation control module is a programmable logic device and comprises a control unit, a data storage unit, a logic storage unit, and a bus interface, first communication interface and second communication interface that communicate with the network card, the first memory and the second memory respectively. The logic storage unit stores the deep learning control logic; the first memory stores the weight and bias data of each network layer. With this apparatus and method, computational efficiency and the performance-to-power ratio can be effectively improved.
The prior art has the following defects: 1) training and inference are generally separated, so two sets of platform environments must be maintained and resources cannot be fully utilized; 2) performing deep learning computation entirely on an FPGA/CPLD does not provide enough computing power and is currently unsuitable for large-scale training scenarios; 3) communication between the FPGA/CPLD and the server is generally implemented via DMA, so the latency of data interaction with the CPU server is large. It is therefore necessary to provide a new deep learning method and apparatus.
The invention content is as follows:
in order to overcome the defects of the prior art, the invention provides a deep learning method and device of a hybrid architecture, which give full play to the advantages and characteristics of the respective modules, obtain a higher energy-efficiency ratio and fully utilize resources; the CAPI card accesses server memory directly, reducing latency and programming complexity. The technical scheme of the invention is as follows:
a deep learning method of a hybrid architecture is used for realizing deep learning training and reasoning, and comprises the following steps:
s1, when the training data set is updated, the training module carries out deep learning network model training again, and after the deep learning network model training is finished, the weight and the bias parameters of the network model are stored in a preset file;
s2, the server monitoring process monitors the parameter file change, packages the weight and the virtual address and length information of the bias parameter storage space into a preset data structure, and informs the inference module;
s3, the inference module interrupts the inference service, reads the weight and the bias file content from the server side through the bus interface, and updates the network model;
and S4, the server side monitoring process processes the input files needing to be reasoned at the same time, informs the inference module, and returns the result to the server side monitoring process after the inference module is finished.
Step S1 specifically includes the following sub-steps:
S11, when the training data set is updated while the network model is unchanged, retraining is needed to obtain updated network weight and bias parameters;
S12, after training, the weight and bias parameters of each network layer are stored into a preset file in a format agreed with the inference module.
Step S2 specifically includes the following sub-steps:
S21, the server side runs a monitoring process, which controls the running, stopping and parameter updating of the inference module by calling the inference module's function interface and driver in the server kernel library;
S22, the server side constantly monitors whether the weight and bias parameters need to be updated and obtains the latest parameter information;
S23, when an update occurs, a stop command and the updated parameter file information are sent to the inference module.
Step S3 specifically includes the following sub-steps:
S31, the inference module reads the corresponding weight and bias information from the server side directly into its internal RAM through the virtual address;
S32, the inference module notifies the monitoring process after reading is completed, and the monitoring process sends a run command to the inference module;
S33, the inference module updates the network model parameters and continues the inference service (a minimal host-side sketch of this stop/update/run handshake follows below).
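The patent refers to the CAPI kernel-library interface only generically, so the sketch below is an illustration of the S2/S3 handshake under stated assumptions: notify_inference_module() and wait_for_ack() are hypothetical stand-ins for the kernel-library calls, and the file name and message layout are invented for the example.

```python
# Hypothetical host-side sketch of the S2/S3 monitor-and-update handshake.
import ctypes
import os
import struct
import time

PARAM_FILE = "alexnet_params.bin"  # hypothetical agreed-format file from S12

def notify_inference_module(msg: bytes) -> None:
    """Stand-in for the kernel-library call that signals the CAPI card."""
    print("to inference module:", msg[:16])

def wait_for_ack() -> None:
    """Stand-in: block until the module reports it has read the parameters."""
    pass

def package_params(path: str):
    """Pin the parameter file in host memory and pack its virtual address
    and byte length into the preset data structure of step S2."""
    data = open(path, "rb").read()
    buf = ctypes.create_string_buffer(data, len(data))
    vaddr = ctypes.addressof(buf)
    return struct.pack("<QQ", vaddr, len(data)), buf  # keep buf referenced

def monitor_loop() -> None:
    last_mtime = 0.0
    while True:
        mtime = os.stat(PARAM_FILE).st_mtime
        if mtime > last_mtime:                # parameter file changed (S2)
            last_mtime = mtime
            msg, buf = package_params(PARAM_FILE)
            notify_inference_module(b"STOP")  # interrupt inference (S23)
            notify_inference_module(msg)      # hand over vaddr + length
            wait_for_ack()                    # module reads via virtual address (S31)
            notify_inference_module(b"RUN")   # resume inference service (S32/S33)
        time.sleep(1.0)
```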
The deep learning network model of the hybrid architecture is a deep learning network model for image classification.
A hybrid-architecture deep learning device, used for realizing parallel operation of deep learning training and inference, comprises a server, a training module, an inference module and a bus interface; the server comprises a CPU processor, DDR memory and a network; the training module and the inference module are connected to the server through bus interfaces and communicate over them.
The server provides the control, data processing, network interaction and parameter storage functions for deep learning.
The CPU processor is a POWER processor; the training module is a GPU-accelerated training module used for accelerating the deep learning model training process; the inference module is a CAPI inference module that can be preloaded with a preset deep learning network model and is used for the deep learning inference process.
Compared with the prior art, the invention has the following beneficial effects. The disclosed deep learning method of a hybrid architecture comprises: when the training data set is updated, the training module retrains the deep learning network model and, after training is finished, stores the weight and bias parameters of the network model into a preset file; a server-side monitoring process detects the change of the parameter file, packages the virtual address and length information of the weight and bias parameter storage space into a preset data structure, and notifies the inference module; the inference module interrupts the inference service, reads the weight and bias file contents from the server side through the bus interface, and updates the network model; the server-side monitoring process meanwhile dispatches the input files to be inferred and notifies the inference module, which returns the results to the monitoring process when finished. The disclosed hybrid-architecture deep learning device comprises a server, a training module, an inference module and a bus interface; the server comprises a CPU processor, DDR memory and a network; the training module and the inference module are connected to the server through bus interfaces and communicate over them. By adopting a single CPU+GPU+CAPI heterogeneous deep learning system that mixes training and inference, the invention exploits the advantages and characteristics of the respective modules, obtains a higher energy-efficiency ratio and fully utilizes resources; the CAPI card accesses server memory directly, reducing latency and programming complexity; and parameters such as the inference model weights can be updated iteratively online in real time.
Drawings
Fig. 1 is a flowchart illustrating a deep learning method of a hybrid architecture according to the present invention.
FIG. 2 is an architecture diagram of a hybrid architecture deep learning device according to the present invention.
Fig. 3 is an architecture diagram of a hybrid architecture deep learning device according to an embodiment of the present invention.
Fig. 4 is a working schematic diagram of the present invention, taking the AlexNet deep learning network model as an example.
Fig. 5 is a block diagram illustrating a structure of an acceleration apparatus for deep learning service according to an embodiment of the prior art.
Detailed Description
The present invention will now be described in further detail with reference to figs. 1 to 5, so that the implementation of the invention can be better understood. The specific embodiments are as follows:
as shown in fig. 1, the deep learning method of the hybrid architecture according to the present invention, used for realizing deep learning training and inference, comprises the following steps:
S1, when the training data set is updated, the training module retrains the deep learning network model, and after training is finished stores the weight and bias parameters of the network model into a preset file;
S2, a server-side monitoring process detects the parameter file change, packages the virtual address and length information of the weight and bias parameter storage space into a preset data structure, and notifies the inference module;
S3, the inference module interrupts the inference service, reads the weight and bias file contents from the server side through the bus interface, and updates the network model;
S4, the server-side monitoring process meanwhile dispatches the input files to be inferred and notifies the inference module, and the inference module returns the results to the server-side monitoring process when finished.
Step S1 specifically includes the following sub-steps:
S11, when the training data set is updated while the network model is unchanged, retraining is needed to obtain updated network weight and bias parameters;
S12, after training, the weight and bias parameters of each network layer are stored into a preset file in a format agreed with the inference module (one possible file layout is sketched after these sub-steps).
Step S2 specifically includes the following sub-steps:
S21, the server side runs a monitoring process, which controls the running, stopping and parameter updating of the inference module by calling the inference module's function interface and driver in the server kernel library;
S22, the server side constantly monitors whether the weight and bias parameters need to be updated and obtains the latest parameter information;
S23, when an update occurs, a stop command and the updated parameter file information are sent to the inference module.
Step S3 specifically includes the following sub-steps:
S31, the inference module reads the corresponding weight and bias information from the server side directly into its internal RAM through the virtual address;
S32, the inference module notifies the monitoring process after reading is completed, and the monitoring process sends a run command to the inference module;
S33, the inference module updates the network model parameters and continues the inference service.
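Step S12 leaves the "agreed format" of the parameter file open. Purely as an assumption for illustration, the sketch below uses one possible layout: per layer, a little-endian header of layer index, weight count and bias count, followed by float32 weights and then float32 biases. Shapes are flattened, so the consumer is assumed to know each layer's dimensions.

```python
# One assumed layout for the "agreed format" of S12 (not fixed by the patent).
import struct
import numpy as np

def save_params(path: str, layers) -> None:
    """layers: list of (weights, biases) array pairs, one pair per layer."""
    with open(path, "wb") as f:
        for idx, (w, b) in enumerate(layers):
            w32, b32 = np.asarray(w, np.float32), np.asarray(b, np.float32)
            f.write(struct.pack("<III", idx, w32.size, b32.size))
            f.write(w32.tobytes())
            f.write(b32.tobytes())

def load_params(path: str):
    layers, buf, off = [], open(path, "rb").read(), 0
    while off < len(buf):
        idx, nw, nb = struct.unpack_from("<III", buf, off)
        off += 12
        w = np.frombuffer(buf, np.float32, nw, off); off += 4 * nw
        b = np.frombuffer(buf, np.float32, nb, off); off += 4 * nb
        layers.append((w, b))
    return layers

save_params("alexnet_params.bin", [(np.ones((3, 3)), np.zeros(3))])
print([(w.size, b.size) for w, b in load_params("alexnet_params.bin")])
```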
As shown in fig. 2, the hybrid-architecture deep learning apparatus is configured to realize parallel operation of deep learning training and inference and is characterized as follows: the apparatus comprises a server, a training module, an inference module and a bus interface; the server comprises a CPU processor, DDR memory and a network; the training module and the inference module are connected to the server through bus interfaces and communicate over them. The server provides deep learning control, data processing, network interaction and parameter storage; the CPU processor is a POWER processor; the training module is a GPU-accelerated training module for accelerating the deep learning model training process; the inference module is a CAPI inference module that can be preloaded with a preset deep learning network model and is used for the deep learning inference process. The bus interface between the server and the training module is a PCI-E or NVLink bus; the hardware interface between the server and the inference module is PCI-E or BlueLink, with CAPI as the bus protocol.
Preferably, as shown in fig. 4, the deep learning network model of the hybrid architecture is the AlexNet deep learning network model for picture classification. To facilitate understanding, the working principle of the invention is briefly described below taking the AlexNet model as an example: the AlexNet network consists of 5 convolutional layers and 3 fully connected layers; ReLU, pooling and normalization operations are attached to some of the convolutional layers, and the last fully connected layer outputs a 1000-class Softmax layer. The AlexNet model can be used for a wide range of picture classification tasks; it can be trained for different situations with different training data sets and then provide a picture classification service.
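For orientation, here is a sketch of the described 8-layer topology (5 convolutional plus 3 fully connected layers, ending in a 1000-way Softmax) written with tf.keras. The filter counts, strides and input size follow the original AlexNet paper rather than the patent text, and the local response normalization attached to some convolutional layers is omitted for brevity:

```python
# Sketch of the AlexNet topology described above, in tf.keras.
import tensorflow as tf
from tensorflow.keras import layers, models

def alexnet(num_classes: int = 1000) -> tf.keras.Model:
    return models.Sequential([
        layers.Conv2D(96, 11, strides=4, activation="relu",
                      input_shape=(227, 227, 3)),                  # conv1 + ReLU
        layers.MaxPooling2D(3, strides=2),                         # pooling
        layers.Conv2D(256, 5, padding="same", activation="relu"),  # conv2
        layers.MaxPooling2D(3, strides=2),
        layers.Conv2D(384, 3, padding="same", activation="relu"),  # conv3
        layers.Conv2D(384, 3, padding="same", activation="relu"),  # conv4
        layers.Conv2D(256, 3, padding="same", activation="relu"),  # conv5
        layers.MaxPooling2D(3, strides=2),
        layers.Flatten(),
        layers.Dense(4096, activation="relu"),                     # fc6
        layers.Dense(4096, activation="relu"),                     # fc7
        layers.Dense(num_classes, activation="softmax"),           # fc8 + Softmax
    ])

alexnet().summary()  # 5 convolutional + 3 fully connected layers
```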
Example 1
As shown in fig. 3, a preferred embodiment implements an AlexNet picture classification task:
The hybrid-architecture deep learning device, used for realizing parallel operation of deep learning training and inference, comprises a server including a POWER8 processor, DDR memory and a network; a GPU-accelerated training module (a GTX 1080) connected to the server through a bus; and a CAPI inference module (an ADM-PCIE-KU3 accelerator card) connected to the server through a bus. The GPU training module accelerates the training process of the deep learning model; the inference module is preloaded with the AlexNet network model and used for the deep learning inference process; the server is used for deep learning control, data processing, network interaction and parameter storage. The bus interface between the server and the training module is a PCI-E or NVLink bus; the hardware interface between the server and the inference module is PCI-E or BlueLink, with CAPI as the bus protocol.
The deep learning method of the hybrid architecture on this device comprises the following implementation steps:
SS1, the AlexNet 8-layer network model is implemented with the SNAP Framework tool (a tool for running algorithm models written in C/C++ on a CAPI card) and written into the CAPI inference module;
SS2, based on the TensorFlow deep learning framework, a TFRecords picture set (for example, 3 million pictures covering 300 categories of labeled birds) is acquired and provided as the training data set to two GTX 1080 GPUs for distributed training;
SS3, the monitoring process obtains the latest training result pb file, parses the weight and bias parameters from the pb file into a file A, and obtains the virtual address and length of the parameter storage (a parsing sketch is given after these steps);
SS4, the monitoring program calls the CAPI kernel-library function interface and driver, and sends a data structure packaging the parameter information to the ADM-PCIE-KU3 CAPI module;
SS5, the CAPI card parses the parameter address from the structure, thereby obtaining the parameter information and correspondingly updating the stored network model weight and bias parameter variables;
SS6, the CAPI card receives the picture inference requests sent by the monitoring program and returns the Top-5 results output by the network, so that a picture recognition service for the trained categories can be provided externally;
SS7, while the CAPI card provides services, new categories can be trained continuously and the trained parameters synchronously updated into the CAPI card, thereby realizing synchronized updating and iteration of training and inference.
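As one way to realize the parsing of step SS3, the sketch below extracts weight and bias constants from a frozen TensorFlow pb file using the public GraphDef API; the file path and the "weight"/"bias" node-naming convention are assumptions for illustration, not something the patent fixes:

```python
# Sketch of SS3: pull weight/bias constants out of a frozen TensorFlow pb file.
import tensorflow as tf

def extract_params(pb_path: str) -> dict:
    graph_def = tf.compat.v1.GraphDef()
    with open(pb_path, "rb") as f:
        graph_def.ParseFromString(f.read())
    params = {}
    for node in graph_def.node:
        # In a frozen graph, trained variables appear as Const nodes.
        if node.op == "Const" and ("weight" in node.name or "bias" in node.name):
            params[node.name] = tf.make_ndarray(node.attr["value"].tensor)
    return params

# Hypothetical training-result file; in the embodiment this is the pb file
# the monitoring process picks up after each (re)training run.
for name, arr in extract_params("alexnet_train_result.pb").items():
    print(name, arr.shape)
```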
Compared with the prior art, the invention has the following beneficial effects. The disclosed deep learning method of a hybrid architecture comprises: when the training data set is updated, the training module retrains the deep learning network model and, after training is finished, stores the weight and bias parameters of the network model into a preset file; a server-side monitoring process detects the change of the parameter file, packages the virtual address and length information of the weight and bias parameter storage space into a preset data structure, and notifies the inference module; the inference module interrupts the inference service, reads the weight and bias file contents from the server side through the bus interface, and updates the network model; the server-side monitoring process meanwhile dispatches the input files to be inferred and notifies the inference module, which returns the results to the monitoring process when finished. The disclosed hybrid-architecture deep learning device comprises a server, a training module, an inference module and a bus interface; the server comprises a CPU processor, DDR memory and a network; the training module and the inference module are connected to the server through bus interfaces and communicate over them. By adopting a single CPU+GPU+CAPI heterogeneous deep learning system that mixes training and inference, the invention exploits the advantages and characteristics of the respective modules, obtains a higher energy-efficiency ratio and fully utilizes resources; the CAPI card accesses server memory directly, reducing latency and programming complexity; and parameters such as the inference model weights can be updated iteratively online in real time.
The above description covers only preferred embodiments of the present invention, but the protection scope of the present invention is not limited to these specific embodiments; any modification, equivalent replacement or improvement within the spirit and principle of the present invention and the disclosed technical scope shall be included in the protection scope of the present invention.

Claims (9)

1. A deep learning method of a hybrid architecture, realizing deep learning training and inference based on a deep learning system, characterized in that the deep learning system is a CPU+GPU+CAPI heterogeneous deep learning system mixing training and inference, and the method comprises the following steps:
S1, when the training data set is updated, the training module retrains the deep learning network model, and after training is finished stores the weight and bias parameters of the network model into a preset file;
S2, a server-side monitoring process detects the parameter file change, packages the virtual address and length information of the weight and bias parameter storage space into a preset data structure, and notifies the inference module;
S3, the inference module interrupts the inference service, reads the weight and bias file contents from the server side through the bus interface, and updates the network model;
S4, the server-side monitoring process meanwhile dispatches the input files to be inferred and notifies the inference module, and the inference module returns the results to the server-side monitoring process when finished;
the CPU+GPU+CAPI heterogeneous deep learning system comprises:
a server with a POWER8 processor, DDR memory and a network; a GPU-accelerated training module GTX 1080 connected to the server through a bus; and a CAPI inference module, an ADM-PCIE-KU3 accelerator card, connected to the server through a bus; the GPU-accelerated training module GTX 1080 is used for accelerating the training process of the deep learning model; the inference module is preloaded with the AlexNet network model and used for the deep learning inference process; the server is used for deep learning control, data processing, network interaction and parameter storage; the bus interface between the server and the training module is a PCI-E or NVLink bus; the hardware interface between the server and the inference module is PCI-E or BlueLink, with CAPI as the bus protocol; the deep learning method of the hybrid architecture comprises the following implementation steps:
SS1, the AlexNet 8-layer network model is implemented with the SNAP Framework tool and written into the CAPI inference module;
SS2, picture data are acquired based on the TensorFlow deep learning framework and provided as the training data set to two GTX 1080 GPUs for distributed training;
SS3, the monitoring process obtains the latest training result pb file, parses the weight and bias parameters from the pb file into a file A, and obtains the virtual address and length of the parameter storage;
SS4, the monitoring program calls the CAPI kernel-library function interface and driver, and sends a data structure packaging the parameter information to the ADM-PCIE-KU3 CAPI module;
SS5, the CAPI card parses the parameter address from the structure, thereby obtaining the parameter information and correspondingly updating the stored network model weight and bias parameter variables;
SS6, the CAPI card receives the picture inference requests sent by the monitoring program and returns the Top-5 results output by the network, so that a picture recognition service for the corresponding categories can be provided externally;
SS7, while the CAPI card provides services, new categories can be trained continuously and the trained parameters synchronously updated into the CAPI card.
2. The method of claim 1, wherein step S1 specifically includes the following sub-steps:
S11, when the training data set is updated while the network model is unchanged, retraining is needed to obtain updated network weight and bias parameters;
S12, after training, the weight and bias parameters of each network layer are stored into a preset file in a format agreed with the inference module.
3. The method of claim 1, wherein step S2 specifically includes the following sub-steps:
S21, the server side runs a monitoring process, which controls the running, stopping and parameter updating of the inference module by calling the inference module's function interface and driver in the server kernel library;
S22, the server side constantly monitors whether the weight and bias parameters need to be updated and obtains the latest parameter information;
S23, when an update occurs, a stop command and the updated parameter file information are sent to the inference module.
4. The method of claim 1, wherein step S3 specifically includes the following sub-steps:
S31, the inference module reads the corresponding weight and bias information from the server side directly into its internal RAM through the virtual address;
S32, the inference module notifies the monitoring process after reading is completed, and the monitoring process sends a run command to the inference module;
S33, the inference module updates the network model parameters and continues the inference service.
5. The method of claim 1, wherein: the network model is a deep learning model for image classification.
6. A hybrid-architecture deep learning apparatus using the method of any one of claims 1-5, for realizing parallel operation of deep learning training and inference, characterized in that: the apparatus comprises a server, a training module, an inference module and a bus interface; the server comprises a CPU processor, DDR memory and a network; the training module and the inference module are connected to the server through bus interfaces and communicate over them; the CAPI inference module directly accesses the memory of the server.
7. The apparatus of claim 6, wherein: the server provides the control, data processing, network interaction and parameter storage functions for deep learning.
8. The apparatus of claim 6, wherein: the CPU processor is a POWER processor, and the training module is a GPU-accelerated training module for accelerating the deep learning model training process.
9. The apparatus of claim 6, wherein: the inference module is a CAPI inference module that can be preloaded with a deep learning network model and is used for the deep learning inference process.
CN201710196532.0A 2017-03-29 2017-03-29 Deep learning method and device of hybrid architecture Active CN106951926B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710196532.0A CN106951926B (en) 2017-03-29 2017-03-29 Deep learning method and device of hybrid architecture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710196532.0A CN106951926B (en) 2017-03-29 2017-03-29 Deep learning method and device of hybrid architecture

Publications (2)

Publication Number Publication Date
CN106951926A CN106951926A (en) 2017-07-14
CN106951926B true CN106951926B (en) 2020-11-24

Family

ID=59474087

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710196532.0A Active CN106951926B (en) 2017-03-29 2017-03-29 Deep learning method and device of hybrid architecture

Country Status (1)

Country Link
CN (1) CN106951926B (en)

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107563512B (en) * 2017-08-24 2023-10-17 腾讯科技(上海)有限公司 Data processing method, device and storage medium
WO2019042571A1 (en) * 2017-09-04 2019-03-07 Huawei Technologies Co., Ltd. Asynchronous gradient averaging distributed stochastic gradient descent
CN107729268B (en) * 2017-09-20 2019-11-12 山东英特力数据技术有限公司 A kind of memory expansion apparatus and method based on CAPI interface
TWI658365B (en) * 2017-10-30 2019-05-01 緯創資通股份有限公司 Connecting module
CN109064382B (en) * 2018-06-21 2023-06-23 北京陌上花科技有限公司 Image information processing method and server
CN109460826A (en) * 2018-10-31 2019-03-12 北京字节跳动网络技术有限公司 For distributing the method, apparatus and model modification system of data
CN109726170A (en) * 2018-12-26 2019-05-07 上海新储集成电路有限公司 A kind of on-chip system chip of artificial intelligence
CN109886408A (en) * 2019-02-28 2019-06-14 北京百度网讯科技有限公司 A kind of deep learning method and device
CN109947682B (en) * 2019-03-21 2021-03-09 浪潮商用机器有限公司 Server mainboard and server
US11176493B2 (en) 2019-04-29 2021-11-16 Google Llc Virtualizing external memory as local to a machine learning accelerator
CN112148470B (en) * 2019-06-28 2022-11-04 富联精密电子(天津)有限公司 Parameter synchronization method, computer device and readable storage medium
CN110399234A (en) * 2019-07-10 2019-11-01 苏州浪潮智能科技有限公司 A kind of task accelerated processing method, device, equipment and readable storage medium storing program for executing
CN110533181B (en) * 2019-07-25 2023-07-18 南方电网数字平台科技(广东)有限公司 Rapid training method and system for deep learning model
CN112541513B (en) * 2019-09-20 2023-06-27 百度在线网络技术(北京)有限公司 Model training method, device, equipment and storage medium
CN110598855B (en) * 2019-09-23 2023-06-09 Oppo广东移动通信有限公司 Deep learning model generation method, device, equipment and storage medium
CN111147603A (en) * 2019-09-30 2020-05-12 华为技术有限公司 Method and device for networking reasoning service
TWI780382B (en) * 2019-12-05 2022-10-11 新唐科技股份有限公司 Microcontroller updating system and method
CN113298222A (en) * 2020-02-21 2021-08-24 深圳致星科技有限公司 Parameter updating method based on neural network and distributed training platform system
CN111860260B (en) * 2020-07-10 2024-01-26 逢亿科技(上海)有限公司 High-precision low-calculation target detection network system based on FPGA
CN112465112B (en) * 2020-11-19 2022-06-07 苏州浪潮智能科技有限公司 nGraph-based GPU (graphics processing Unit) rear-end distributed training method and system
CN112581353A (en) * 2020-12-29 2021-03-30 浪潮云信息技术股份公司 End-to-end picture reasoning system facing deep learning model
CN112949427A (en) * 2021-02-09 2021-06-11 北京奇艺世纪科技有限公司 Person identification method, electronic device, storage medium, and apparatus
CN113537284B (en) * 2021-06-04 2023-01-24 中国人民解放军战略支援部队信息工程大学 Deep learning implementation method and system based on mimicry mechanism

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160098633A1 (en) * 2014-10-02 2016-04-07 Nec Laboratories America, Inc. Deep learning model for structured outputs with high-order interaction
CN104463324A (en) * 2014-11-21 2015-03-25 长沙马沙电子科技有限公司 Convolution neural network parallel processing method based on large-scale high-performance cluster
US20160267380A1 (en) * 2015-03-13 2016-09-15 Nuance Communications, Inc. Method and System for Training a Neural Network
CN104714852B (en) * 2015-03-17 2018-05-22 华中科技大学 A kind of parameter synchronization optimization method and its system suitable for distributed machines study
CN105825235B (en) * 2016-03-16 2018-12-25 新智认知数据服务有限公司 A kind of image-recognizing method based on multi-characteristic deep learning

Also Published As

Publication number Publication date
CN106951926A (en) 2017-07-14

Similar Documents

Publication Publication Date Title
CN106951926B (en) Deep learning method and device of hybrid architecture
US20190318231A1 (en) Method for acceleration of a neural network model of an electronic euqipment and a device thereof related appliction information
US20190332944A1 (en) Training Method, Apparatus, and Chip for Neural Network Model
KR20200069353A (en) Machine learning runtime library for neural network acceleration
EP4145351A1 (en) Neural network construction method and system
CN110392903A (en) The dynamic of matrix manipulation is rejected
CN111967468A (en) FPGA-based lightweight target detection neural network implementation method
DE102019103310A1 (en) ESTIMATE FOR AN OPTIMAL OPERATING POINT FOR HARDWARE WORKING WITH A RESTRICTION ON THE SHARED PERFORMANCE / HEAT
CN111814959A (en) Model training data processing method, device and system and storage medium
JP2021193546A (en) Method and apparatus for generating image, electronic device, storage medium, and computer program
CN108304925B (en) Pooling computing device and method
CN108304926B (en) Pooling computing device and method suitable for neural network
CN111915555B (en) 3D network model pre-training method, system, terminal and storage medium
CN109492761A (en) Realize FPGA accelerator, the method and system of neural network
KR20240148819A (en) System and method for performing semantic image segmentation
US20230298237A1 (en) Data processing method, apparatus, and device and storage medium
CN113592066A (en) Hardware acceleration method, apparatus, device, computer program product and storage medium
CN110738720A (en) Special effect rendering method and device, terminal and storage medium
KR20200040165A (en) Apparatus of Acceleration for Artificial Neural Network System and Method thereof
JP2022013579A (en) Method and apparatus for processing image, electronic device, and storage medium
DE112020007087T5 (en) Concurrent hash table updates
US12112533B2 (en) Method and apparatus for data calculation in neural network model, and image processing method and apparatus
CN113111201B (en) Digital twin model lightweight method and system
CN115129460A (en) Method and device for acquiring operator hardware time, computer equipment and storage medium
CN115456149B (en) Impulse neural network accelerator learning method, device, terminal and storage medium

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240919

Address after: 272000, No. 431, Chongwen Avenue, high tech Zone, Jining City, Shandong Province

Patentee after: SHANDONG INTELLIGENT OPTICAL COMMUNICATION DEVELOPMENT Co.,Ltd.

Country or region after: China

Address before: 272000 yingteli Industrial Park, 431 Chongwen Avenue, high tech Zone, Jining City, Shandong Province

Patentee before: SHANDONG ITL DATA TECHNIQUE CO.,LTD.

Country or region before: China