CN106951926A - Deep learning system method and device with a hybrid architecture - Google Patents
Deep learning system method and device with a hybrid architecture
- Publication number
- CN106951926A CN106951926A CN201710196532.0A CN201710196532A CN106951926A CN 106951926 A CN106951926 A CN 106951926A CN 201710196532 A CN201710196532 A CN 201710196532A CN 106951926 A CN106951926 A CN 106951926A
- Authority
- CN
- China
- Prior art keywords
- module
- reasoning
- training
- deep learning
- server
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/94—Hardware or software architectures specially adapted for image or video understanding
- G06V10/955—Hardware or software architectures specially adapted for image or video understanding using specific electronic processors
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Mathematical Physics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Computational Linguistics (AREA)
- Computing Systems (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Stored Programmes (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention discloses a deep learning system method and device with a hybrid architecture, characterized by the following steps: when the training data set is updated, the training module retrains the deep learning network model and stores the weight and bias parameters; a server-side monitoring process detects the change in the parameter file, encapsulates the parameter information into a preset data structure, and notifies the inference module; the inference module suspends the inference service, reads the weight and bias file contents from the server side, and updates its network model; meanwhile, the server-side monitoring process processes input files that require inference and notifies the inference module. The system and device comprise a server module, a training module, an inference module, and a bus interface. By mixing training and inference on a single CPU+GPU+CAPI heterogeneous deep learning system, the present invention makes full use of resources, achieves a higher energy-efficiency ratio, enables CAPI to directly access server memory, and supports real-time online iterative updating of inference-model parameters such as weights.
Description
Technical field
The present invention relates to the technical fields of circuit design and machine learning, and in particular to a deep learning system method and device with a hybrid architecture.
Background technology
The IT industry has developed rapidly in the 21st century, bringing people enormous benefits and convenience. Deep learning applications are divided into two parts, training and inference. Taking the ImageNet evaluation as an example, training the AlexNet model requires 8 million images in 1,000 categories: the AlexNet model extracts features and computes the loss, then updates the weight parameters by back-propagation methods such as SGD, so that the model converges step by step and finally yields a good network model. Inference is the process of passing an input through the network model in a single forward computation to obtain the final classification accuracy (Top-5 is typically chosen). The training process of a deep learning application requires large amounts of computing resources and training data; current training platforms generally use NVIDIA high-performance GPUs such as the Tesla P100, Titan X, or GTX 1080 to accelerate training. Once a usable model is obtained, it is deployed to another platform to provide inference services externally. Because inference performs only a single forward computation, its computational requirements are lower, and the main requirement is low latency. Platforms currently used for inference include CPU-based cloud service platforms, server clusters based on low-power GPUs, and FPGA or dedicated ASIC clusters. In terms of low latency and energy efficiency, FPGAs and dedicated ASICs can perform even better; and compared with ASICs, FPGAs have greater architectural flexibility and are attracting increasing attention. CAPI, the Coherent Accelerator Processor Interface, is a high-speed bus interface protocol released by IBM for POWER processors; its physical interface is PCI-E or IBM's BlueLink. CAPI implements a PSL layer internally, which guarantees memory-access coherence with the server, so the accelerator can directly access CPU memory through virtual addresses, greatly reducing access latency. The SNAP Framework programming environment released by IBM allows algorithm models to be implemented conveniently in C/C++.
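The training loop outlined above (forward pass, loss, back-propagated SGD weight update) can be sketched in a few lines. This is a minimal single-neuron illustration of the principle, not code from the patent:

```python
def sgd_step(w, b, x, y, lr=0.01):
    """One SGD update for a single linear neuron with squared-error loss."""
    y_hat = sum(wi * xi for wi, xi in zip(w, x)) + b   # forward computation
    err = y_hat - y                                    # gradient of 0.5*err^2 w.r.t. y_hat
    w = [wi - lr * err * xi for wi, xi in zip(w, x)]   # back-propagate to the weights
    b = b - lr * err                                   # and to the bias
    return w, b

# Repeated updates shrink the error, i.e. the model gradually converges.
w, b = [0.0, 0.0], 0.0
for _ in range(200):
    w, b = sgd_step(w, b, x=[1.0, 2.0], y=3.0)
```

Real training replaces the single neuron with the full network and iterates over the whole labeled data set, but the converge-by-repeated-updates structure is the same.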
People have researched and developed various deep learning methods and devices for this purpose. For example, Chinese patent publication CN106022472A describes an embedded deep learning processor; that invention belongs to the technical field of integrated circuits and is specifically an FPGA-based embedded deep learning processor. The processor includes: a central processing unit (CPU), which completes the logical operation, control, and storage work necessary while the processor learns and runs; and a deep learning unit, the hardware implementation unit of the deep learning algorithm and the core component for deep learning processing. The processor combines a conventional CPU with the deep learning unit, where a deep learning combination unit can be assembled from multiple deep learning units; it is scalable and can serve as the core processor of artificial-intelligence applications at different computing scales. As shown in Fig. 5, Chinese patent publication CN106156851A describes an accelerator and method oriented toward deep learning services, for performing deep learning computation on pending data in a server. It includes a computation control module arranged on the network card at the server end and connected to the server by a bus, a first memory, and a second memory; the computation control module is a programmable logic device comprising a control unit, a data storage unit, a logic storage unit, a bus interface communicating respectively with the network card, the first memory, and the second memory, a first communication interface, and a second communication interface; the logic storage unit stores the deep learning control logic; the first memory stores the weight data and bias data of each network layer. That invention can effectively improve computational efficiency and enhance the performance-per-watt ratio.
The prior art has the following disadvantages: 1) conventional approaches separate training from inference, so two sets of platform environments must be maintained and resources are not fully utilized; 2) performing all deep learning computation on an FPGA/CPLD provides insufficient computing power and is currently unsuitable for large-scale training scenarios; 3) the FPGA/CPLD generally communicates with the server by DMA, so the latency of data interaction with the CPU server is large. It is therefore necessary to propose a new deep learning system method and device.
Summary of the invention
To address the above deficiencies of the prior art, the present invention provides a deep learning system method and device with a hybrid architecture that exploits the advantages and features of each module, achieves a higher energy-efficiency ratio, and makes full use of resources; CAPI provides direct access to server memory, reducing latency and programming complexity. The technical scheme by which the present invention solves its technical problem is as follows:
A deep learning system method with a hybrid architecture, for performing deep learning training and inference, comprises the following steps:
S1: when the training data set is updated, the training module retrains the deep learning network model; after training ends, the weight and bias parameters of the network model are stored to a preset file;
S2: the server-side monitoring process detects the change in the parameter file, encapsulates the virtual address and length of the weight and bias parameter storage into a preset data structure, and notifies the inference module;
S3: the inference module suspends the inference service, reads the weight and bias file contents from the server side through the bus interface, and updates its network model;
S4: the server-side monitoring process also processes input files that require inference and notifies the inference module; after completing inference, the inference module returns the result to the server-side monitoring process.
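Step S2 can be sketched as follows: the monitoring process notices a change to the parameter file and packs the buffer's virtual address and byte length into a fixed binary structure for the inference module. The field layout and the `notify_inference` stub are illustrative assumptions; the patent does not specify the actual data structure:

```python
import os
import struct
import tempfile

# Assumed layout of the preset data structure: two unsigned 64-bit fields,
# the virtual address and the byte length of the parameter buffer.
PARAM_DESC = struct.Struct("<QQ")

def notify_inference(message: bytes) -> bytes:
    """Stub for the driver call that forwards the descriptor to the inference module."""
    return message

def on_param_file_change(path: str) -> bytes:
    """S2: load the updated parameter file and describe it by (address, length)."""
    data = open(path, "rb").read()   # weights and biases written in S1
    addr = id(data)                  # stand-in for the buffer's virtual address (CPython detail)
    desc = PARAM_DESC.pack(addr, len(data))
    return notify_inference(desc)

# One monitoring iteration against a freshly written parameter file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x00" * 4096)          # fake weight/bias payload
desc = on_param_file_change(f.name)
addr, length = PARAM_DESC.unpack(desc)
os.unlink(f.name)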
Step S1 specifically includes the following sub-steps:
S11: when the training data set changes, the network model itself is not changed, but retraining is required to obtain the updated network weights and bias parameters;
S12: after training completes, the weights and bias parameters of each network layer must be stored to the preset file in a format agreed upon with the inference module.
Step S2 specifically includes the following sub-steps:
S21: the server side runs a monitoring process that controls the running, stopping, and parameter updating of the inference module by calling the server kernel library function interface and driver;
S22: the server-side monitoring process continuously monitors whether the weight and bias parameters need updating, and obtains the latest parameter information;
S23: when an update occurs, the monitoring process sends a stop command and the updated parameter file information to the inference module.
Step S3 specifically includes the following sub-steps:
S31: the inference module reads the corresponding weight and bias information directly from the server side into its internal RAM through the virtual address;
S32: after the read completes, the inference module notifies the monitoring process, and the monitoring process sends a run command;
S33: the inference module updates the network model parameters and resumes the inference service.
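Sub-step S31's "read directly through the virtual address" can be illustrated in ordinary user space with `ctypes`: knowing only an address and a length, the reader recovers the parameter bytes without going back through the file. On the real hardware this read is performed by the CAPI card via the PSL layer; the in-process `ctypes` read below is only an analogy:

```python
import ctypes

# Server side: weight parameters living in server memory (written in S1/S2).
weights = (ctypes.c_float * 4)(0.5, -1.25, 3.0, 0.0)
addr = ctypes.addressof(weights)     # the virtual address of the buffer
length = ctypes.sizeof(weights)      # the byte length, as packed in S2

# Inference side (S31): knowing only (addr, length), copy the raw bytes
# into local "internal RAM" and reinterpret them as the weight values.
raw = ctypes.string_at(addr, length)
local_ram = (ctypes.c_float * 4).from_buffer_copy(raw)
recovered = list(local_ram)
```

Because the access is by virtual address with coherent memory, no file I/O or DMA descriptor setup stands between the producer and the consumer — which is the latency advantage the invention claims for CAPI.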
The deep learning network model of the hybrid architecture uses a deep learning network model for image classification.
A deep learning system device with a hybrid architecture, for performing parallelized training and inference of deep learning, comprises a server module, a training module, an inference module, and a bus interface. The server module includes a CPU processor, DDR memory, and a network. The training module and the inference module are connected to the server module through the bus interface and can communicate with it.
The server module provides control, data processing, network interaction, and parameter storage functions for deep learning.
The CPU processor is a POWER processor; the training module is a GPU training-acceleration module for accelerating the deep learning model training process; the inference module is a CAPI inference module that can be preloaded with a preset deep learning network model and is used for the deep learning inference process.
Compared with the prior art, the beneficial effects of the present invention are as follows. The deep learning system method with a hybrid architecture comprises the following steps: when the training data set is updated, the training module retrains the deep learning network model, and after training ends the weight and bias parameters of the network model are stored to a preset file; the server-side monitoring process detects the change in the parameter file, encapsulates the virtual address and length of the weight and bias parameter storage into a preset data structure, and notifies the inference module; the inference module suspends the inference service, reads the weight and bias file contents from the server side through the bus interface, and updates its network model; the server-side monitoring process also processes input files that require inference and notifies the inference module, which returns the result to the server-side monitoring process after completion. The deep learning system device with the hybrid architecture includes a server module, a training module, an inference module, and a bus interface; the server module includes a CPU processor, DDR memory, and a network; the training module and the inference module are connected to the server module through the bus interface and can communicate with it. By mixing training and inference on a single CPU+GPU+CAPI heterogeneous deep learning platform, the present invention exploits the advantages and features of each module, achieves a higher energy-efficiency ratio, and makes full use of resources; CAPI provides direct access to server memory, reducing latency and programming complexity; parameters such as the weights of the inference model can be updated iteratively online in real time.
Brief description of the drawings
Fig. 1 is a flow diagram of the deep learning system method with the hybrid architecture of the present invention.
Fig. 2 is an architecture diagram of the deep learning device with the hybrid architecture of the present invention.
Fig. 3 is an architecture diagram of the deep learning device with the hybrid architecture of an embodiment of the present invention.
Fig. 4 is a working-principle diagram of the present invention, taking the AlexNet deep learning network model as an example.
Fig. 5 is a structural block diagram of a prior-art accelerator oriented toward deep learning services.
Embodiments
The present invention is described in further detail below with reference to Figs. 1 to 5, so that the public can better grasp its implementation. The specific embodiments of the present invention are as follows.
As shown in Fig. 1, the deep learning system method with a hybrid architecture of the present invention, for performing deep learning training and inference, comprises the following steps:
S1: when the training data set is updated, the training module retrains the deep learning network model; after training ends, the weight and bias parameters of the network model are stored to a preset file;
S2: the server-side monitoring process detects the change in the parameter file, encapsulates the virtual address and length of the weight and bias parameter storage into a preset data structure, and notifies the inference module;
S3: the inference module suspends the inference service, reads the weight and bias file contents from the server side through the bus interface, and updates its network model;
S4: the server-side monitoring process also processes input files that require inference and notifies the inference module; after completing inference, the inference module returns the result to the server-side monitoring process.
Step S1 specifically includes the following sub-steps:
S11: when the training data set changes, the network model itself is not changed, but retraining is required to obtain the updated network weights and bias parameters;
S12: after training completes, the weights and bias parameters of each network layer must be stored to the preset file in a format agreed upon with the inference module.
Step S2 specifically includes the following sub-steps:
S21: the server side runs a monitoring process that controls the running, stopping, and parameter updating of the inference module by calling the server kernel library function interface and driver;
S22: the server-side monitoring process continuously monitors whether the weight and bias parameters need updating, and obtains the latest parameter information;
S23: when an update occurs, the monitoring process sends a stop command and the updated parameter file information to the inference module.
Step S3 specifically includes the following sub-steps:
S31: the inference module reads the corresponding weight and bias information directly from the server side into its internal RAM through the virtual address;
S32: after the read completes, the inference module notifies the monitoring process, and the monitoring daemon sends a run command;
S33: the inference module updates the network model parameters and resumes the inference service.
As shown in Fig. 2, the deep learning system device with the hybrid architecture, for performing parallelized training and inference of deep learning, is characterized in that: the device includes a server module, a training module, an inference module, and a bus interface; the server module includes a CPU processor, DDR memory, and a network; the training module and the inference module are connected to the server module through the bus interface and can communicate with it; the server module provides control, data processing, network interaction, and parameter storage functions for deep learning; the CPU processor is a POWER processor; the training module is a GPU training-acceleration module for accelerating the deep learning model training process; the inference module is a CAPI inference module that can be preloaded with a preset deep learning network model and is used for the deep learning inference process; the bus interface between the server module and the training module is a PCI-E or NVLink bus; the hardware interface between the server module and the inference module is PCI-E or BlueLink, and the bus protocol is CAPI.
Preferably, as shown in Fig. 4, the deep learning network model of the hybrid architecture uses the AlexNet deep learning network model for image classification. To ease understanding of the present scheme, the working principle of the present invention is briefly explained below, taking the AlexNet deep learning network model as an example. The AlexNet deep learning network model consists of 5 convolutional layers and 3 fully connected layers; some convolutional layers additionally apply ReLU, pooling, and normalization operations, and the last fully connected layer feeds a Softmax layer that outputs 1,000 categories. The AlexNet model can be used for large-scale image classification; depending on the training data set, it can be trained as a classifier for different scenarios and provide an image classification service.
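The 5-conv + 3-FC AlexNet topology described above can be checked with simple shape arithmetic, with no framework required. The layer hyper-parameters (kernel sizes, strides, padding, 227x227 input) follow the standard published AlexNet and are not taken from the patent:

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a conv/pool layer: (n - k + 2p) // s + 1."""
    return (size - kernel + 2 * pad) // stride + 1

size = 227                              # standard AlexNet input resolution
size = conv_out(size, 11, stride=4)     # conv1, 96 filters  -> 55x55
size = conv_out(size, 3, stride=2)      # max-pool           -> 27x27
size = conv_out(size, 5, pad=2)         # conv2, 256 filters -> 27x27
size = conv_out(size, 3, stride=2)      # max-pool           -> 13x13
size = conv_out(size, 3, pad=1)         # conv3, 384 filters -> 13x13
size = conv_out(size, 3, pad=1)         # conv4, 384 filters -> 13x13
size = conv_out(size, 3, pad=1)         # conv5, 256 filters -> 13x13
size = conv_out(size, 3, stride=2)      # max-pool           -> 6x6
flat = 256 * size * size                # flattened input to fc6
fc_layers = [flat, 4096, 4096, 1000]    # fc6, fc7, fc8 -> Softmax over 1000 classes
```

The final `1000` matches the 1,000-category Softmax output the description mentions; retraining for a different data set (as in Embodiment 1) would change only that last dimension.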
Embodiment 1
As shown in Fig. 3, as a preferred implementation, consider realizing an AlexNet image classification task. The deep learning device with the hybrid architecture, for performing parallelized training and inference of deep learning, includes a server module composed of a POWER8 processor, DDR memory, a network, and so on; a GPU training-acceleration module (a GTX 1080) connected to the server by a bus; and a CAPI inference module (an ADM-PCIE-KU3 accelerator card) connected to the server by a bus. The GPU training module is used to accelerate the training process of the deep learning model; the inference module is preloaded with the AlexNet network model and used for the deep learning inference process; the server module handles the control, data processing, network interaction, and parameter storage of deep learning; the bus interface between the server module and the training module is a PCI-E or NVLink bus; the hardware interface between the server module and the inference module is PCI-E or BlueLink, and the bus protocol is CAPI.
The deep learning system method of this hybrid-architecture device is implemented in the following steps:
S1: implement the 8-layer AlexNet network model with the SNAP Framework tool (a tool for implementing, in C/C++, algorithm models that run on CAPI cards) and flash it into the CAPI inference module;
S2: based on the TensorFlow deep learning framework, obtain 3 million labeled images of, for example, 300 kinds of birds as TFRecords files, and supply them as the training data set to two GTX 1080 GPUs for distributed training;
S3: the monitoring process obtains the latest training result (a .pb file), parses the weights and bias parameters therein into file A, and obtains the virtual address and length information of the parameter storage;
S4: the monitoring program calls the CAPI kernel library function interface and driver, and sends the data structure encapsulating the parameter information to the ADM-PCIE-KU3 CAPI module;
S5: the CAPI card parses the parameter addresses from the structure, obtains the parameter information, and updates the stored weight and bias variables of the corresponding network model;
S6: the CAPI card receives image-inference requests sent by the monitoring program and returns the Top-5 results output by the network, so that it can externally provide an image recognition service for these categories;
S7: while the CAPI card provides the service, the training network can continuously train on newly added categories and synchronize the trained parameters into the CAPI card, thereby achieving synchronized updating and iteration of training and inference.
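The Top-5 selection in step S6 amounts to picking the five highest-scoring class indices from the network's output vector. A minimal sketch (the class count and score values below are made-up placeholders; a deployed AlexNet would produce 1,000 Softmax scores):

```python
def top_k(scores, k=5):
    """Return the k class indices with the highest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

# Made-up Softmax output for an 8-class toy network.
scores = [0.01, 0.30, 0.05, 0.20, 0.02, 0.25, 0.10, 0.07]
top5 = top_k(scores, k=5)
```

On the real device this selection would run on the CAPI card after the final Softmax layer, and only the five indices (and optionally their scores) would be returned to the monitoring program.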
The foregoing are merely preferred embodiments of the present invention, but the protection scope of the present invention is not limited to these embodiments. Any modification, equivalent replacement, improvement, or adaptation made within the spirit, principles, and disclosed technical scope of the present invention shall be included within the protection scope of the present invention.
Claims (9)
1. A deep learning system method with a hybrid architecture, for performing deep learning training and inference, characterized by comprising the following steps:
S1: when the training data set is updated, the training module retrains the deep learning network model; after training ends, the weight and bias parameters of the network model are stored to a preset file;
S2: the server-side monitoring process detects the change in the parameter file, encapsulates the virtual address and length of the weight and bias parameter storage into a preset data structure, and notifies the inference module;
S3: the inference module suspends the inference service, reads the weight and bias file contents from the server side through the bus interface, and updates its network model;
S4: the server-side monitoring process also processes input files that require inference and notifies the inference module; after completing inference, the inference module returns the result to the server-side monitoring process.
2. The method according to claim 1, characterized in that step S1 specifically includes the following sub-steps:
S11: when the training data set changes, the network model itself is not changed, but retraining is required to obtain the updated network weights and bias parameters;
S12: after training completes, the weights and bias parameters of each network layer must be stored to the preset file in a format agreed upon with the inference module.
3. The method according to claim 1, characterized in that step S2 specifically includes the following sub-steps:
S21: the server side runs a monitoring process that controls the running, stopping, and parameter updating of the inference module by calling the server kernel library function interface and driver;
S22: the server-side monitoring process continuously monitors whether the weight and bias parameters need updating, and obtains the latest parameter information;
S23: when an update occurs, the monitoring process sends a stop command and the updated parameter file information to the inference module.
4. The method according to claim 1, characterized in that step S3 specifically includes the following sub-steps:
S31: the inference module reads the corresponding weight and bias information directly from the server side into its internal RAM through the virtual address;
S32: after the read completes, the inference module notifies the monitoring process, and the monitoring process sends a run command;
S33: the inference module updates the network model parameters and resumes the inference service.
5. The method according to claim 1, characterized in that the network model uses a deep learning model for image classification.
6. A device of the deep learning system with the hybrid architecture according to any one of claims 1 to 5, for performing parallelized training and inference of deep learning, characterized in that: the device includes a server module, a training module, an inference module, and a bus interface; the server module includes a CPU processor, DDR memory, and a network; the training module and the inference module are connected to the server module through the bus interface and can communicate with it.
7. The device according to claim 6, characterized in that the server module provides control, data processing, network interaction, and parameter storage functions for deep learning.
8. The device according to claim 6, characterized in that the CPU processor is a POWER processor, and the training module is a GPU training-acceleration module for accelerating the deep learning model training process.
9. The device according to claim 6, characterized in that the inference module is a CAPI inference module that can be preloaded with a deep learning network model and is used for the deep learning inference process.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710196532.0A CN106951926B (en) | 2017-03-29 | 2017-03-29 | Deep learning method and device of hybrid architecture |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710196532.0A CN106951926B (en) | 2017-03-29 | 2017-03-29 | Deep learning method and device of hybrid architecture |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106951926A true CN106951926A (en) | 2017-07-14 |
CN106951926B CN106951926B (en) | 2020-11-24 |
Family
ID=59474087
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710196532.0A Active CN106951926B (en) | 2017-03-29 | 2017-03-29 | Deep learning method and device of hybrid architecture |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106951926B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160098633A1 (en) * | 2014-10-02 | 2016-04-07 | Nec Laboratories America, Inc. | Deep learning model for structured outputs with high-order interaction |
CN104463324A (en) * | 2014-11-21 | 2015-03-25 | Changsha Masha Electronic Technology Co., Ltd. | Convolutional neural network parallel processing method based on a large-scale high-performance cluster |
US20160267380A1 (en) * | 2015-03-13 | 2016-09-15 | Nuance Communications, Inc. | Method and System for Training a Neural Network |
CN104714852A (en) * | 2015-03-17 | 2015-06-17 | Huazhong University of Science and Technology | Parameter synchronization optimization method and system suitable for distributed machine learning |
CN105825235A (en) * | 2016-03-16 | 2016-08-03 | Bocom Intelligent Network Technologies Co., Ltd. | Image recognition method based on deep learning with multiple feature maps |
Non-Patent Citations (1)
Title |
---|
YU ZIJIAN et al.: "FPGA-based Convolutional Neural Network Accelerator", Computer Engineering *
Cited By (38)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107563512A (en) * | 2017-08-24 | 2018-01-09 | Tencent Technology (Shanghai) Co., Ltd. | Data processing method, device and storage medium |
CN107563512B (en) * | 2017-08-24 | 2023-10-17 | Tencent Technology (Shanghai) Co., Ltd. | Data processing method, device and storage medium |
CN111052155A (en) * | 2017-09-04 | 2020-04-21 | Huawei Technologies Co., Ltd. | Distributed stochastic gradient descent method for asynchronous gradient averaging |
CN111052155B (en) * | 2017-09-04 | 2024-04-16 | Huawei Technologies Co., Ltd. | Distributed stochastic gradient descent method for asynchronous gradient averaging |
CN107729268A (en) * | 2017-09-20 | 2018-02-23 | Shandong Yingteli Data Technology Co., Ltd. | Memory expansion apparatus and method based on the CAPI interface |
CN107729268B (en) * | 2017-09-20 | 2019-11-12 | Shandong Yingteli Data Technology Co., Ltd. | Memory expansion apparatus and method based on the CAPI interface |
CN109726159A (en) * | 2017-10-30 | 2019-05-07 | Wistron Corporation | Connection module |
TWI658365B (en) * | 2017-10-30 | 2019-05-01 | Wistron Corporation | Connecting module |
CN109726159B (en) * | 2017-10-30 | 2020-12-04 | Wistron Corporation | Connection module |
CN109064382A (en) * | 2018-06-21 | 2018-12-21 | Beijing Moshanghua Technology Co., Ltd. | Image information processing method and server |
CN109064382B (en) * | 2018-06-21 | 2023-06-23 | Beijing Moshanghua Technology Co., Ltd. | Image information processing method and server |
CN109460826A (en) * | 2018-10-31 | 2019-03-12 | Beijing ByteDance Network Technology Co., Ltd. | Method, apparatus and model updating system for distributing data |
CN109726170A (en) * | 2018-12-26 | 2019-05-07 | Shanghai Xinchu Integrated Circuit Co., Ltd. | Artificial intelligence system-on-chip |
CN109886408A (en) * | 2019-02-28 | 2019-06-14 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Deep learning method and device |
CN109947682A (en) * | 2019-03-21 | 2019-06-28 | Inspur Business Machines Co., Ltd. | Server mainboard and server |
CN109947682B (en) * | 2019-03-21 | 2021-03-09 | Inspur Business Machines Co., Ltd. | Server mainboard and server |
TWI741416B (en) * | 2019-04-29 | 2021-10-01 | Google LLC | Virtualizing external memory as local to a machine learning accelerator |
TWI777775B (en) * | 2019-04-29 | 2022-09-11 | Google LLC | Virtualizing external memory as local to a machine learning accelerator |
US11176493B2 (en) | 2019-04-29 | 2021-11-16 | Google Llc | Virtualizing external memory as local to a machine learning accelerator |
CN112148470B (en) * | 2019-06-28 | 2022-11-04 | Fulian Precision Electronics (Tianjin) Co., Ltd. | Parameter synchronization method, computer device and readable storage medium |
CN112148470A (en) * | 2019-06-28 | 2020-12-29 | Hongfujin Precision Electronics (Tianjin) Co., Ltd. | Parameter synchronization method, computer device and readable storage medium |
CN110399234A (en) * | 2019-07-10 | 2019-11-01 | Inspur Suzhou Intelligent Technology Co., Ltd. | Task accelerated processing method, apparatus, device and readable storage medium |
CN110533181A (en) * | 2019-07-25 | 2019-12-03 | Shenzhen Comtop Information Technology Co., Ltd. | Rapid training method and system for a deep learning model |
CN110533181B (en) * | 2019-07-25 | 2023-07-18 | China Southern Power Grid Digital Platform Technology (Guangdong) Co., Ltd. | Rapid training method and system for a deep learning model |
CN112541513A (en) * | 2019-09-20 | 2021-03-23 | Baidu Online Network Technology (Beijing) Co., Ltd. | Model training method, device, equipment and storage medium |
CN110598855A (en) * | 2019-09-23 | 2019-12-20 | Guangdong OPPO Mobile Telecommunications Corp., Ltd. | Deep learning model generation method, device, equipment and storage medium |
CN111147603A (en) * | 2019-09-30 | 2020-05-12 | Huawei Technologies Co., Ltd. | Method and device for inference service networking |
CN112925533A (en) * | 2019-12-05 | 2021-06-08 | Nuvoton Technology Corporation | Microcontroller update system and method |
CN113298222A (en) * | 2020-02-21 | 2021-08-24 | Shenzhen Zhixing Technology Co., Ltd. | Neural-network-based parameter updating method and distributed training platform system |
CN111860260A (en) * | 2020-07-10 | 2020-10-30 | Fengyi Technology (Shanghai) Co., Ltd. | High-precision low-computation object detection network system based on FPGA |
CN111860260B (en) * | 2020-07-10 | 2024-01-26 | Fengyi Technology (Shanghai) Co., Ltd. | High-precision low-computation object detection network system based on FPGA |
CN112465112A (en) * | 2020-11-19 | 2021-03-09 | Inspur Suzhou Intelligent Technology Co., Ltd. | nGraph-based GPU backend distributed training method and system |
CN112465112B (en) * | 2020-11-19 | 2022-06-07 | Inspur Suzhou Intelligent Technology Co., Ltd. | nGraph-based GPU backend distributed training method and system |
US12001960B2 (en) | 2020-11-19 | 2024-06-04 | Inspur Suzhou Intelligent Technology Co., Ltd. | NGraph-based GPU backend distributed training method and system |
CN112581353A (en) * | 2020-12-29 | 2021-03-30 | Inspur Cloud Information Technology Co., Ltd. | End-to-end image inference system for deep learning models |
CN112949427A (en) * | 2021-02-09 | 2021-06-11 | Beijing QIYI Century Science & Technology Co., Ltd. | Person identification method, electronic device, storage medium, and apparatus |
CN113537284B (en) * | 2021-06-04 | 2023-01-24 | Information Engineering University, PLA Strategic Support Force | Deep learning implementation method and system based on a mimicry mechanism |
CN113537284A (en) * | 2021-06-04 | 2021-10-22 | Information Engineering University, PLA Strategic Support Force | Deep learning implementation method and system based on a mimicry mechanism |
Also Published As
Publication number | Publication date |
---|---|
CN106951926B (en) | 2020-11-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106951926A (en) | Deep learning system method and device of a hybrid architecture | |
Guo et al. | Cloud resource scheduling with deep reinforcement learning and imitation learning | |
CN107103113B (en) | Automated design method, apparatus and optimization method for neural network processors | |
CN108460457A (en) | Multi-machine multi-card hybrid-parallel asynchronous training method for convolutional neural networks | |
CN107704922A (en) | Artificial neural network processing unit | |
CN109376843A (en) | FPGA-based rapid EEG signal classification method, implementation method and device | |
WO2022068663A1 (en) | Memory allocation method, related device, and computer readable storage medium | |
CN109472356A (en) | Accelerator and method for a reconfigurable neural network algorithm | |
CN108268425A (en) | Programmable matrix processing engine | |
CN108416436A (en) | Method and system for partitioning a neural network using multi-core processing modules | |
CN108829515A (en) | Cloud platform computing system and application method thereof | |
CN108416433A (en) | Heterogeneous neural network acceleration method and system based on asynchronous events | |
CN105718996B (en) | Cellular array computing system and communication method therein | |
CN110163353A (en) | Computing device and method | |
CN113642734A (en) | Distributed training method and device for deep learning model and computing equipment | |
CN113449839A (en) | Distributed training method, gradient communication device and computing equipment | |
CN115828831B (en) | Multi-core-chip operator placement strategy generation method based on deep reinforcement learning | |
CN209231976U (en) | Accelerator for a reconfigurable neural network algorithm | |
CN112686379B (en) | Integrated circuit device, electronic apparatus, board and computing method | |
CN110163350A (en) | Computing device and method | |
Banerjee et al. | Re-designing CNTK deep learning framework on modern GPU enabled clusters | |
CN106776466A (en) | FPGA heterogeneous accelerated computing apparatus and system | |
CN115345285A (en) | GPU-based temporal graph neural network training method and system, and electronic device | |
CN112835844B (en) | Communication sparsification method for spiking neural network computing loads | |
CN109359542A (en) | Neural-network-based vehicle damage level determination method and terminal device | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||