CN110866610A - Deep learning model distributed operation method and device - Google Patents


Publication number
CN110866610A
CN110866610A
Authority
CN
China
Prior art keywords
virtual processor
deep learning
learning model
operator
hardware resources
Prior art date
Legal status
Withdrawn
Application number
CN201911140560.6A
Other languages
Chinese (zh)
Inventor
赵谦谦
仝培霖
赵红博
Current Assignee
Suzhou Wave Intelligent Technology Co Ltd
Original Assignee
Suzhou Wave Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Wave Intelligent Technology Co Ltd filed Critical Suzhou Wave Intelligent Technology Co Ltd
Priority to CN201911140560.6A priority Critical patent/CN110866610A/en
Publication of CN110866610A publication Critical patent/CN110866610A/en
Priority to PCT/CN2020/104006 priority patent/WO2021098269A1/en
Withdrawn legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005: Allocation of resources to service a request
    • G06F 9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Advance Control (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method and a device for distributed operation of a deep learning model. The method comprises the following steps: registering a virtual processor in a device management list; registering and writing the operators supported by the virtual processor; detecting the hardware resources associated with the virtual processor and determining their respective allocation proportions according to their computing power; configuring a deep learning model based on the operators supported by the virtual processor, and assigning the virtual processor to the operators used in the model; and having the virtual processor distribute the input data of each operator to its associated hardware resources according to the allocation proportions, perform the computation, and merge the computation results of the hardware resources into the output of the operator. By introducing the concept of a virtual processor, designating it as the computing device for the corresponding operators, and distributing the computation across different hardware devices for parallel execution, the invention achieves heterogeneous acceleration of deep learning model operations.

Description

Deep learning model distributed operation method and device
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a method and a device for distributed operation of a deep learning model.
Background
TensorFlow is currently the most widely used framework in the field of deep learning: many deep learning models are implemented on it, and most hardware manufacturers (including ASIC and FPGA manufacturers) treat TensorFlow as the primary deep learning framework to support. At present the common computing units for inference are the GPU, CPU and TPU, while the FPGA is not supported for deep learning training.
TensorFlow executes operations as a dataflow graph and supports hardware on a per-operator basis; in the current implementation, an operator can be dispatched to only one hardware device at execution time. When the model executes serially, the other hardware cannot compute in parallel while waiting for the result of the previous operator.
In addition, most manufacturers currently support only inference of TensorFlow models; in general only the CPU, GPU and TPU support TensorFlow training. Some manufacturers have enabled the FPGA to support TensorFlow inference, but existing schemes in which TensorFlow supports FPGA training are limited to the single-machine scenario.
Moreover, most existing technical solutions are based on the GPU, which has higher power consumption than the FPGA. The existing FPGA scheme supports only single-machine training; training a large TensorFlow model can take more than a month, so the model development cycle is long and cannot meet the ever-growing demand for model training.
Based on the above problems, it is necessary to provide a method that lets TensorFlow accelerate computation on multiple types of hardware simultaneously, adding support for a virtual processor (VPU) on top of TensorFlow's original execution mechanism and programming interface, thereby speeding up the operation of deep learning models.
Disclosure of Invention
In one aspect, in view of the above objective, the present invention provides a method for distributed operation of a deep learning model, wherein the method comprises the following steps:
registering a virtual processor in a device management list;
registering and writing operators supported by the virtual processor;
detecting hardware resources associated with the virtual processor, and determining respective allocation proportions of the hardware resources according to the computing power of the associated hardware resources;
configuring a deep learning model based on an operator supported by a virtual processor, and assigning the virtual processor to the operator used in the deep learning model;
and the virtual processor allocates the input data of the corresponding operator to the hardware resources associated with the virtual processor according to the allocation proportion so as to carry out operation, and combines the operation results of the hardware resources into the output of the corresponding operator.
In an embodiment of the method for deep learning model distributed operation, the hardware resources associated with the virtual processor include one or more of a CPU, a GPU and an FPGA.
In an embodiment of the method, registering and writing the operators supported by the virtual processor further comprises: writing, within the same operator, the operation instructions for the CPU, GPU and FPGA and the corresponding adaptation instructions.
In an embodiment of the method, configuring the deep learning model based on the operators supported by the virtual processor and assigning the virtual processor to the operators used in the deep learning model further comprises: constructing the deep learning model on the TensorFlow framework, and selecting, for each layer of the model, a corresponding operator supported by the virtual processor.
In an embodiment of the method, the operators supported by the virtual processor include a forward operator and the backward operator associated with it.
In another aspect, the present invention further provides an apparatus for distributed operation of a deep learning model, where the apparatus includes:
at least one processor; and
a memory storing processor-executable program instructions that, when executed by the processor, perform the steps of:
registering a virtual processor in a device management list;
registering and writing operators supported by the virtual processor;
detecting hardware resources associated with the virtual processor, and determining respective allocation proportions of the hardware resources according to the computing power of the associated hardware resources;
configuring a deep learning model based on an operator supported by a virtual processor, and assigning the virtual processor to the operator used in the deep learning model;
and the virtual processor allocates the input data of the corresponding operator to the hardware resources associated with the virtual processor according to the allocation proportion so as to carry out operation, and combines the operation results of the hardware resources into the output of the corresponding operator.
In an embodiment of the apparatus for deep learning model distributed operation, the hardware resources associated with the virtual processor include one or more of a CPU, a GPU and an FPGA.
In an embodiment of the apparatus, registering and writing the operators supported by the virtual processor further comprises: writing, within the same operator, the operation instructions for the CPU, GPU and FPGA and the corresponding adaptation instructions.
In an embodiment of the apparatus, configuring the deep learning model based on the operators supported by the virtual processor and assigning the virtual processor to the operators used in the deep learning model further comprises: constructing the deep learning model on the TensorFlow framework, and selecting, for each layer of the model, a corresponding operator supported by the virtual processor.
In an embodiment of the apparatus, the operators supported by the virtual processor include a forward operator and the backward operator associated with it.
By adopting the above technical solution, the invention has at least the following beneficial effects: distributed heterogeneous accelerated computation is supported during the operation of the deep learning model. The concept of a virtual processor is introduced, the virtual processor is designated as the computing device for the corresponding operators, and the computation is distributed to different hardware devices for parallel execution, thereby achieving heterogeneous acceleration of deep learning model operations.
The foregoing presents aspects of embodiments and should not be used to limit the scope of the invention. Other embodiments in accordance with the techniques described herein will be apparent to one of ordinary skill in the art upon study of the following figures and detailed description, and are intended to fall within the scope of the present application.
Embodiments of the invention are explained and described in more detail below with reference to the drawings, but they should not be construed as limiting the invention.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required for describing the prior art and the embodiments are briefly introduced below. Parts of the drawings are not necessarily drawn to scale; related elements may be omitted, or in some cases the scale may be exaggerated, in order to emphasize and clearly show the novel features described herein. In addition, the structural order may be arranged differently, as is known in the art.
FIG. 1 shows a schematic block diagram of a method of deep learning model distributed computation according to the present invention.
Detailed Description
While the present invention may be embodied in various forms, there is shown in the drawings and will hereinafter be described some exemplary and non-limiting embodiments, with the understanding that the present disclosure is to be considered an exemplification of the invention and is not intended to limit the invention to the specific embodiments illustrated.
FIG. 1 shows a schematic block diagram of a method of deep learning model distributed computation according to the present invention. In the embodiment shown in the figure, the method comprises at least the following steps:
s1: registering a virtual processor in a device management list;
s2: registering and writing operators supported by the virtual processor;
s3: detecting hardware resources associated with the virtual processor, and determining respective allocation proportions of the hardware resources according to the computing power of the associated hardware resources;
s4: configuring a deep learning model based on an operator supported by a virtual processor, and assigning the virtual processor to the operator used in the deep learning model;
s5: and the virtual processor allocates the input data of the corresponding operator to the hardware resources associated with the virtual processor according to the allocation proportion so as to carry out operation, and combines the operation results of the hardware resources into the output of the corresponding operator.
To achieve heterogeneous acceleration, an embodiment of the present invention introduces the concept of a Virtual Processor (VPU), preferably within TensorFlow. First, registration of the virtual processor VPU is added according to TensorFlow's hardware registration mechanism, so that the VPU device appears in TensorFlow's device list. On this basis, step S2 registers and writes the operators supported by the virtual processor. Specifically, the operators supported by the VPU are registered according to TensorFlow's operator registration mechanism. Taking two-dimensional convolution as an example, the operator "Conv2D" is registered in the following format:
REGISTER_KERNEL_BUILDER(Name("Conv2D").Device(DEVICE_VPU).TypeConstraint<float>("T"),Conv2DOp<VPUDevice,float>);
where "Conv2D" is the name of the operator, and the device needs to be registered as "DEVICE_VPU" to indicate that the virtual processor supports this operator. The name must be identical to that of the CPU version of the operator in the original TensorFlow, so that all CPU, GPU and TPU two-dimensional convolution models remain compatible. A corresponding code implementation is then written according to the arithmetic logic the operator requires.
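The REGISTER_KERNEL_BUILDER call above binds an (operator name, device type) pair to a kernel implementation. As a rough, language-neutral illustration of that dispatch idea (a sketch, not TensorFlow's actual internals; all names here are invented), a minimal registry could look like:

```python
# Minimal sketch of an operator-registration mechanism keyed by
# (operator name, device type), loosely modeled on TensorFlow's
# REGISTER_KERNEL_BUILDER macro. Illustrative only.
KERNEL_REGISTRY = {}

def register_kernel(op_name, device, kernel_fn):
    """Bind an (op, device) pair to a concrete kernel implementation."""
    KERNEL_REGISTRY[(op_name, device)] = kernel_fn

def lookup_kernel(op_name, device):
    try:
        return KERNEL_REGISTRY[(op_name, device)]
    except KeyError:
        raise NotImplementedError(
            f"operator {op_name!r} is not registered for device {device!r}")

# Register the same operator name for two device types; keeping the
# name identical across devices is what preserves model compatibility.
register_kernel("Conv2D", "DEVICE_CPU", lambda x: ("cpu-conv", x))
register_kernel("Conv2D", "DEVICE_VPU", lambda x: ("vpu-conv", x))

kernel = lookup_kernel("Conv2D", "DEVICE_VPU")
```

A lookup for an unregistered (operator, device) pair fails loudly, which mirrors how a framework rejects an operator placed on a device that has no kernel for it.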
Before deep learning training, step S3 detects the hardware resources associated with the virtual processor in the current host and determines their respective allocation proportions according to their computing power. In some embodiments of the invention, the hardware resources associated with the virtual processor include one or more of a CPU, GPU and FPGA. For example, if the current host has online an FPGA with 1T of computing power, a GPU with 2T and a CPU with 0.5T, the allocation ratio is 2:4:1. Subsequently, step S4 configures a deep learning model based on the operators supported by the virtual processor and assigns the virtual processor to the operators used in the model. That is, for each layer of the deep learning model, a suitable operator is selected, according to the operational requirements, from among the operators registered and written in step S2. The VPU is then designated at the application layer as the running device, for example using tf.device("/VPU:N") to specify the VPU device N to be used, where N is the device number of the virtual processor. Finally, in step S5 the virtual processor distributes the input data of the corresponding operator to its associated hardware resources according to the allocation proportions for computation, and merges the computation results of the hardware resources into the output of the operator. Taking the hardware resources above as an example, the input data of the operator are distributed to the FPGA, GPU and CPU according to the 2:4:1 ratio, and the distributed input data are processed on the corresponding hardware simultaneously. The computed results are merged to obtain the output of the operator, which is passed as input to the next layer of the deep learning model.
Distributing the data across different hardware resources for parallel computation greatly increases the computation speed and improves the training efficiency of the deep learning model.
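The proportion-based split and merge of steps S3 and S5 can be sketched as follows. The device names and the 1T/2T/0.5T figures follow the example in the text, while the contiguous-slice scheme and all function names are assumptions of this sketch, not the patent's implementation:

```python
# Sketch of steps S3/S5: derive allocation proportions from per-device
# computing power, split an operator's input batch accordingly, and
# merge the per-device results into one output.
def allocation_proportions(power):
    """Normalize raw computing-power figures into fractions summing to 1."""
    total = sum(power.values())
    return {dev: p / total for dev, p in power.items()}

def split_batch(batch, power):
    """Assign contiguous slices of the batch to devices by proportion."""
    props = allocation_proportions(power)
    shares, start = {}, 0
    devices = list(power)
    for i, dev in enumerate(devices):
        if i == len(devices) - 1:      # last device takes the remainder
            end = len(batch)
        else:
            end = start + round(len(batch) * props[dev])
        shares[dev] = batch[start:end]
        start = end
    return shares

def run_operator(batch, power, kernel):
    # In a real system each share would run on its own device in
    # parallel; here the kernel is applied serially and the partial
    # results are concatenated in device order.
    shares = split_batch(batch, power)
    return [y for dev in power for y in kernel(shares[dev])]

# Example from the text: FPGA 1T, GPU 2T, CPU 0.5T -> ratio 2:4:1.
power = {"FPGA": 1.0, "GPU": 2.0, "CPU": 0.5}
out = run_operator(list(range(7)), power, lambda xs: [x * x for x in xs])
```

With a batch of 7 elements, the FPGA receives 2, the GPU 4 and the CPU 1, matching the 2:4:1 ratio; the merged output is identical to running the kernel on the whole batch at once.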
Further embodiments of the present invention are described below. Note that, unless otherwise indicated, the step numbers used therein serve only to refer to the steps unambiguously and do not limit their order.
In several embodiments of the method for deep learning model distributed operation, step S2, registering and writing the operators supported by the virtual processor, further comprises: writing, within the same operator, the operation instructions for the CPU, GPU and FPGA and the corresponding adaptation instructions. Because the virtual processor may be associated with one or more of the CPU, GPU and FPGA, and these devices require different logic to accomplish the same function, the operation instructions and adaptation instructions for the CPU, GPU and FPGA are all written into the same operator when it is implemented.
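One way to picture "writing the CPU, GPU and FPGA instructions in the same operator" is a single operator object that dispatches to a per-device code path at run time. The following sketch is illustrative only; the class name and the placeholder implementations are assumptions, not the patent's code:

```python
# Sketch: one operator holds all three device-specific code paths and
# selects among them by device type at call time.
class Conv2DOperator:
    def __init__(self):
        self.impls = {
            "CPU": self._run_cpu,
            "GPU": self._run_gpu,
            "FPGA": self._run_fpga,
        }

    def _run_cpu(self, x):   # placeholder for the CPU code path
        return [v + 1 for v in x]

    def _run_gpu(self, x):   # placeholder for the GPU code path
        return [v + 1 for v in x]

    def _run_fpga(self, x):  # placeholder for the FPGA code path
        return [v + 1 for v in x]

    def __call__(self, device, x):
        # Dispatch to the implementation adapted to the given device.
        return self.impls[device](x)

op = Conv2DOperator()
```

Because every code path computes the same function, the operator produces identical results regardless of which hardware the virtual processor routes a data share to, which is what makes proportional splitting safe.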
In some embodiments of the method for deep learning model distributed operation, step S4, configuring the deep learning model based on the operators supported by the virtual processor and assigning the virtual processor to the operators used in the model, further comprises: constructing the deep learning model on the TensorFlow framework, and selecting, for each layer of the model, a corresponding operator supported by the virtual processor. TensorFlow is a second-generation artificial intelligence learning system developed by Google on the basis of DistBelief, and its name derives from its operating principle: a tensor is an N-dimensional array, flow denotes computation based on a dataflow graph, and TensorFlow describes the process in which tensors flow from one end of the dataflow graph to the other. TensorFlow is a system that carries complex data structures into an artificial neural network for analysis and processing. Therefore, in embodiments of the present invention, the deep learning model is preferably constructed on the TensorFlow framework, and a corresponding operator supported by the virtual processor is selected for each layer of the model so that operations can subsequently be executed on the virtual processor.
In one or more embodiments of the method for deep learning model distributed operation, the operators supported by the virtual processor include a Forward operator and the Backward operator associated with it. For example, the aforementioned "Conv2D" is a forward operator; when "Conv2D" is registered and written, the backward operator related to it should be registered and written at the same time.
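Pairing each forward operator with its backward operator at registration time might be sketched as follows. The registry layout and the stand-in gradient arithmetic are assumptions of this illustration, not the patent's implementation:

```python
# Sketch: a forward op is registered together with its backward op, so
# training (which needs gradients) works on the virtual processor.
OPS = {}

def register_op(name, forward, backward):
    """Register a forward operator and its associated backward operator."""
    OPS[name] = {"forward": forward, "backward": backward}

def conv2d_forward(x, w):
    return x * w            # scalar stand-in for a real convolution

def conv2d_backward(grad_out, x, w):
    # Gradients w.r.t. the input x and the weight w, by the chain rule.
    return grad_out * w, grad_out * x

register_op("Conv2D", conv2d_forward, conv2d_backward)

y = OPS["Conv2D"]["forward"](3.0, 2.0)             # 6.0
gx, gw = OPS["Conv2D"]["backward"](1.0, 3.0, 2.0)  # (2.0, 3.0)
```

Registering the pair together guarantees that any operator usable in a model's forward pass also has the backward kernel that the training loop will request.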
In another aspect, the present invention further provides an apparatus for distributed operation of a deep learning model, where the apparatus includes: at least one processor; and a memory storing processor-executable program instructions that, when executed by the processor, perform the steps of:
s1: registering a virtual processor in a device management list;
s2: registering and writing operators supported by the virtual processor;
s3: detecting hardware resources associated with the virtual processor, and determining respective allocation proportions of the hardware resources according to the computing power of the associated hardware resources;
s4: configuring a deep learning model based on an operator supported by a virtual processor, and assigning the virtual processor to the operator used in the deep learning model;
s5: and the virtual processor allocates the input data of the corresponding operator to the hardware resources associated with the virtual processor according to the allocation proportion so as to carry out operation, and combines the operation results of the hardware resources into the output of the corresponding operator.
In some embodiments of the apparatus for deep learning model distributed operations of the present invention, the hardware resources associated with the virtual processor include one or more of a CPU, a GPU, and an FPGA.
In several embodiments of the apparatus for deep learning model distributed operations of the present invention, the step S2 of registering and writing operators supported by a virtual processor further comprises: and compiling operation instructions for the CPU, the GPU and the FPGA and corresponding adaptive instructions in the same operator.
In some embodiments of the apparatus for deep learning model distributed operation of the present invention, the step S4 configures the deep learning model based on operators supported by the virtual processor, and the specifying the virtual processor for the operators used in the deep learning model further includes: and constructing a deep learning model based on a TensorFlow framework, and selecting operators supported by corresponding virtual processors for each layer in the deep learning model.
In one or more embodiments of the apparatus for deep learning model distributed operations of the present invention, the virtual processor supported operators include a forward operator and a backward operator associated with the forward operator.
The devices and apparatuses disclosed in the embodiments of the present invention may be various electronic terminal apparatuses, such as a mobile phone, a Personal Digital Assistant (PDA), a tablet computer (PAD), a smart television, and the like, or may be a large terminal apparatus, such as a server, and therefore the scope of protection disclosed in the embodiments of the present invention should not be limited to a specific type of device and apparatus. The client disclosed in the embodiment of the present invention may be applied to any one of the above electronic terminal devices in the form of electronic hardware, computer software, or a combination of both.
The computer-readable storage media (e.g., memory) described herein may be volatile memory or nonvolatile memory, or may include both. By way of example, and not limitation, nonvolatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM), which can act as external cache memory. By way of example and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), and direct Rambus RAM (DRRAM). The storage devices of the disclosed aspects are intended to comprise, without being limited to, these and other suitable types of memory.
By adopting the above technical solution, the invention has at least the following beneficial effects: distributed heterogeneous accelerated computation is supported during the operation of the deep learning model. The concept of a virtual processor is introduced, the virtual processor is designated as the computing device for the corresponding operators, and the computation is distributed to different hardware devices for parallel execution, thereby achieving heterogeneous acceleration of deep learning model operations.
It is to be understood that the features listed above for the different embodiments may be combined with each other to form further embodiments within the scope of the invention, where technically feasible. Furthermore, the specific examples and embodiments described herein are non-limiting, and various modifications of the structure, steps and sequence set forth above may be made without departing from the scope of the invention.
In this application, use of the disjunctive is intended to include the conjunctive. The use of definite or indefinite articles is not intended to indicate cardinality; in particular, a reference to "the" object or "a"/"an" object is intended to denote one of possibly several such objects. However, although elements of the disclosed embodiments may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated. Furthermore, the conjunction "or" may be used to convey features that are present simultaneously rather than mutually exclusive alternatives; in other words, "or" should be understood as "and/or". The term "includes" is inclusive and has the same scope as "comprising".
The above-described embodiments, particularly any "preferred" embodiments, are possible examples of implementations, and are presented merely for a clear understanding of the principles of the invention. Many variations and modifications may be made to the above-described embodiments without departing substantially from the spirit and principles of the technology described herein. All such modifications are intended to be included within the scope of this disclosure.

Claims (10)

1. A method of deep learning model distributed operations, the method comprising the steps of:
registering a virtual processor in a device management list;
registering and writing operators supported by the virtual processor;
detecting hardware resources associated with the virtual processor, and determining respective allocation proportions of the hardware resources according to the computing power of the associated hardware resources;
configuring a deep learning model based on operators supported by the virtual processor, and assigning the virtual processor to the operators used in the deep learning model;
and the virtual processor allocates the input data of the corresponding operator to the hardware resource associated with the virtual processor according to the allocation proportion so as to carry out operation, and combines the operation results of the hardware resources into the output of the corresponding operator.
2. The method of claim 1, wherein the hardware resources associated with the virtual processor comprise one or more of a CPU, a GPU, and an FPGA.
3. The method of claim 2, wherein registering and writing operators supported by the virtual processor further comprises:
and compiling operation instructions for the CPU, the GPU and the FPGA and corresponding adaptive instructions in the same operator.
4. The method of claim 1, wherein configuring a deep learning model based on operators supported by the virtual processor, and wherein assigning a virtual processor to an operator used in the deep learning model further comprises:
and constructing the deep learning model based on a TensorFlow framework, and selecting corresponding operators supported by the virtual processor for each layer in the deep learning model.
5. The method of claim 1, wherein the virtual processor supported operators comprise a forward operator and a backward operator associated with the forward operator.
6. An apparatus for deep learning model distributed operations, the apparatus comprising:
at least one processor; and
a memory storing processor-executable program instructions that, when executed by the processor, perform the steps of:
registering a virtual processor in a device management list;
registering and writing operators supported by the virtual processor;
detecting hardware resources associated with the virtual processor, and determining respective allocation proportions of the hardware resources according to the computing power of the associated hardware resources;
configuring a deep learning model based on operators supported by the virtual processor, and assigning the virtual processor to the operators used in the deep learning model;
and the virtual processor allocates the input data of the corresponding operator to the hardware resource associated with the virtual processor according to the allocation proportion so as to carry out operation, and combines the operation results of the hardware resources into the output of the corresponding operator.
7. The apparatus of claim 6, wherein the hardware resources associated with the virtual processor comprise one or more of a CPU, a GPU, and an FPGA.
8. The apparatus of claim 7, wherein registering and writing operators supported by the virtual processor further comprises:
and compiling operation instructions for the CPU, the GPU and the FPGA and corresponding adaptive instructions in the same operator.
9. The apparatus of claim 6, wherein configuring a deep learning model based on operators supported by the virtual processor, and wherein specifying a virtual processor for operators used in the deep learning model further comprises:
and constructing the deep learning model based on a TensorFlow framework, and selecting corresponding operators supported by the virtual processor for each layer in the deep learning model.
10. The apparatus of claim 6, wherein the virtual processor supported operators comprise a forward operator and a backward operator related to the forward operator.
CN201911140560.6A 2019-11-20 2019-11-20 Deep learning model distributed operation method and device Withdrawn CN110866610A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911140560.6A CN110866610A (en) 2019-11-20 2019-11-20 Deep learning model distributed operation method and device
PCT/CN2020/104006 WO2021098269A1 (en) 2019-11-20 2020-07-24 Deep learning model distributed operation method and apparatus

Publications (1)

Publication Number Publication Date
CN110866610A true CN110866610A (en) 2020-03-06

Family

ID=69655743

Country Status (2)

Country Link
CN (1) CN110866610A (en)
WO (1) WO2021098269A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111736463A (en) * 2020-05-09 2020-10-02 刘炜 Adaptive deep learning control method based on operation platform
CN111858036A (en) * 2020-06-29 2020-10-30 浪潮电子信息产业股份有限公司 Tensorflow system acceleration method, device and equipment based on FPGA equipment and storage medium
CN112270399A (en) * 2020-09-29 2021-01-26 北京百度网讯科技有限公司 Operator registration processing method and device based on deep learning and electronic equipment
WO2021098269A1 (en) * 2019-11-20 2021-05-27 苏州浪潮智能科技有限公司 Deep learning model distributed operation method and apparatus
CN113469360A (en) * 2020-03-31 2021-10-01 杭州海康威视数字技术股份有限公司 Inference method and device
CN113918351A (en) * 2021-12-08 2022-01-11 之江实验室 Method and device for adapting to distributed training in deep learning framework and AI acceleration card
CN116306856A (en) * 2023-05-17 2023-06-23 之江实验室 Deep learning model deployment method and device based on search

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US10069681B2 (en) * 2015-12-31 2018-09-04 Amazon Technologies, Inc. FPGA-enabled compute instances
US10523519B2 (en) * 2017-04-14 2019-12-31 Accenture Global Solutions Limited Comparative multi-forecasting analytics service stack for cloud computing resource allocation
US20180322386A1 (en) * 2017-05-05 2018-11-08 Intel Corporation Fine-grain compute communication execution for deep learning frameworks
CN110866610A (en) * 2019-11-20 2020-03-06 苏州浪潮智能科技有限公司 Deep learning model distributed operation method and device

Cited By (12)

Publication number Priority date Publication date Assignee Title
WO2021098269A1 (en) * 2019-11-20 2021-05-27 苏州浪潮智能科技有限公司 Deep learning model distributed operation method and apparatus
CN113469360A (en) * 2020-03-31 2021-10-01 杭州海康威视数字技术股份有限公司 Inference method and device
CN113469360B (en) * 2020-03-31 2023-10-20 杭州海康威视数字技术股份有限公司 Reasoning method and device
CN111736463A (en) * 2020-05-09 2020-10-02 刘炜 Adaptive deep learning control method based on operation platform
CN111736463B (en) * 2020-05-09 2023-03-03 刘炜 Adaptive deep learning control method based on operation platform
CN111858036A (en) * 2020-06-29 2020-10-30 浪潮电子信息产业股份有限公司 Tensorflow system acceleration method, device and equipment based on FPGA equipment and storage medium
CN111858036B (en) * 2020-06-29 2022-06-10 浪潮电子信息产业股份有限公司 Tensorflow system acceleration method, device and equipment based on FPGA equipment and storage medium
CN112270399A (en) * 2020-09-29 2021-01-26 北京百度网讯科技有限公司 Operator registration processing method and device based on deep learning and electronic equipment
CN113918351A (en) * 2021-12-08 2022-01-11 之江实验室 Method and device for adapting to distributed training in deep learning framework and AI acceleration card
US11714995B2 (en) 2021-12-08 2023-08-01 Zhejiang Lab Method for distributed type training adaptation and apparatus in deep learning framework and AI accelerator card
CN116306856A (en) * 2023-05-17 2023-06-23 之江实验室 Deep learning model deployment method and device based on search
CN116306856B (en) * 2023-05-17 2023-09-05 之江实验室 Deep learning model deployment method and device based on search

Also Published As

Publication number Publication date
WO2021098269A1 (en) 2021-05-27

Similar Documents

Publication Publication Date Title
CN110866610A (en) Deep learning model distributed operation method and device
US11227216B2 (en) Batch processing in a neural network processor
US11816559B2 (en) Dilated convolution using systolic array
CN110908667A (en) Method and device for joint compilation of neural network and electronic equipment
EP3502975A1 (en) Methods and apparatus for model parallelism in artificial neural networks
CN110889439B (en) Image feature extraction method and device, electronic equipment and storage medium
US20210158131A1 (en) Hierarchical partitioning of operators
EP3857384B1 (en) Processing sequential inputs using neural network accelerators
CN108470211B (en) Method and device for realizing convolution calculation and computer storage medium
CN114201107A (en) Storage device, method for operating storage device, and electronic device
US11941528B2 (en) Neural network training in a distributed system
US20210326683A1 (en) Hardware circuit for accelerating neural network computations
CN110955390A (en) Data processing method and device and electronic equipment
KR20210023401A (en) Neural network computing method and system including the computing method
CN116431315B (en) Batch processing task processing method and device, electronic equipment and storage medium
Wang et al. SOLAR: Services-oriented learning architectures
CN116755878A (en) Program running method, apparatus, device, medium and program product
WO2023050807A1 (en) Data processing method, apparatus, and system, electronic device, and storage medium
CN114327856A (en) Data processing method and device, electronic equipment and storage medium
CN114021709B (en) Multi-FPGA data processing method and device, server and storage medium
CN111831333A (en) Instruction decomposition method and device for intelligent processor and electronic equipment
CN111026515B (en) State monitoring device, task scheduler and state monitoring method
KR20240063137A (en) Hardware accelerator-optimized group convolution-based neural network model
CN113704687B (en) Tensor calculation operation method, device and operation system
US20220012573A1 (en) Neural network accelerators

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200306