CN116523045A - Deep learning reasoning simulator oriented to multi-core chip - Google Patents


Info

Publication number
CN116523045A
CN116523045A
Authority
CN
China
Prior art keywords
reasoning
deep learning
chip
simulator
simulation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310235465.4A
Other languages
Chinese (zh)
Other versions
CN116523045B (en)
Inventor
汤昭荣
杨佳宁
毛旷
潘秋红
杨弢
叶茂伟
许慧卿
王颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202310235465.4A
Publication of CN116523045A
Application granted
Publication of CN116523045B
Legal status: Active
Anticipated expiration

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807 System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a deep learning reasoning simulator oriented to a multi-core chip, comprising: a configuration input layer for acquiring the deep learning model, multi-core chip architecture and mapping strategy required by a simulation; a model analysis layer for analyzing the deep learning model according to the mapping strategy to obtain a model analysis table; a route generation layer for deriving intra-operator routes and inter-operator routes from the operation strategy of each operator in the model analysis table and generating a routing file; a reasoning simulation layer for carrying out the reasoning simulation of the deep learning model on the multi-core chip described by the multi-core chip architecture, splitting the routing file and performing multi-process parallel simulation through a network-on-chip simulator to obtain the number of cycles required by each operator's routes; and a result calculation layer for collating the operator routing cycle counts obtained by parallel simulation in the reasoning simulation layer to obtain the cycle count and average device utilization of the deep learning reasoning simulation on the multi-core chip.

Description

Deep learning reasoning simulator oriented to multi-core chip
Technical Field
The invention belongs to the technical field of artificial intelligence, and relates to a multi-core chip-oriented deep learning reasoning simulator, suitable in particular for deep learning reasoning on multi-core chips.
Background
With the continued spread of deep learning research and applications, deploying deep models on multi-core chips has been proposed. Multi-core chips offer low cost and high yield; to use them for deep learning reasoning, early exploration with a simulator is indispensable.
In the process of realizing the invention, the inventors found that full-system simulators in the prior art are too slow to support effective design iteration, while cycle-accurate simulators lack a way to deploy a deep learning model onto the simulator directly.
To address this model-deployment problem, a framework for deep learning model reasoning on multi-core chips is needed.
Disclosure of Invention
Aiming at the defects of the prior art, the embodiment of the application aims to provide a deep learning reasoning simulator oriented to a multi-core chip.
According to a first aspect of embodiments of the present application, there is provided a deep learning reasoning simulator for a multi-core chip, including:
the configuration input layer is used for acquiring the deep learning model, the multi-core chip architecture and the mapping strategy required by simulation;
the model analysis layer is used for analyzing the deep learning model according to the mapping strategy to obtain a model analysis table, wherein the model analysis table describes the operation strategy of each operator in the deep learning model;
the route generation layer is used for analyzing intra-operator routes and inter-operator routes according to the operation strategy of each operator in the model analysis table and generating a routing file;
the reasoning simulation layer is used for carrying out the reasoning simulation of the deep learning model on the multi-core chip described by the multi-core chip architecture using the routing file, to obtain the number of cycles required by each operator's routes;
and the result calculation layer is used for collating the operator routing cycle counts obtained by parallel simulation in the reasoning simulation layer to obtain the cycle count and average device utilization of the deep learning reasoning simulation on the multi-core chip.
Further, the deep learning model is a deep neural network composed of several operators.
Further, the multi-core chip architecture is used to describe the architecture of the multi-core chip, which is a large chip composed of multiple cores, each core containing a set of neural network processing units.
Further, the mapping strategy is used to describe how operators are mapped onto the multi-core chip and how devices are assigned for computation.
Further, the model analysis table includes, for each operator, the operator type, input and output shapes, data type and operation strategy.
Further, the routing file is the set of routes of all data packets in the multi-core chip, and each data-packet route includes a sending time, a source address, a destination address and a data packet size.
Further, in the reasoning simulation layer, the routing file is divided into several parts, and a corresponding number of processes are simulated simultaneously using a network-on-chip simulator, so as to perform the reasoning simulation of the deep learning model on the multi-core chip.
Further, in the result calculation layer, for single batch reasoning, the calculation process of the cycle number is as follows:
calculating the cycle number required by each stage of reasoning, wherein the cycle number required by one stage is the sum of the cycle numbers required by each operator reasoning in the current stage;
and adding the cycle numbers required by each stage of reasoning to obtain the cycle numbers.
Further, in the result calculation layer, for multi-batch reasoning, the cycle count is calculated as follows:
calculate the number of cycles required by each stage of reasoning, the cycles required by one stage being the sum of the cycles required by each operator in that stage;
take the stage with the largest cycle count as the main body of the pipeline, multiply its cycle count by the number of batches, and add the cycle counts of the remaining stages to obtain the total cycle count of multi-batch reasoning.
Further, in the result calculation layer, the average device utilization is the average, over all operators, of the proportion of devices used within each operator's device subnet during reasoning.
The technical scheme provided by the embodiment of the application can comprise the following beneficial effects:
according to the embodiment, the simulator automatically deploys the deep learning model on the multi-core chip through the mapping strategy, so that simulation reasoning of the deep learning model on specific hardware is realized; by simulating parallel reasoning of the deep learning model on the multi-core chip, a large amount of reference data can be provided for early system structure design, the chip development cost is saved, and the system development speed is accelerated.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
FIG. 1 is a flow diagram of a multi-core chip-oriented deep learning reasoning simulator, according to an exemplary embodiment.
FIG. 2 is a diagram of a multi-core chip device, according to an exemplary embodiment.
FIG. 3 is an operator operation strategy diagram, according to an exemplary embodiment.
FIG. 4 is a diagram of pipelined multi-batch reasoning, according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application.
The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the present application. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first message may also be referred to as a second message, and similarly, a second message may also be referred to as a first message, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "when", "upon" or "in response to determining", depending on the context.
FIG. 1 is a flow diagram of a multi-core chip-oriented deep learning reasoning simulator according to an exemplary embodiment; as shown in FIG. 1, the simulator may include:
the configuration input layer, used for acquiring the deep learning model, the multi-core chip architecture and the mapping strategy required by simulation;
the model analysis layer, used for analyzing the deep learning model according to the mapping strategy to obtain a model analysis table, wherein the model analysis table describes the operation strategy of each operator in the deep learning model;
the route generation layer, used for analyzing intra-operator routes and inter-operator routes according to the operation strategy of each operator in the model analysis table and generating a routing file;
the reasoning simulation layer, used for carrying out the reasoning simulation of the deep learning model on the multi-core chip described by the multi-core chip architecture using the routing file, to obtain the number of cycles required by each operator's routes;
and the result calculation layer, used for collating the operator routing cycle counts obtained by parallel simulation in the reasoning simulation layer to obtain the cycle count and average device utilization of the deep learning reasoning simulation on the multi-core chip.
According to this embodiment, the simulator automatically deploys the deep learning model on the multi-core chip through the mapping strategy, realizing simulated reasoning of the deep learning model on specific hardware; by simulating parallel reasoning of the deep learning model on the multi-core chip, it can provide a large amount of reference data for early architecture design, saving chip development cost and accelerating system development.
The following description is made on the deep learning reasoning simulator for the multi-core chip provided by the application:
1. The configuration input layer provides the deep learning model, the multi-core chip architecture and the mapping strategy required by simulation, wherein:
(1) The deep learning model is a deep neural network composed of a plurality of operators.
(2) The multi-core chip architecture is used to describe the architecture of the multi-core chip, which is a large chip composed of a plurality of cores, each core containing a set of neural network processing units. Fig. 2 shows an example: a multi-core chip with four cores, each core holding 9 neural network processing units.
(3) The mapping strategy is used to describe how operators are mapped onto the multi-core chip and how devices are allocated for computation; an example mapping strategy is shown in Table 1 below. The deep learning model is split at stage granularity: each stage consists of one or more operators, and the model performs reasoning stage by stage. The chip architecture is abstracted into a device graph, which is partitioned into several device subnets; each subnet can run one stage, and each stage occupies one device subnet on the device graph. A device subnet is determined by its starting point, length and width; as shown in fig. 2, stage 1 starts at (0, 0) with a length of 4 and a width of 6. The operator operation policy describes the strategy of each operator when running on its device subnet.
Table 1 mapping policy table
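The stage-to-subnet bookkeeping described above can be sketched with simple data structures. This is an illustrative sketch only; the class and field names (`DeviceSubnet`, `StageMapping`, `start`, `length`, `width`) are assumptions, not the patent's own code.

```python
from dataclasses import dataclass

@dataclass
class DeviceSubnet:
    start: tuple          # top-left coordinate of the subnet on the device graph
    length: int           # extent along the first axis
    width: int            # extent along the second axis

    def num_devices(self) -> int:
        # Total neural network processing units available to the stage.
        return self.length * self.width

@dataclass
class StageMapping:
    stage_id: int
    operators: list       # operator names assigned to this stage (hypothetical)
    subnet: DeviceSubnet  # device subnet the stage occupies

# Stage 1 in the Fig. 2 example: starts at (0, 0), length 4, width 6 -> 24 devices.
stage1 = StageMapping(1, ["matmul_0", "add_0"], DeviceSubnet((0, 0), 4, 6))
print(stage1.subnet.num_devices())  # 24
```

One `StageMapping` per stage, held in stage order, is enough to drive the later layers: the model analysis layer needs the subnet shape, and the result calculation layer needs the subnet size.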
2. The model analysis layer analyzes the input deep learning model according to the mapping strategy table. The resulting model analysis table contains, for each operator, the operator type, input and output shapes, data type and operation strategy, where an operator's operation strategy is the policy for distributing the operator across its device subnet so that the devices are fully utilized.
Table 2 model resolution table
The operator operation strategies are shown in fig. 3. When two tensors are multiplied as matrices, they can be divided into several blocks that are computed separately on several neural network processing units. With a horizontal (row) split, the two resulting tensors are computed on two processing units, and the final result is obtained by concatenating the two partial outputs; with a vertical (reduction-dimension) split, the two partial results must be added. Whether concatenated or added, tensors must be transferred between the neural network processing units.
Thus, different operation strategies use different splitting methods to distribute tensors across the processing units of the device subnet. The data transfers among processing units within one operator are called intra-operator routes; when one operator finishes and execution switches to the next, the resulting data transfers are called inter-operator routes.
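The two splitting strategies can be checked with a tiny worked example. This is an illustrative sketch in plain Python (the helper names `matmul` and `add` are hypothetical), not the patent's implementation; it only verifies the concatenate-vs-add behavior described above for C = A x B.

```python
def matmul(A, B):
    # Naive matrix multiply over lists of lists.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add(X, Y):
    # Elementwise sum of two equally shaped matrices.
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
full = matmul(A, B)  # [[19, 22], [43, 50]]

# Horizontal (row) split of A: each processing unit computes half the output
# rows, and the two partial outputs are concatenated.
top, bottom = [A[0]], [A[1]]
concat = matmul(top, B) + matmul(bottom, B)
assert concat == full

# Vertical (reduction-dimension) split: each unit holds half the columns of A
# and the matching rows of B; the two partial products must be added.
A_left, A_right = [[1], [3]], [[2], [4]]
B_top, B_bottom = [B[0]], [B[1]]
summed = add(matmul(A_left, B_top), matmul(A_right, B_bottom))
assert summed == full
```

Both splits reproduce the unsplit product, and both require moving partial tensors between units, which is exactly the intra-operator routing traffic the route generation layer must account for.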
3. The route generation layer analyzes intra-operator and inter-operator routes according to the operator operation strategies in the model analysis table and generates a routing file. The routing file is the set of routes of all data packets in the multi-core chip; each packet route consists of a sending time, a source address, a destination address and a packet size.
As shown in Table 3 below, the sending time is the time at which the packet is injected into the network on chip; the source and destination addresses are four-dimensional coordinates, where the first two dimensions give the coordinates of the core and the last two the coordinates of the processing unit within the core; the packet size is measured in flits.
Table 3 routing file
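A routing-file record of the shape just described can be sketched as follows. The field names and the one-record-per-line text serialization are assumptions for illustration; the patent does not specify a concrete file format.

```python
from collections import namedtuple

# One routing-file record: sending time, four-dimensional source and
# destination addresses (core_x, core_y, pe_x, pe_y), and size in flits.
Route = namedtuple("Route", ["send_time", "src", "dst", "size_flits"])

routes = [
    Route(send_time=0,  src=(0, 0, 1, 2), dst=(0, 1, 0, 0), size_flits=16),
    Route(send_time=40, src=(0, 1, 0, 0), dst=(1, 0, 2, 2), size_flits=8),
]

# A routing file is then simply these records serialized one per line.
for r in routes:
    print(r.send_time, *r.src, *r.dst, r.size_flits)
```

Keeping the address as a flat four-tuple makes the records trivially splittable by stage or operator for the parallel simulation described next.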
4. The reasoning simulation layer performs the reasoning simulation of the deep learning model on the multi-core chip. The simulated content is the routing of data packets over the network on chip; that is, the number of cycles required by the reasoning is obtained from the routing file. To speed up the simulation, the routing file is split into several parts that are simulated simultaneously by a corresponding number of processes using a network-on-chip simulator; in a concrete implementation, an open-source network-on-chip simulator such as booksim, popnet or gem5 can be used.
It should be noted that:
(1) Although the stages execute serially, the reasoning simulation does not need to perform the actual numerical computation, so the routes of each stage can be simulated in parallel;
(2) Similarly, operators within the same stage can be simulated in parallel.
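The multi-process strategy above might be sketched as below. This is a hedged sketch: `simulate_shard` is a hypothetical stand-in cost model, not an invocation of a real NoC simulator such as booksim or popnet; in practice each worker would launch the external simulator on its shard of the routing file.

```python
from multiprocessing import Pool

def simulate_shard(shard):
    # Stand-in for one NoC-simulator run over a routing-file shard.
    # Toy cost model: last injection time plus one cycle per flit.
    last_send = max(send_time for send_time, _ in shard)
    return last_send + sum(flits for _, flits in shard)

def simulate_all(shards, workers=4):
    # One shard per stage/operator; shards are independent, so they can
    # run in parallel even though the stages execute serially on chip.
    with Pool(workers) as pool:
        return pool.map(simulate_shard, shards)  # cycle count per shard

if __name__ == "__main__":
    shards = [
        [(0, 16), (40, 8)],   # (send_time, flits) records of shard 0
        [(0, 4), (10, 4)],    # shard 1
    ]
    print(simulate_all(shards, workers=2))  # [64, 18]
```

`Pool.map` preserves shard order, so the per-shard cycle counts line up with the stage/operator list for the result calculation layer.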
5. The result calculation layer collates the data obtained by parallel simulation in the reasoning simulation layer to obtain the cycle count and average device utilization of the deep learning reasoning simulation on the multi-core chip.
5.1 The steps for calculating the number of reasoning simulation cycles are as follows:
(1) Calculate the number of cycles required by each stage of reasoning; the cycles required by one stage are the sum of the cycles required by each operator in that stage, i.e.

t_stage = t_1 + t_2 + ... + t_k

where t_stage is the number of cycles needed by the stage, t_i is the number of cycles needed by the i-th operator's reasoning, and k is the number of operators in the stage.
(2) Add the cycles required by each stage, giving the cycles required by one pass of reasoning:

t = t_stage(1) + t_stage(2) + ... + t_stage(m)

where t is the total number of cycles required for one inference and m is the number of stages.
And (3) for single batch reasoning, obtaining the cycle number through the steps (1) - (2).
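Steps (1) and (2) amount to a nested sum, sketched below; the stage and operator cycle numbers are illustrative, not taken from the patent.

```python
def stage_cycles(op_cycles):
    # Cycles of one stage: sum of the cycles of its operators.
    return sum(op_cycles)

def single_batch_cycles(stages):
    # `stages` maps each stage to the per-operator cycle counts inside it;
    # the total for one pass is the sum over stages.
    return sum(stage_cycles(ops) for ops in stages)

stages = [[100, 250], [400], [120, 80, 60]]  # illustrative operator cycles
print(single_batch_cycles(stages))  # 1010
```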
(3) For multi-batch reasoning, the cycle count can be calculated in pipeline fashion: the stage with the largest cycle count is taken as the main body of the pipeline, its cycle count is multiplied by the number of batches, and the cycle counts of the remaining stages are added to obtain the total cycle count of multi-batch reasoning.
In one embodiment, as shown in fig. 4, five samples A, B, C, D, E need to be inferred and each stage requires a different number of cycles; the total reasoning time is the sum of the cycles required by each stage plus the cycles required by stage 3 (the longest stage) multiplied by 4 (for the four remaining samples B, C, D, E). In general, when the number of batches is B, the total cycle count is:

T = (t_stage(1) + ... + t_stage(m)) + (B - 1) * max_j t_stage(j)

where T is the total number of cycles required for multi-batch reasoning; the first term is the cycles of the first batch flowing through the whole pipeline, and the second term is the cycles added by the remaining batches.
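The pipelined total can be computed directly from the per-stage cycle counts; the numbers below are illustrative, not Fig. 4's actual values.

```python
def multi_batch_cycles(stage_cycles, batches):
    # First batch pays the full sum of stage cycles; each remaining batch
    # adds only the bottleneck (longest) stage.
    bottleneck = max(stage_cycles)
    return sum(stage_cycles) + (batches - 1) * bottleneck

# Five samples, stage 3 is the bottleneck.
cycles = [200, 150, 500, 100]
print(multi_batch_cycles(cycles, 5))  # 950 + 4 * 500 = 2950
```

With a single batch the formula degenerates to the single-batch sum, as expected.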
5.2 A mapping strategy does not necessarily exploit the full computational power of the multi-core chip, so the average device utilization is also required. It is the average, over all operators, of the proportion of devices in the operator's device subnet that the operator actually uses, as in the following equation:

U = (u_1 + u_2 + ... + u_n) / n

where n is the total number of operators and u_i is the ratio of the devices used in the i-th operator's reasoning to the size of its device subnet.
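The utilization average can be sketched directly from that definition; the device counts below are illustrative assumptions.

```python
def average_utilization(used_devices, subnet_sizes):
    # Per-operator ratio of devices used to devices available in its subnet,
    # then the mean over all operators.
    ratios = [u / s for u, s in zip(used_devices, subnet_sizes)]
    return sum(ratios) / len(ratios)

# Three operators on subnets of 24, 24 and 12 devices.
print(average_utilization([24, 12, 6], [24, 24, 12]))  # 0.6666666666666666
```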
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains.
It is to be understood that the present application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof.

Claims (10)

1. A deep learning reasoning simulator oriented to a multi-core chip, characterized by comprising:
the configuration input layer, used for acquiring the deep learning model, the multi-core chip architecture and the mapping strategy required by simulation;
the model analysis layer, used for analyzing the deep learning model according to the mapping strategy to obtain a model analysis table, wherein the model analysis table describes the operation strategy of each operator in the deep learning model;
the route generation layer, used for analyzing intra-operator routes and inter-operator routes according to the operation strategy of each operator in the model analysis table and generating a routing file;
the reasoning simulation layer, used for carrying out the reasoning simulation of the deep learning model on the multi-core chip described by the multi-core chip architecture using the routing file, to obtain the number of cycles required by each operator's routes;
and the result calculation layer, used for collating the operator routing cycle counts obtained by parallel simulation in the reasoning simulation layer to obtain the cycle count and average device utilization of the deep learning reasoning simulation on the multi-core chip.
2. The deep learning reasoning simulator oriented to a multi-core chip according to claim 1, wherein the deep learning model is a deep neural network composed of several operators.
3. The deep learning reasoning simulator oriented to a multi-core chip according to claim 1, wherein the multi-core chip architecture is used to describe the architecture of the multi-core chip, which is a large chip composed of multiple cores, each core containing a set of neural network processing units.
4. The deep learning reasoning simulator oriented to a multi-core chip according to claim 1, wherein the mapping strategy is used to describe how operators are mapped onto the multi-core chip and how devices are assigned for computation.
5. The deep learning reasoning simulator oriented to a multi-core chip according to claim 1, wherein the model analysis table includes, for each operator, the operator type, input and output shapes, data type and operation strategy.
6. The deep learning reasoning simulator oriented to a multi-core chip according to claim 1, wherein the routing file is the set of routes of all data packets in the multi-core chip, each data-packet route including a sending time, a source address, a destination address and a data packet size.
7. The deep learning reasoning simulator oriented to a multi-core chip according to claim 1, wherein, in the reasoning simulation layer, the routing file is divided into several parts and a corresponding number of processes are simulated simultaneously using a network-on-chip simulator, so as to perform the reasoning simulation of the deep learning model on the multi-core chip.
8. The deep learning reasoning simulator oriented to a multi-core chip according to claim 1, wherein, in the result calculation layer, for single-batch reasoning, the cycle count is calculated as follows:
calculate the number of cycles required by each stage of reasoning, the cycles required by one stage being the sum of the cycles required by each operator in that stage;
add the cycles required by each stage to obtain the cycle count.
9. The deep learning reasoning simulator oriented to a multi-core chip according to claim 1, wherein, in the result calculation layer, for multi-batch reasoning, the cycle count is calculated as follows:
calculate the number of cycles required by each stage of reasoning, the cycles required by one stage being the sum of the cycles required by each operator in that stage;
take the stage with the largest cycle count as the main body of the pipeline, multiply its cycle count by the number of batches, and add the cycle counts of the remaining stages to obtain the total cycle count of multi-batch reasoning.
10. The deep learning reasoning simulator oriented to a multi-core chip according to claim 1, wherein, in the result calculation layer, the average device utilization is the average, over all operators, of the proportion of devices used within each operator's device subnet during reasoning.
CN202310235465.4A 2023-03-13 2023-03-13 Deep learning reasoning simulator oriented to multi-core chip Active CN116523045B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310235465.4A CN116523045B (en) 2023-03-13 2023-03-13 Deep learning reasoning simulator oriented to multi-core chip

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310235465.4A CN116523045B (en) 2023-03-13 2023-03-13 Deep learning reasoning simulator oriented to multi-core chip

Publications (2)

Publication Number Publication Date
CN116523045A 2023-08-01
CN116523045B 2023-11-07

Family

ID=87392950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310235465.4A Active CN116523045B (en) 2023-03-13 2023-03-13 Deep learning reasoning simulator oriented to multi-core chip

Country Status (1)

Country Link
CN (1) CN116523045B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117236263A (en) * 2023-11-15 2023-12-15 之江实验室 Multi-core interconnection simulation method and device, storage medium and electronic equipment

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2204763A1 (en) * 2008-12-23 2010-07-07 Wolfgang Dipl.-Ing. Schmidt Learning computer
US20190243735A1 (en) * 2018-02-05 2019-08-08 Wuhan University Deep belief network feature extraction-based analogue circuit fault diagnosis method
CN110163233A (en) * 2018-02-11 2019-08-23 陕西爱尚物联科技有限公司 A method of so that machine is competent at more complex works
US20190287208A1 (en) * 2018-03-15 2019-09-19 TMRW Entertainment Europe S.A.R.L. Game engine and artificial intelligence engine on a chip
US20210110089A1 (en) * 2019-10-10 2021-04-15 Nvidia Corporation Generating computer simulations of manipulations of materials based on machine learning from measured statistics of observed manipulations
CN113449856A (en) * 2020-03-27 2021-09-28 华为技术有限公司 Control flow graph processing method and related equipment
CN113986234A (en) * 2021-09-19 2022-01-28 苏州浪潮智能科技有限公司 Cross-platform model reasoning method, system, storage medium and equipment
WO2022083536A1 (en) * 2020-10-21 2022-04-28 华为技术有限公司 Neural network construction method and apparatus
KR20220061827A (en) * 2020-11-06 2022-05-13 한국전자통신연구원 Adaptive deep learning inference apparatus and method in mobile edge computing
WO2022110446A1 (en) * 2020-11-30 2022-06-02 中国科学院深圳先进技术研究院 Simulation method and apparatus for heterogeneous cluster scheduling, computer device, and storage medium
CN114580280A (en) * 2022-03-02 2022-06-03 北京市商汤科技开发有限公司 Model quantization method, device, apparatus, computer program and storage medium
CN115186821A (en) * 2022-09-13 2022-10-14 之江实验室 Core particle-oriented neural network inference overhead estimation method and device and electronic equipment
CN115378881A (en) * 2022-07-08 2022-11-22 南京邮数通信息科技有限公司 Federal learning-based home router data flow identification method and identification framework
CN115460128A (en) * 2022-11-09 2022-12-09 之江实验室 Network-on-chip simulation system for multi-core particle combined chip
CN115600676A (en) * 2022-10-08 2023-01-13 浙江大华技术股份有限公司(Cn) Deep learning model reasoning method, device, equipment and storage medium
CN115658274A (en) * 2022-11-14 2023-01-31 之江实验室 Modular scheduling method and device for neural network reasoning in core grain and computing equipment


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
丁然; 林建文; 朱振华; 刘弋波: "A CPU-like deep learning coprocessor architecture", China Integrated Circuit, no. 4 *
余鹏; 万里红; 霍宏; 方涛: "Object recognition based on a hierarchical feature mapping model", High Technology Letters, no. 4 *
王丽; 郭振华; 曹芳; 高开; 赵雅倩; 赵坤: "Automatic generation of model splitting strategies for model-parallel training", Computer Engineering & Science, no. 9 *
薛峰; 方维维: "EdgeMI: multi-device collaborative deep learning inference under resource constraints", Modern Computer, no. 20 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117236263A (en) * 2023-11-15 2023-12-15 Zhejiang Lab Multi-core interconnection simulation method and device, storage medium and electronic equipment
CN117236263B (en) * 2023-11-15 2024-02-06 Zhejiang Lab Multi-core interconnection simulation method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN116523045B (en) 2023-11-07

Similar Documents

Publication Publication Date Title
CN109993299A (en) Data training method and device, storage medium, electronic device
CN116523045B (en) Deep learning reasoning simulator oriented to multi-core chip
CN110502337B (en) Optimization system for shuffling stage in Hadoop MapReduce
CN112149047A (en) Data processing method and device, storage medium and electronic device
CN110297748A (en) The method, apparatus and computer readable storage medium of error are called in a kind of positioning
Yasudo et al. Performance estimation for exascale reconfigurable dataflow platforms
Lößer et al. Bottlemod: Modeling data flows and tasks for fast bottleneck analysis
CN116974765A (en) Storage management system of heterogeneous computer
Alhazov et al. On the number of nodes in universal networks of evolutionary processors
CN106844024A (en) The GPU/CPU dispatching methods and system of a kind of self study run time forecast model
Spillane et al. Temporal partitioning for partially-reconfigurable-field-programmable gate
CN115292044A (en) Data processing method and device, electronic equipment and storage medium
CN104598917B (en) A kind of support vector machine classifier IP kernel
Wabnig et al. Performance prediction of parallel programs
CN116415667B (en) Data processing method, machine learning framework and related equipment
JPH0769893B2 (en) Neural network simulator
CN116821200B (en) Visual analysis system and visual analysis method for artificial intelligent cloud data
CN117829242B (en) Model processing method and related equipment
WO2024128372A1 (en) Calculation unit, buffer, and data transfer optimization methodology for next-generation high-speed, lightweight object recognition fpga npu system
CN118349514A (en) Multi-chiplet-oriented data transmission method and system
CN111159523A (en) Spark-based parallel ant colony optimization community discovery method
Dussa-Zieger et al. Configuration, mapping and sequencing by genetic algorithms
Houstis et al. The algorithm mapper: a system for modeling and evaluating parallel applications/architecture pairs
Luo et al. A flexible transputer network for numerical applications
CN117744726A (en) Neural network overhead estimation method and system for chiplet fault awareness

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant