CN116523045A - Deep learning reasoning simulator oriented to multi-core chip - Google Patents
- Publication number
- CN116523045A (application number CN202310235465.4A)
- Authority
- CN
- China
- Prior art keywords
- reasoning
- deep learning
- chip
- simulator
- simulation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7807—System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a deep learning reasoning simulator oriented to multi-core chips, comprising: a configuration input layer for acquiring the deep learning model, multi-core chip architecture, and mapping strategy required by a simulation; a model analysis layer for analyzing the deep learning model according to the mapping strategy to obtain a model analysis table; a route generation layer for analyzing intra-operator routes and inter-operator routes according to the operation strategy of each operator in the model analysis table and generating a routing file; a reasoning simulation layer for performing reasoning simulation of the deep learning model on the multi-core chip described by the multi-core chip architecture, splitting the routing file, and performing multi-process parallel simulation with a network-on-chip simulator to obtain the number of cycles required by each operator's routes; and a result calculation layer for collating the operator routing cycle counts obtained by parallel simulation in the reasoning simulation layer to obtain the cycle count and average device utilization of the deep learning model's reasoning simulation on the multi-core chip.
Description
Technical Field
The invention belongs to the technical field of artificial intelligence, and relates to a multi-core chip-oriented deep learning reasoning simulator, particularly suited to deep learning reasoning on a multi-core chip.
Background
With the continued spread of deep learning research and applications, deploying deep models on multi-core chips has been proposed. Multi-core chips offer low cost and high yield, and using a simulator for early design exploration is indispensable before committing a multi-core chip to deep learning reasoning.
In the process of realizing the invention, the inventors discovered that full-system simulators in the prior art are too slow to support effective design iteration, while cycle-accurate simulators lack a way to deploy a deep learning model onto the simulator directly.
To address this model-deployment problem, a framework for reasoning deep learning models on multi-core chips in simulation is needed.
Disclosure of Invention
Aiming at the defects of the prior art, the embodiment of the application aims to provide a deep learning reasoning simulator oriented to a multi-core chip.
According to a first aspect of embodiments of the present application, there is provided a deep learning reasoning simulator for a multi-core chip, including:
the configuration input layer is used for acquiring a deep learning model, a multi-chip architecture and a mapping strategy required by simulation;
the model analysis layer is used for analyzing the deep learning model according to the mapping strategy to obtain a model analysis table, wherein the model analysis table describes the operation strategy of each operator in the deep learning model;
the route generation layer is used for analyzing the routes in the operators and the routes among the operators according to the operation strategy of each operator in the model analysis table and generating a route file;
the reasoning simulation layer is used for carrying out reasoning simulation of the deep learning model on the multi-core chip described by the multi-core chip architecture by utilizing the routing file to obtain the number of cycles required by each operator route;
and the result calculation layer is used for carrying out arrangement calculation on the operator routing cycle numbers obtained by parallel simulation in the reasoning simulation layer to obtain the cycle numbers and average equipment utilization rate of the deep learning model reasoning simulation on the multi-core chip.
Further, the deep learning model is a deep neural network composed of several operators.
Further, the multi-core chip architecture is used to describe the multi-core chip: a large chip composed of multiple cores, each of which contains a set of neural network processing units.
Further, the mapping strategy is used to describe how operators are mapped onto the multi-core chip and how devices are assigned for computation.
Further, the model analysis table comprises an operator type, an input and output shape, a data type and an operation strategy of each operator.
Further, the routing file is a set of routes of all data packets in the multi-core chip, and each route of the data packets in the multi-core chip includes a sending time, a source address, a destination address and a data packet size.
Further, in the reasoning simulation layer, the routing file is divided into several parts, which are simulated simultaneously by a corresponding number of processes using a network-on-chip simulator, so as to perform reasoning simulation of the deep learning model on the multi-core chip.
Further, in the result calculation layer, for single batch reasoning, the calculation process of the cycle number is as follows:
calculating the cycle number required by each stage of reasoning, wherein the cycle number required by one stage is the sum of the cycle numbers required by each operator reasoning in the current stage;
and adding the cycle numbers required by each stage of reasoning to obtain the cycle numbers.
Further, in the result calculation layer, for multi-batch reasoning, the calculation process of the cycle number is as follows:
calculating the cycle number required by each stage of reasoning, wherein the cycle number required by one stage is the sum of the cycle numbers required by each operator reasoning in the current stage;
the phase with the longest period number time is taken as the main body part of the pipeline, the period number of the phase is multiplied by the batch number and the period number required by each phase reasoning is added, and the total period number of the multi-batch reasoning is obtained.
Further, in the result calculation layer, the average device utilization is the average, over all operators, of the proportion of the device subnet's devices used in each operator's reasoning.
The technical scheme provided by the embodiment of the application can comprise the following beneficial effects:
according to the embodiment, the simulator automatically deploys the deep learning model on the multi-core chip through the mapping strategy, so that simulation reasoning of the deep learning model on specific hardware is realized; by simulating parallel reasoning of the deep learning model on the multi-core chip, a large amount of reference data can be provided for early system structure design, the chip development cost is saved, and the system development speed is accelerated.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
FIG. 1 is a flow diagram of a multi-chip oriented deep learning reasoning simulator, according to an exemplary embodiment.
Fig. 2 is a diagram of a multi-die chip apparatus, shown according to an example embodiment.
FIG. 3 is an operator execution policy diagram that is illustrated in accordance with an example embodiment.
FIG. 4 is a diagram of pipelined multi-batch reasoning, shown according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application.
The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the present application. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, a first message may also be referred to as a second message, and similarly, a second message may also be referred to as a first message, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining", depending on the context.
FIG. 1 is a flow diagram of a multi-chip oriented deep learning reasoning simulator, as shown in FIG. 1, according to an exemplary embodiment, which may include:
the configuration input layer is used for acquiring a deep learning model, a multi-chip architecture and a mapping strategy required by simulation;
the model analysis layer is used for analyzing the deep learning model according to the mapping strategy to obtain a model analysis table, wherein the model analysis table describes the operation strategy of each operator in the deep learning model;
the route generation layer is used for analyzing the routes in the operators and the routes among the operators according to the operation strategy of each operator in the model analysis table and generating a route file;
the reasoning simulation layer is used for carrying out reasoning simulation of the deep learning model on the multi-core chip described by the multi-core chip architecture by utilizing the routing file to obtain the number of cycles required by each operator route;
and the result calculation layer is used for carrying out arrangement calculation on the operator routing cycle numbers obtained by parallel simulation in the reasoning simulation layer to obtain the cycle numbers and average equipment utilization rate of the deep learning model reasoning simulation on the multi-core chip.
According to the embodiment, the simulator automatically deploys the deep learning model on the multi-core chip through the mapping strategy, so that simulation reasoning of the deep learning model on specific hardware is realized; by simulating parallel reasoning of the deep learning model on the multi-core chip, a large amount of reference data can be provided for early system structure design, the chip development cost is saved, and the system development speed is accelerated.
The deep learning reasoning simulator for multi-core chips provided by the application is described below:
1. The configuration input layer receives the deep learning model, multi-core chip architecture, and mapping strategy required by the simulation, wherein:
(1) The deep learning model is a deep neural network composed of a plurality of operators.
(2) The multi-core chip architecture is used to describe the multi-core chip: a large chip composed of a plurality of cores, each core containing a set of neural network processing units. Fig. 2 shows a multi-core chip with four cores and 9 neural network processing units in each core.
(3) The mapping strategy is used to describe how operators are mapped onto the multi-core chip and how devices are allocated for computation; the mapping strategy is shown in Table 1 below. The deep learning model is split with the stage as the granularity: each stage consists of one or more operators, and the model performs reasoning computation stage by stage. The chip architecture is abstracted into a device graph, which is partitioned into several device subnets; each subnet can run one stage, and each stage occupies one device subnet on the device graph. A device subnet is determined by its starting point together with its length and width; as shown in Fig. 2, stage 1 starts from (0, 0) with a length of 4 and a width of 6. The operator operation strategy represents the strategy of each operator when run on the device subnet.
Table 1 mapping policy table
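Since the contents of Table 1 are not reproduced here, the stage-to-subnet mapping described above can be sketched as a small data structure. All names below (`StageMapping`, `subnet_origin`, the operator names) are hypothetical illustrations of this rewrite, not the patent's actual format; only the numbers follow the Fig. 2 example (stage 1 at (0, 0), length 4, width 6).

```python
from dataclasses import dataclass

@dataclass
class StageMapping:
    stage: int
    operators: list          # operator names mapped to this stage (hypothetical)
    subnet_origin: tuple     # (x, y) starting point of the device subnet
    subnet_length: int
    subnet_width: int

    def devices(self):
        """Enumerate the device coordinates covered by this stage's subnet."""
        x0, y0 = self.subnet_origin
        return [(x0 + i, y0 + j)
                for i in range(self.subnet_length)
                for j in range(self.subnet_width)]

# Fig. 2 example: stage 1 starts from (0, 0) with length 4 and width 6,
# so its subnet covers a 4 x 6 block of 24 devices.
stage1 = StageMapping(stage=1, operators=["op1", "op2"],
                      subnet_origin=(0, 0), subnet_length=4, subnet_width=6)
assert len(stage1.devices()) == 24
```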
2. The model analysis layer analyzes the input deep learning model according to the mapping strategy table. The resulting model analysis table contains each operator's type, input and output shapes, data type, and operation strategy, where an operator's operation strategy is the policy for distributing the operator across the device subnet so that its execution makes full use of the devices.
Table 2 model resolution table
The operator operation strategy is illustrated in Fig. 3. When two tensors are matrix-multiplied, they can be divided into several blocks that are computed separately on several neural network processing units. With a transverse cut, the two sub-tensors are computed on two processing units and the final result is obtained by concatenating the two partial results; with a longitudinal cut, the two partial results must be added. Whether concatenated or added, tensors must be carried between the neural network processing units.
Different operation strategies thus use different cutting methods to distribute tensors across the neural network processing units of the device subnet. Data movement between processing units within one operator is called intra-operator routing; when one operator finishes and execution switches to the next, the resulting data movement is called inter-operator routing.
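The transverse/longitudinal cutting described above can be checked numerically. The sketch below (plain NumPy, an assumption of this rewrite rather than anything the patent specifies) shows that a row-wise cut is stitched by concatenation, while a cut along the shared dimension requires adding the partial products:

```python
import numpy as np

# Two operator-splitting strategies for a matrix multiply C = A @ B
# run across two neural-network processing units.
A = np.arange(12.0).reshape(4, 3)
B = np.arange(6.0).reshape(3, 2)
C = A @ B

# Transverse (row-wise) cut: each unit computes half the output rows;
# the partial results are stitched together by concatenation.
C_rows = np.concatenate([A[:2] @ B, A[2:] @ B], axis=0)

# Longitudinal cut along the shared dimension: each unit computes a
# partial product over part of the inner dimension; the partial results
# must be added. Either way, the stitching step implies data movement
# between processing units -- the intra-operator routes described above.
C_inner = A[:, :2] @ B[:2, :] + A[:, 2:] @ B[2:, :]

assert np.allclose(C, C_rows) and np.allclose(C, C_inner)
```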
3. The route generation layer analyzes intra-operator routes and inter-operator routes according to the operator operation strategies in the model analysis table and generates a routing file. The routing file is the set of routes of all data packets in the multi-core chip; each packet route consists of a sending time, a source address, a destination address, and a packet size.
As shown in Table 3 below, the sending time is the time at which the packet is injected into the network-on-chip; the source and destination addresses are four-dimensional coordinates, of which the first two dimensions give the coordinates of the core and the last two the coordinates of the processing unit within the core; the packet size is measured in flits.
Table 3 routing file
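Since the contents of Table 3 are not reproduced here, one routing-file record can be sketched as follows. The record layout mirrors the description above (sending time, four-dimensional source and destination, size in flits), but the class and serialization format are hypothetical illustrations:

```python
from dataclasses import dataclass

@dataclass
class PacketRoute:
    send_time: int       # cycle at which the packet is injected into the NoC
    src: tuple           # (core_x, core_y, unit_x, unit_y)
    dst: tuple           # same four-dimensional form
    size_flits: int      # packet size in flits

def to_line(r: PacketRoute) -> str:
    """Serialize one record as a whitespace-separated routing-file line."""
    return " ".join(map(str, (r.send_time, *r.src, *r.dst, r.size_flits)))

# A packet sent at cycle 100 from unit (1, 2) of core (0, 0)
# to unit (0, 0) of core (1, 0), 16 flits long.
route = PacketRoute(send_time=100, src=(0, 0, 1, 2), dst=(1, 0, 0, 0), size_flits=16)
assert to_line(route) == "100 0 0 1 2 1 0 0 0 16"
```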
4. The reasoning simulation layer performs reasoning simulation of the deep learning model on the multi-core chip. The specific content of the simulation is the routing of data packets on the network-on-chip; that is, the number of cycles required by reasoning is obtained from the routing file. To accelerate the simulation, the routing file is divided into several parts that are simulated simultaneously by a corresponding number of processes using a network-on-chip simulator; in a specific implementation, an open-source network-on-chip simulator such as booksim, popnet, or gem5 can be used.
It should be noted that:
(1) Although the stages are executed in series, the inference simulation does not need to perform specific numerical operation, so that the routes of each stage can be simulated in parallel;
(2) Similarly, operators within the same phase can be simulated in parallel.
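The split-and-simulate scheme above can be sketched with standard multiprocessing. `simulate_part` is a hypothetical stand-in for launching a real network-on-chip simulator (booksim, popnet, gem5) on one part of the routing file; a real run would replay the routes rather than use the placeholder cost model below.

```python
import multiprocessing as mp

def simulate_part(routes):
    # Placeholder cost model: charge each packet its flit count in cycles.
    # A real NoC simulator would replay the routes and report exact cycles.
    return sum(size for _send_time, _src, _dst, size in routes)

def simulate_in_parallel(route_parts, workers=2):
    # Each part of the routing file is simulated in its own process;
    # stages (and operators within a stage) carry no numerical dependency
    # as far as the route simulation is concerned, so this is safe.
    with mp.Pool(workers) as pool:
        return pool.map(simulate_part, route_parts)

if __name__ == "__main__":
    parts = [
        [(0, (0, 0, 0, 0), (0, 0, 1, 0), 8)],
        [(5, (1, 0, 0, 0), (1, 0, 1, 1), 16), (9, (1, 0, 1, 1), (0, 0, 0, 0), 4)],
    ]
    print(simulate_in_parallel(parts))   # prints [8, 20]
```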
5. The result calculation layer collates the data obtained by parallel simulation in the reasoning simulation layer to obtain the cycle count and average device utilization of the deep learning model's reasoning simulation on the multi-core chip.
5.1 The steps for calculating the reasoning-simulation cycle count are as follows:
(1) Calculate the number of cycles required by each stage of reasoning; the number of cycles required by one stage is the sum of the cycles required by each operator's reasoning in the current stage, i.e.

$$t_i = \sum_{j=1}^{k_i} c_j$$

where $t_i$ is the number of cycles needed for reasoning in stage $i$, $c_j$ is the number of cycles needed by operator $j$'s reasoning, and $k_i$ is the number of operators in the stage.
(2) Add the cycle counts required by each stage of reasoning to obtain the cycles required by one inference:

$$t = \sum_{i=1}^{m} t_i$$

where $t$ is the total number of cycles required for one inference and $m$ is the number of stages.
For single-batch reasoning, the cycle count is obtained through steps (1)-(2).
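The two steps above amount to a nested sum; a minimal sketch with illustrative cycle counts:

```python
def stage_cycles(op_cycles):
    """Cycles for one stage: the sum over its operators' reasoning cycles."""
    return sum(op_cycles)

def single_batch_cycles(stage_op_cycles):
    """Cycles for one inference: the sum over all stages."""
    return sum(stage_cycles(ops) for ops in stage_op_cycles)

# Three stages with illustrative per-operator cycle counts.
stage_op_cycles = [[120, 80], [200], [60, 40, 30]]
assert [stage_cycles(s) for s in stage_op_cycles] == [200, 200, 130]
assert single_batch_cycles(stage_op_cycles) == 530
```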
(3) For multi-batch reasoning, the cycle count can be calculated in pipeline fashion: the stage with the largest cycle count is taken as the main body of the pipeline; its cycle count is multiplied by the number of remaining batches and added to the cycle counts required by each stage of reasoning, giving the total cycle count of multi-batch reasoning.
In one embodiment, as shown in Fig. 4, five samples A, B, C, D, E need to be inferred and the number of cycles required by each stage's reasoning differs; the total reasoning time is the sum of the cycles required by each stage plus the cycles required by stage 3 (the longest stage) multiplied by 4 (for the four samples B, C, D, E). In general, when the batch count is $B$, the total cycle count is:

$$T = \sum_{i=1}^{m} t_i + (B-1)\,\max_{i} t_i$$

where $T$ is the total number of cycles required for multi-batch reasoning; the first term is the cycle count for the first batch to pass through the whole pipeline, and the second term is the cycle count for the remaining batches.
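The pipelined total described above can be sketched directly (illustrative per-stage cycle counts; the slowest stage is the bottleneck):

```python
def multi_batch_cycles(stage_cycle_list, batches):
    """Total cycles: one full pass through all stages (the first batch)
    plus the bottleneck stage repeated for the remaining batches."""
    first_pass = sum(stage_cycle_list)
    bottleneck = max(stage_cycle_list)
    return first_pass + (batches - 1) * bottleneck

# Shape of the Fig. 4 example: five batches, stage 3 the longest, so the
# four remaining batches each pay the stage-3 cost (numbers illustrative).
stage_cycle_list = [200, 200, 300, 130]
assert multi_batch_cycles(stage_cycle_list, batches=5) == 830 + 4 * 300
```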
5.2 A mapping strategy does not necessarily fully exploit the computational power of the multi-core chip, so an average device utilization is also computed: the average, over all operators, of the proportion of the device subnet's devices used by each operator's reasoning, as shown in the following equation:

$$U = \frac{1}{n}\sum_{j=1}^{n} r_j$$

where $n$ is the total number of operators and $r_j$ is the proportion of the device subnet's devices used in operator $j$'s reasoning.
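The average utilization described above is a plain arithmetic mean, sketched below with illustrative per-operator ratios:

```python
def average_device_utilization(per_op_ratios):
    """Mean over all operators of the fraction of the device subnet used."""
    return sum(per_op_ratios) / len(per_op_ratios)

# Four operators with illustrative utilization ratios of their subnets.
assert average_device_utilization([1.0, 0.75, 0.5, 0.75]) == 0.75
```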
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains.
It is to be understood that the present application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof.
Claims (10)
1. A multi-core chip-oriented deep learning reasoning simulator is characterized by comprising:
the configuration input layer is used for acquiring a deep learning model, a multi-chip architecture and a mapping strategy required by simulation;
the model analysis layer is used for analyzing the deep learning model according to the mapping strategy to obtain a model analysis table, wherein the model analysis table describes the operation strategy of each operator in the deep learning model;
the route generation layer is used for analyzing the routes in the operators and the routes among the operators according to the operation strategy of each operator in the model analysis table and generating a route file;
the reasoning simulation layer is used for carrying out reasoning simulation of the deep learning model on the multi-core chip described by the multi-core chip architecture by utilizing the routing file to obtain the number of cycles required by each operator route;
and the result calculation layer is used for carrying out arrangement calculation on the operator routing cycle numbers obtained by parallel simulation in the reasoning simulation layer to obtain the cycle numbers and average equipment utilization rate of the deep learning model reasoning simulation on the multi-core chip.
2. The multi-kernel chip oriented deep learning reasoning simulator of claim 1 wherein the deep learning model is a deep neural network comprised of several operators.
3. The multi-core chip oriented deep learning reasoning simulator of claim 1, wherein the multi-core chip architecture is used to describe the multi-core chip: a large chip composed of multiple cores, each core containing a set of neural network processing units.
4. The multi-core chip oriented deep learning reasoning simulator of claim 1, wherein the mapping strategy is used to describe how operators are mapped onto the multi-core chip and how devices are assigned for computation.
5. The multi-chip oriented deep learning reasoning simulator of claim 1, wherein the model parsing table includes an operator type, an input-output shape, a data type, and an operation strategy for each operator.
6. The multi-die chip oriented deep learning reasoning simulator of claim 1 wherein the routing file is a collection of all data packets routed in the multi-die chip, the routing of each data packet in the multi-die chip including a time of transmission, a source address, a destination address, and a data packet size.
7. The multi-core chip oriented deep learning reasoning simulator of claim 1, wherein in the reasoning simulation layer, the routing file is divided into several parts, which are simulated simultaneously by a corresponding number of processes using a network-on-chip simulator, so as to perform reasoning simulation of the deep learning model on the multi-core chip.
8. The multi-chip oriented deep learning reasoning simulator of claim 1 wherein in the result calculation layer, for single batch reasoning, the number of cycles is calculated as:
calculating the cycle number required by each stage of reasoning, wherein the cycle number required by one stage is the sum of the cycle numbers required by each operator reasoning in the current stage;
and adding the cycle numbers required by each stage of reasoning to obtain the cycle numbers.
9. The multi-chip oriented deep learning reasoning simulator of claim 1 wherein in the result calculation layer, for multi-batch reasoning, the number of cycles is calculated as:
calculating the cycle number required by each stage of reasoning, wherein the cycle number required by one stage is the sum of the cycle numbers required by each operator reasoning in the current stage;
the phase with the longest period number time is taken as the main body part of the pipeline, the period number of the phase is multiplied by the batch number and the period number required by each phase reasoning is added, and the total period number of the multi-batch reasoning is obtained.
10. The multi-core chip oriented deep learning reasoning simulator of claim 1, wherein in the result calculation layer, the average device utilization is the average, over all operators, of the proportion of the device subnet's devices used in each operator's reasoning.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310235465.4A CN116523045B (en) | 2023-03-13 | 2023-03-13 | Deep learning reasoning simulator oriented to multi-core chip |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310235465.4A CN116523045B (en) | 2023-03-13 | 2023-03-13 | Deep learning reasoning simulator oriented to multi-core chip |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116523045A true CN116523045A (en) | 2023-08-01 |
CN116523045B CN116523045B (en) | 2023-11-07 |
Family
ID=87392950
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310235465.4A Active CN116523045B (en) | 2023-03-13 | 2023-03-13 | Deep learning reasoning simulator oriented to multi-core chip |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116523045B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117236263A (en) * | 2023-11-15 | 2023-12-15 | 之江实验室 | Multi-core interconnection simulation method and device, storage medium and electronic equipment |
Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2204763A1 (en) * | 2008-12-23 | 2010-07-07 | Wolfgang Dipl.-Ing. Schmidt | Learning computer |
- 2023-03-13 CN CN202310235465.4A patent/CN116523045B/en active Active
Patent Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2204763A1 (en) * | 2008-12-23 | 2010-07-07 | Wolfgang Dipl.-Ing. Schmidt | Learning computer |
US20190243735A1 (en) * | 2018-02-05 | 2019-08-08 | Wuhan University | Deep belief network feature extraction-based analogue circuit fault diagnosis method |
CN110163233A (en) * | 2018-02-11 | 2019-08-23 | Shaanxi Aishang IoT Technology Co., Ltd. | A method for enabling a machine to handle more complex tasks |
US20190287208A1 (en) * | 2018-03-15 | 2019-09-19 | TMRW Entertainment Europe S.A.R.L. | Game engine and artificial intelligence engine on a chip |
US20210110089A1 (en) * | 2019-10-10 | 2021-04-15 | Nvidia Corporation | Generating computer simulations of manipulations of materials based on machine learning from measured statistics of observed manipulations |
CN113449856A (en) * | 2020-03-27 | 2021-09-28 | Huawei Technologies Co., Ltd. | Control flow graph processing method and related equipment |
WO2022083536A1 (en) * | 2020-10-21 | 2022-04-28 | Huawei Technologies Co., Ltd. | Neural network construction method and apparatus |
KR20220061827A (en) * | 2020-11-06 | 2022-05-13 | Electronics and Telecommunications Research Institute (ETRI) | Adaptive deep learning inference apparatus and method in mobile edge computing |
WO2022110446A1 (en) * | 2020-11-30 | 2022-06-02 | Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences | Simulation method and apparatus for heterogeneous cluster scheduling, computer device, and storage medium |
CN113986234A (en) * | 2021-09-19 | 2022-01-28 | Suzhou Inspur Intelligent Technology Co., Ltd. | Cross-platform model reasoning method, system, storage medium and equipment |
CN114580280A (en) * | 2022-03-02 | 2022-06-03 | Beijing SenseTime Technology Development Co., Ltd. | Model quantization method, device, apparatus, computer program and storage medium |
CN115378881A (en) * | 2022-07-08 | 2022-11-22 | Nanjing Youshutong Information Technology Co., Ltd. | Federated learning-based home router data flow identification method and identification framework |
CN115186821A (en) * | 2022-09-13 | 2022-10-14 | Zhejiang Lab | Core particle-oriented neural network inference overhead estimation method and device and electronic equipment |
CN115600676A (en) * | 2022-10-08 | 2023-01-13 | Zhejiang Dahua Technology Co., Ltd. | Deep learning model reasoning method, device, equipment and storage medium |
CN115460128A (en) * | 2022-11-09 | 2022-12-09 | Zhejiang Lab | Network-on-chip simulation system for multi-core particle combined chip |
CN115658274A (en) * | 2022-11-14 | 2023-01-31 | Zhejiang Lab | Modular scheduling method and device for neural network reasoning in core grain and computing equipment |
Non-Patent Citations (4)
Title |
---|
丁然; 林建文; 朱振华; 刘弋波: "A CPU-like Deep Learning Coprocessor Architecture", China Integrated Circuit, no. 4 *
余鹏; 万里红; 霍宏; 方涛: "Object Recognition Based on a Hierarchical Feature Mapping Model", High Technology Letters, no. 04 *
王丽; 郭振华; 曹芳; 高开; 赵雅倩; 赵坤: "Automatic Generation of Model Splitting Strategies for Model-Parallel Training", Computer Engineering & Science, no. 09 *
薛峰; 方维维: "EdgeMI: Multi-Device Collaborative Inference for Deep Learning under Resource Constraints", Modern Computer, no. 20 *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117236263A (en) * | 2023-11-15 | 2023-12-15 | 之江实验室 | Multi-core interconnection simulation method and device, storage medium and electronic equipment |
CN117236263B (en) * | 2023-11-15 | 2024-02-06 | 之江实验室 | Multi-core interconnection simulation method and device, storage medium and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN116523045B (en) | 2023-11-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109993299A (en) | Data training method and device, storage medium, electronic device | |
CN116523045B (en) | Deep learning reasoning simulator oriented to multi-core chip | |
CN110502337B (en) | Optimization system for shuffling stage in Hadoop MapReduce | |
CN112149047A (en) | Data processing method and device, storage medium and electronic device | |
CN110297748A (en) | The method, apparatus and computer readable storage medium of error are called in a kind of positioning | |
Yasudo et al. | Performance estimation for exascale reconfigurable dataflow platforms | |
Lößer et al. | Bottlemod: Modeling data flows and tasks for fast bottleneck analysis | |
CN116974765A (en) | Storage management system of heterogeneous computer | |
Alhazov et al. | On the number of nodes in universal networks of evolutionary processors | |
CN106844024A (en) | The GPU/CPU dispatching methods and system of a kind of self study run time forecast model | |
Spillane et al. | Temporal partitioning for partially-reconfigurable-field-programmable gate | |
CN115292044A (en) | Data processing method and device, electronic equipment and storage medium | |
CN104598917B (en) | A kind of support vector machine classifier IP kernel | |
Wabnig et al. | Performance prediction of parallel programs | |
CN116415667B (en) | Data processing method, machine learning framework and related equipment | |
JPH0769893B2 (en) | Neural network simulator | |
CN116821200B (en) | Visual analysis system and visual analysis method for artificial intelligent cloud data | |
CN117829242B (en) | Model processing method and related equipment | |
WO2024128372A1 (en) | Calculation unit, buffer, and data transfer optimization methodology for next-generation high-speed, lightweight object recognition fpga npu system | |
CN118349514A (en) | Multi-core-grain-oriented data transmission method and system | |
CN111159523A (en) | Spark-based parallel ant colony optimization community discovery method | |
Dussa-Zieger et al. | Configuration, mapping and sequencing by genetic algorithms | |
Houstis et al. | The algorithm mapper: a system for modeling and evaluating parallel applications/architecture pairs | |
Luo et al. | A flexible transputer network for numerical applications | |
CN117744726A (en) | Neural network overhead estimation method and system for core particle fault perception |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||