WO2023017884A1 - Method and system for predicting latency of deep learning model by device - Google Patents

Method and system for predicting latency of deep learning model by device

Info

Publication number
WO2023017884A1
Authority
WO
WIPO (PCT)
Prior art keywords
latency
neural network
deep learning
network layer
learning model
Prior art date
Application number
PCT/KR2021/011006
Other languages
English (en)
Korean (ko)
Inventor
김정호
김민수
김태호
Original Assignee
주식회사 노타
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 주식회사 노타
Publication of WO2023017884A1

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/044 - Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 - Combinations of networks
    • G06N 3/08 - Learning methods
    • G06N 20/00 - Machine learning
    • G06N 20/20 - Ensemble learning

Definitions

  • the description below relates to a method and system for predicting the latency of a deep learning model on a device.
  • Provided are a latency prediction method and system capable of predicting the on-device latency of a deep learning model without the need to set up an actual edge device and build a pipeline.
  • the generating of the latency lookup table may include creating the latency lookup table such that information on the single neural network layer deep learning model is stored in association with the latency actually measured on the edge device for that model.
  • the generating of the latency lookup table may include: constructing a single neural network layer deep learning model; compiling the single neural network layer deep learning model for the edge device; transmitting the compiled single neural network layer deep learning model to the edge device; receiving the latency actually measured by the edge device for the compiled single neural network layer deep learning model; and storing the received latency in the latency lookup table in association with information on the single neural network layer deep learning model. A minimal sketch of this pipeline is given below.
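The following is a minimal Python sketch of this generation pipeline. The helpers (build_single_layer_model, compile_for_device, measure_on_device) are hypothetical stand-ins for a real device toolchain and on-device benchmark runner, which the patent does not specify; the measurement here is simulated.

```python
import json
import random

def build_single_layer_model(layer_type, params):
    """Hypothetical helper: describe a one-layer deep learning model."""
    return {"layer_type": layer_type, **params}

def compile_for_device(model, device_type):
    """Hypothetical helper: compile the single-layer model for the target edge device."""
    return {"device": device_type, "model": model}

def measure_on_device(compiled_model):
    """Hypothetical helper: run the compiled model on the edge device and
    return the measured latency in milliseconds (simulated here)."""
    return round(random.uniform(0.1, 5.0), 3)

def generate_lookup_table(layer_configs, device_type):
    """Build the latency lookup table: one row per measured single-layer model."""
    table = []
    for layer_type, params in layer_configs:
        model = build_single_layer_model(layer_type, params)
        compiled = compile_for_device(model, device_type)
        latency_ms = measure_on_device(compiled)
        table.append({"device": device_type, "layer": model, "latency_ms": latency_ms})
    return table

if __name__ == "__main__":
    configs = [("conv2d", {"in_ch": 3, "out_ch": 16, "kernel": 3}),
               ("dense", {"in_dim": 128, "out_dim": 64})]
    print(json.dumps(generate_lookup_table(configs, "edge-device-A"), indent=2))
```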
  • the latency lookup table may be generated to store latency for each of a plurality of single neural network layer deep learning models for each type of edge device.
  • the training may include pre-processing the latency values of the latency lookup table and the output values of the latency predictor so that the latency predictor does not output negative values; one possible transform is sketched below.
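The patent does not specify the pre-processing transform. One common choice that guarantees non-negative outputs, shown here as an assumption, is to train on log-transformed latencies and exponentiate the prediction:

```python
import numpy as np

def preprocess_target(latency_ms):
    # Train the predictor on log-latency; assumes measured latencies are strictly positive.
    return np.log(latency_ms)

def postprocess_prediction(pred_log):
    # exp of any real number is positive, so the recovered latency can never be negative.
    return np.exp(pred_log)
```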
  • the predicting of the on-device latency may include: generating single neural network layer deep learning models by decomposing the input deep learning model into single neural network layer units; generating a predicted latency value on the edge device by inputting each of the single neural network layer deep learning models to the trained latency predictor; and calculating the latency of the input deep learning model by adding the predicted latency values of the single neural network layer deep learning models.
  • the latency predictor may include a regression analysis model using a boosting algorithm.
  • Provided is a computer program stored in a computer-readable recording medium in combination with a computer device to execute the method on the computer device.
  • Provided is a computer-readable recording medium in which a program for executing the method on a computer device is recorded.
  • Provided is a computer device including at least one processor implemented to execute computer-readable instructions, wherein the at least one processor generates a latency lookup table including information of a single neural network layer and latency information of the single neural network layer on an edge device, trains a latency predictor to predict the latency of an input neural network layer using the latency lookup table, and predicts the on-device latency of an input deep learning model using the trained latency predictor.
  • FIG. 1 is a diagram illustrating an example of a network environment according to an embodiment of the present invention.
  • FIG. 2 is a block diagram illustrating an example of a computer device according to one embodiment of the present invention.
  • FIG. 3 is a block diagram illustrating an example of an internal configuration of a latency prediction system according to an embodiment of the present invention.
  • FIG. 4 is a flowchart illustrating an example of a latency prediction method according to an embodiment of the present invention.
  • FIG. 5 is a diagram illustrating an example of a latency lookup table according to an embodiment of the present invention.
  • FIG. 6 is a diagram illustrating an example of a process of generating learning data according to an embodiment of the present invention.
  • FIG. 7 is a diagram illustrating an example of a process of predicting latency using a latency predictor according to an embodiment of the present invention.
  • a latency prediction system may be implemented by at least one computer device.
  • a computer program according to an embodiment of the present invention may be installed and driven in the computer device, and the computer device may perform the latency prediction method according to the embodiments of the present invention under the control of the driven computer program.
  • the above-described computer program may be combined with a computer device and stored in a computer readable recording medium to execute a latency prediction method on a computer.
  • the network environment of FIG. 1 shows an example including a plurality of electronic devices 110, 120, 130, and 140, a plurality of servers 150 and 160, and a network 170.
  • FIG. 1 is an example for explaining the invention, and the number of electronic devices or servers is not limited to that shown in FIG. 1.
  • the network environment of FIG. 1 only describes one example of environments applicable to the present embodiments, and the environment applicable to the present embodiments is not limited to the network environment of FIG. 1 .
  • the plurality of electronic devices 110, 120, 130, and 140 may be fixed terminals implemented as computer devices or mobile terminals.
  • Examples of the plurality of electronic devices 110, 120, 130, and 140 include a smartphone, a mobile phone, a navigation device, a computer, a laptop computer, a digital broadcast terminal, a personal digital assistant (PDA), a portable multimedia player (PMP), and a tablet PC.
  • FIG. 1 shows the shape of a smartphone as an example of the electronic device 110, but in the embodiments of the present invention, the electronic device 110 may refer to any of various physical computer devices capable of communicating with the other electronic devices 120, 130, and 140 and/or the servers 150 and 160 over the network 170 using a wireless or wired communication method.
  • the communication method is not limited, and may include not only communication utilizing a communication network that the network 170 may include (e.g., a mobile communication network, the wired Internet, the wireless Internet, and a broadcasting network) but also short-range wireless communication between devices.
  • the network 170 may include a personal area network (PAN), a local area network (LAN), a campus area network (CAN), a metropolitan area network (MAN), a wide area network (WAN), and a broadband network (BBN), as well as one or more arbitrary networks such as the Internet.
  • the network 170 may include any one or more of network topologies including a bus network, a star network, a ring network, a mesh network, a star-bus network, a tree or hierarchical network, and the like, but is not limited thereto.
  • Each of the servers 150 and 160 may be implemented as a computer device or a plurality of computer devices that communicates with the plurality of electronic devices 110, 120, 130, and 140 through the network 170 to provide commands, code, files, content, services, and the like.
  • the server 150 may provide a service (e.g., an instant messaging service, a social network service, a payment service, a virtual exchange service) to the plurality of electronic devices 110, 120, 130, and 140 connected through the network 170.
  • Each of the plurality of electronic devices 110, 120, 130, and 140 or each of the servers 150 and 160 described above may be implemented by the computer device 200 shown in FIG. 2.
  • the computer device 200 may include a memory 210, a processor 220, a communication interface 230, and an input/output interface 240.
  • the memory 210 is a computer-readable recording medium and may include a random access memory (RAM), a read only memory (ROM), and a permanent mass storage device such as a disk drive.
  • a non-volatile mass storage device such as a ROM or a disk drive may be included in the computer device 200 as a separate permanent storage device distinct from the memory 210.
  • an operating system and at least one program code may be stored in the memory 210 . These software components may be loaded into the memory 210 from a computer-readable recording medium separate from the memory 210 .
  • the separate computer-readable recording medium may include a computer-readable recording medium such as a floppy drive, a disk, a tape, a DVD/CD-ROM drive, and a memory card.
  • software components may be loaded into the memory 210 through the communication interface 230 rather than a computer-readable recording medium.
  • software components may be loaded into memory 210 of computer device 200 based on a computer program installed by files received over network 170 .
  • the processor 220 may be configured to process commands of a computer program by performing basic arithmetic, logic, and input/output operations. Instructions may be provided to processor 220 by memory 210 or communication interface 230 . For example, processor 220 may be configured to execute received instructions according to program codes stored in a recording device such as memory 210 .
  • the communication interface 230 may provide a function for the computer device 200 to communicate with other devices (eg, storage devices described above) through the network 170 .
  • a request, command, data, file, etc. generated by the processor 220 of the computer device 200 according to program code stored in a recording device such as the memory 210 may be transferred to other devices over the network 170 under the control of the communication interface 230.
  • Conversely, signals, commands, data, files, etc. from other devices may be received by the computer device 200 through the communication interface 230 via the network 170.
  • Signals, commands, data, etc. received through the communication interface 230 may be transferred to the processor 220 or the memory 210, and files, etc. may be stored in a storage medium (the above-described permanent storage device) that the computer device 200 may further include.
  • the input/output interface 240 may be a means for interface with the input/output device 250 .
  • the input device may include a device such as a microphone, keyboard, or mouse
  • the output device may include a device such as a display or speaker.
  • the input/output interface 240 may be a means for interface with a device in which functions for input and output are integrated into one, such as a touch screen.
  • At least one of the input/output devices 250 may be integrated with the computer device 200. For example, as in a smartphone, a touch screen, a microphone, a speaker, and the like may be implemented in a form included in the computer device 200.
  • the computer device 200 may include fewer or more components than those shown in FIG. 2. However, most prior-art components need not be explicitly illustrated.
  • the computer device 200 may be implemented to include at least some of the aforementioned input/output devices 250 or may further include other components such as a transceiver and a database.
  • The latency prediction system may predict the latency of an arbitrary input deep learning model on a specific edge device based on information of that model.
  • the latency prediction system 300 may be implemented by at least one computer device 200 .
  • the latency prediction system 300 of FIG. 3 may include a latency lookup table generator 310, a latency predictor learner 320, and a latency predictor 330.
  • the latency lookup table generator 310, the latency predictor learner 320, and the latency predictor 330 may be implemented by the processor 220 of the computer device 200 implementing the latency prediction system 300, operating under the control of a computer program.
  • the processor 220 of the computer device 200 may be implemented to execute a control instruction according to an operating system code or at least one computer program code included in the memory 210 .
  • the processor 220 may control the computer device 200 so that the computer device 200 performs steps 410 to 430 of the method of FIG. 4 according to control commands provided by code stored in the computer device 200.
  • the latency lookup table generator 310, the latency predictor learner 320, and the latency predictor 330 may be used as functional representations of the processor 220 for performing each of the steps 410 to 430.
  • the latency lookup table generator 310 may generate a latency lookup table including information of a single neural network layer and latency information on an edge device of the single neural network layer.
  • the latency lookup table generator 310 may create a latency lookup table containing latency information of the corresponding edge device for each of various single neural network layers.
  • the latency lookup table generator 310 may configure a deep learning model of a single neural network layer to be used as an input of a latency predictor. In this case, the prediction performance of the latency predictor may be improved by configuring more diverse single neural network layer deep learning models.
  • the latency lookup table generator 310 may perform a compilation process so that the configured single neural network layer deep learning model can run on a given edge device. In this case, the latency lookup table generator 310 may transmit the compiled single neural network layer deep learning model to the corresponding edge device to obtain an actual latency measurement.
  • the measured actual latency value may be transmitted to the latency prediction system 300.
  • the latency lookup table generator 310 may construct the latency lookup table by adding the transmitted actual latency value to the table in association with information on the corresponding single neural network layer deep learning model.
  • the latency lookup table generator 310 may generate a latency lookup table by measuring actual latency in an edge device for each of various single neural network layer deep learning models.
  • the generated latency lookup table may be used to train a latency predictor.
  • the latency predictor learner 320 may train the latency predictor, using the latency lookup table, so that it predicts the latency of an input neural network layer.
  • the latency predictor may be a regression analysis model using a boosting algorithm.
  • a boosting algorithm improves prediction performance by sequentially training several weak learners, each one correcting the prediction errors of its predecessors.
  • the gradient boosting algorithm reduces the error between actual values and the previous model's predictions by following the gradient of the loss, and is known to exhibit high performance. A toy illustration follows.
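To make the idea concrete, here is a toy from-scratch gradient boosting loop for squared error. It is illustrative only; in practice any boosting library would do, and the tree depth, learning rate, and round count here are arbitrary assumptions.

```python
# Each new weak learner is fit to the residuals (the negative gradient of the
# squared-error loss) of the current ensemble, then added with a small step.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def toy_gradient_boosting(X, y, n_rounds=50, lr=0.1):
    pred = np.full(len(y), y.mean())   # start from the mean prediction
    learners = []
    for _ in range(n_rounds):
        residual = y - pred            # negative gradient of squared error
        stump = DecisionTreeRegressor(max_depth=2).fit(X, residual)
        pred += lr * stump.predict(X)  # move the ensemble along the gradient
        learners.append(stump)
    return learners, pred
```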
  • the latency predictor learner 320 may start training the latency predictor, which is a regression analysis model using a boosting algorithm. In this case, the latency predictor learner 320 may train the latency predictor to predict the latency of a model based on information of the single neural network layer deep learning models in the latency lookup table. Meanwhile, the latency predictor learner 320 may prevent the latency predictor from outputting a negative value by pre-processing the latency values of the latency lookup table and the output values of the latency predictor. A training sketch under illustrative assumptions follows.
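As an illustration, a gradient-boosting latency predictor could be trained from the lookup table roughly as follows. The feature encoding (a layer-type index plus a few shape parameters) and the hyperparameters are assumptions, not fixed by the patent; scikit-learn's GradientBoostingRegressor stands in for any boosting-based regression model.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

LAYER_TYPES = {"conv2d": 0, "dense": 1, "pool": 2}  # hypothetical layer vocabulary

def encode_layer(layer):
    # Map a single-layer description to a fixed-length numeric feature vector.
    return [LAYER_TYPES[layer["layer_type"]],
            layer.get("in_ch", layer.get("in_dim", 0)),
            layer.get("out_ch", layer.get("out_dim", 0)),
            layer.get("kernel", 0)]

def train_latency_predictor(lookup_table):
    X = np.asarray([encode_layer(row["layer"]) for row in lookup_table], dtype=float)
    # Log-transform the target so the inverse transform is always positive.
    y = np.log(np.asarray([row["latency_ms"] for row in lookup_table]))
    model = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05)
    return model.fit(X, y)
```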
  • the latency predictor 330 may predict the on-device latency of the input deep learning model using the trained latency predictor.
  • the latency predictor 330 may generate single neural network layer deep learning models by decomposing the input deep learning model into single neural network layer units. Then, the latency predictor 330 may input each decomposed single neural network layer deep learning model to the trained latency predictor.
  • the latency predictor may predict and output latency in a specific type of edge device for an input single neural network layer deep learning model. In this case, the latency predictor 330 may predict the on-device latency of the input deep learning model by adding the latencies output by the latency predictor.
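Under the same assumptions as the training sketch above (a predictor fit on log-latencies and a hypothetical feature encoder such as encode_layer), the decompose-predict-sum step might look like this:

```python
import numpy as np

def predict_model_latency(layers, predictor, encode):
    # `layers` is the list of single-layer descriptions obtained by decomposing
    # the input model; `encode` maps one description to a feature vector.
    X = np.asarray([encode(layer) for layer in layers], dtype=float)
    per_layer_ms = np.exp(predictor.predict(X))  # invert the log-transformed target
    return float(per_layer_ms.sum())             # total predicted on-device latency
```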
  • the latency prediction system 300 can thus predict the on-device latency of the deep learning model without transferring the input deep learning model to an actual edge device and measuring its actual latency.
  • Since the latency predictor is a regression analysis model, it shows high predictive power even for inputs not seen during training, and it can therefore predict on-device latency with high reliability for a wide range of input deep learning models.
  • a latency lookup table may be generated for each of various types of edge devices, and a latency predictor trained for each of those edge device types may be generated accordingly.
  • the latency predictor 330 may then predict the on-device latency of the input deep learning model according to the type of edge device.
  • the latency prediction system 300 may create a latency lookup table to store latency for each of a plurality of single neural network layer deep learning models for each type of edge device.
  • the latency lookup table may include deep learning model information of a single neural network layer and latency actually measured by the edge device A for the single neural network layer.
  • the deep learning model information of a single neural network layer may include information about which deep learning model the corresponding neural network layer belongs to and which layer it is.
  • the latency lookup table may later be used as training data for a latency predictor, since it stores, in association with each other, the actual latencies measured on the edge device A for each of the neural network layers of various types of deep learning models. An illustrative fragment of such a table is shown below.
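For illustration only, a fragment of such a lookup table for a hypothetical edge device A might look like the following (the layer fields and latency values are invented):

```python
lookup_table_edge_a = [
    {"device": "edge-device-A",
     "layer": {"layer_type": "conv2d", "in_ch": 3, "out_ch": 16, "kernel": 3},
     "latency_ms": 1.84},
    {"device": "edge-device-A",
     "layer": {"layer_type": "dense", "in_dim": 128, "out_dim": 64},
     "latency_ms": 0.21},
]
```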
  • the latency prediction system 300 or the latency lookup table generator 310 may compile the single neural network layer deep learning model 610 for the edge device 630 using the compiler 620, producing a compiled single neural network layer deep learning model 640 for the edge device 630. Thereafter, the latency prediction system 300 or the latency lookup table generator 310 may transmit the compiled single neural network layer deep learning model 640 to the edge device 630 in order to measure its actual latency on the edge device 630.
  • the edge device 630 may measure actual latency for the compiled single neural network layer deep learning model 640 .
  • the measured latency may be transmitted to the latency prediction system 300, and the latency prediction system 300 or the latency lookup table generator 310 may store the transmitted latency in the latency lookup table 650.
  • the latency may be stored in the latency lookup table 650 in association with information on the corresponding single neural network layer deep learning model 610.
  • the latency lookup table 650 can be used as training data of the latency predictor.
  • a latency predictor can be trained to output a latency value for a particular single neural network layer deep learning model.
  • the on-device latency of the deep learning model can be predicted using the trained latency predictor.
  • the latency prediction system 300 or the latency predictor 330 separates the input deep learning model 710 by neural network layer to obtain a plurality of neural network layers 720.
  • Each of the plurality of neural network layers 720 may be input to the latency predictor 730, and latencies 740 for the plurality of neural network layers 720 may be output.
  • the sum of the output latencies 740 may be calculated as the latency 750 of the deep learning model 710.
  • instances of the latency predictor 730 may be applied in parallel to each of the plurality of neural network layers 720; one possible realization is sketched below.
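One way to realize this parallel application, sketched here as an assumption (a single batched predict call would work equally well for a scikit-learn regressor), is a thread pool over the decomposed layers:

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def predict_layers_parallel(layers, predictor, encode, max_workers=4):
    # Apply the trained predictor to each decomposed layer concurrently;
    # read-only predict() calls are safe to issue from multiple threads.
    def predict_one(layer):
        features = np.asarray([encode(layer)], dtype=float)
        return float(np.exp(predictor.predict(features)[0]))  # invert log target
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(predict_one, layers))
```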
  • the system or device described above may be implemented as a hardware component or a combination of hardware components and software components.
  • devices and components described in the embodiments may include, for example, a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions.
  • the processing device may run an operating system (OS) and one or more software applications running on the operating system.
  • a processing device may also access, store, manipulate, process, and generate data in response to execution of software.
  • the processing device may include a plurality of processing elements and/or a plurality of types of processing elements.
  • a processing device may include a plurality of processors or a processor and a controller. Other processing configurations are also possible, such as parallel processors.
  • Software may include a computer program, code, instructions, or a combination of one or more of the foregoing, and may configure a processing device to operate as desired or command the processing device independently or collectively.
  • Software and/or data may be embodied in any tangible machine, component, physical device, virtual equipment, computer storage medium, or device, to be interpreted by a processing device or to provide instructions or data to a processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner.
  • Software and data may be stored on one or more computer readable media.
  • the method according to the embodiment may be implemented in the form of program instructions that can be executed through various computer means and recorded on a computer readable medium.
  • the computer readable medium may include program instructions, data files, data structures, etc. alone or in combination.
  • the medium may continuously store programs executable by a computer or temporarily store them for execution or download.
  • the medium may be various recording means or storage means in the form of single or combined hardware, is not limited to a medium directly connected to a certain computer system, and may be distributed over a network. Examples of the medium include magnetic media such as hard disks, floppy disks, and magnetic tapes, optical recording media such as CD-ROMs and DVDs, magneto-optical media such as floptical disks, and hardware devices such as ROM, RAM, and flash memory.
  • examples of other media include recording media or storage media managed by an app store that distributes applications, a site that supplies or distributes various other software, and a server.
  • Examples of program instructions include high-level language codes that can be executed by a computer using an interpreter, as well as machine language codes such as those produced by a compiler.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention relates to a method and system for predicting the latency of a deep learning model on a device. A latency prediction method according to an embodiment may comprise the steps of: generating a latency lookup table including information of a single neural network layer and latency information of the single neural network layer on an edge device; training a latency predictor, using the latency lookup table, so that the latency predictor predicts the latency of an input neural network layer; and predicting the on-device latency of an input deep learning model using the trained latency predictor.
PCT/KR2021/011006 2021-08-12 2021-08-19 Method and system for predicting latency of deep learning model by device WO2023017884A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR20210106527 2021-08-12
KR10-2021-0106527 2021-08-12

Publications (1)

Publication Number Publication Date
WO2023017884A1 (fr) 2023-02-16

Family

ID=85200773

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2021/011006 WO2023017884A1 (fr) 2021-08-12 2021-08-19 Method and system for predicting latency of deep learning model by device

Country Status (2)

Country Link
KR (1) KR102561799B1 (fr)
WO (1) WO2023017884A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116668475A (zh) * 2023-05-18 2023-08-29 泰州市元根体育器材有限公司 Online education operating system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20210073242A (ko) * 2019-12-10 2021-06-18 삼성전자주식회사 Model optimization method and apparatus, and accelerator system including model optimization apparatus

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
JUSSI HANHIROVA; TEEMU KÄMÄRÄINEN; SIPI SEPPÄLÄ; MATTI SIEKKINEN; VESA HIRVISALO; ANTTI YLÄ-JÄÄSKI: "Latency and Throughput Characterization of Convolutional Neural Networks for Mobile Computer Vision", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 Olin Library Cornell University Ithaca, NY 14853, 26 March 2018 (2018-03-26), XP080859116 *
MOHAMMED SHADY A.; SHIRMOHAMMADI SHERVIN; ALTAMIMI SA'DI: "A Multimodal Deep Learning-Based Distributed Network Latency Measurement System", IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, IEEE, USA, vol. 69, no. 5, 20 January 2020 (2020-01-20), USA, pages 2487 - 2494, XP011781913, ISSN: 0018-9456, DOI: 10.1109/TIM.2020.2967877 *
VÉSTIAS MÁRIO P.: "A Survey of Convolutional Neural Networks on Edge with Reconfigurable Computing", ALGORITHMS, vol. 12, no. 8, 31 July 2019 (2019-07-31), pages 154, XP093034724, DOI: 10.3390/a12080154 *
ZADEH ALI HADI; EDO ISAK; AWAD OMAR MOHAMED; MOSHOVOS ANDREAS: "GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference", 2020 53RD ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO), IEEE, 17 October 2020 (2020-10-17), pages 811 - 824, XP033856366, DOI: 10.1109/MICRO50266.2020.00071 *
ZHANG LI LYNA; HAN SHIHAO; WEI JIANYU; ZHENG NINGXIN: "nn-Meter: towards accurate latency prediction of deep-learning model inference on diverse edge devices", PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, ACMPUB27, NEW YORK, NY, USA, 24 June 2021 (2021-06-24) - 3 December 2021 (2021-12-03), New York, NY, USA, pages 81 - 93, XP058761876, ISBN: 978-1-4503-8457-5, DOI: 10.1145/3458864.3467882 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116668475A (zh) * 2023-05-18 2023-08-29 泰州市元根体育器材有限公司 Online education operating system
CN116668475B (zh) * 2023-05-18 2023-12-26 尚学仕教育科技(北京)有限公司 Online education operating system

Also Published As

Publication number Publication date
KR102561799B1 (ko) 2023-07-31
KR20230024835A (ko) 2023-02-21

Similar Documents

Publication Publication Date Title
CN107885762B Intelligent big data system, and method and device for providing intelligent big data service
US20180307945A1 Installation and operation of different processes of an AI engine adapted to different configurations of hardware located on-premises and in hybrid environments
CN108334439A Pressure testing method, apparatus, device, and storage medium
US20160247062A1 (en) Mapping of algorithms to neurosynaptic hardware
WO2019235821A1 Optimization technique for training DNNs capable of performing real-time inference in a mobile environment
CN109246027B Network maintenance method, apparatus, and terminal device
CN109902446A Method and apparatus for generating an information prediction model
WO2023017884A1 Method and system for predicting latency of deep learning model by device
CN108829518A Method and apparatus for pushing information
CN116684330A Artificial-intelligence-based traffic prediction method, apparatus, device, and storage medium
WO2022146080A1 Algorithm and method for dynamically modifying the quantization precision of a deep learning network
US20230349700A1 (en) Evacuation using digital twins
WO2022163985A1 Method and system for lightweighting an artificial intelligence inference model
CN114330353B Entity recognition method, apparatus, device, medium, and program product for virtual scenes
WO2022163996A1 Device for predicting drug-target interaction using a self-attention-based deep neural network model, and method therefor
WO2023033194A1 Specialized knowledge distillation method and system for lightweighting a deep neural network based on pruning
US11288322B2 (en) Conversational agents over domain structured knowledge
WO2023095934A1 Method and system for lightweighting the head neural network of an object detector
US20230050247A1 (en) Latency prediction method and computing device for the same
WO2023096004A1 Modular game production method and system
WO2022245020A1 Apparatus and method for calibrating analyte data
Imteaj et al. Exploiting federated learning technique to recognize human activities in resource-constrained environment
KR102429832B1 Method for providing a remote access service based on network environment analysis
WO2022145550A1 Algorithm and method for dynamically varying the quantization precision of a deep learning network
WO2023224205A1 Method for generating a common model by synthesizing learning results of artificial neural network models

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21953531

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE