CN115712551A - Performance monitoring device and system for high-performance computing application - Google Patents

Publication number
CN115712551A
Authority
CN
China
Prior art keywords
module
performance
computing
data
calculation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211504001.0A
Other languages
Chinese (zh)
Inventor
甘润东
龙玉江
卫薇
王策
卢仁猛
钟掖
王杰峰
陈卿
袁捷
吴忠
李洵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guizhou Power Grid Co Ltd
Original Assignee
Guizhou Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guizhou Power Grid Co Ltd filed Critical Guizhou Power Grid Co Ltd
Priority to CN202211504001.0A priority Critical patent/CN115712551A/en
Publication of CN115712551A publication Critical patent/CN115712551A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a performance monitoring device and a system for high-performance computing application, which relate to the technical field of high-performance computing.

Description

Performance monitoring device and system for high-performance computing application
Technical Field
The invention relates to a performance monitoring device and system for high-performance computing application, and belongs to the technical field of high-performance computing.
Background
High-performance computing has long held an important place among data-center technologies. How to monitor high-performance computing applications, so as to determine whether an application suits the current high-performance computing cluster and whether the current platform is being used efficiently and to its full capacity, is an important technical problem in the field. It concerns cluster operation and maintenance personnel, application promotion personnel and ordinary users alike, and the key to solving it is the monitoring system.
At present, performance monitoring devices on the market generally comprise a digital module. First, the digital module is used to extract, from among many factors, the key factors with the greatest influence to serve as evaluation indexes. Second, a performance evaluation program set is established through a programming module, constraint conditions are set, and test problems are formed. Testing is then carried out according to a test method, with a computing module performing the calculations, and the test results are evaluated qualitatively and quantitatively against standards and displayed on the outside of the performance monitoring device.
However, when an existing performance monitoring system operates normally, only plain numerical values are transmitted between the computing module and the parallel development module. After a high-performance value obtained by the computing module is transmitted to the development module, it must be operated on many times there; because internal value exchange is insufficient and external values are continually input, the computing process is repeated and the computing time of the system becomes long. In addition, the system only performs a simple classification of the data before the data enter, without considering the computing-resource allocation and application-processing capability of the system. When too much data is imported, the system cannot process it, which can cause disorder, produce wrong results and affect subsequent work.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: to provide a performance monitoring device and system for high-performance computing application that effectively solve the problems of existing performance monitoring systems described in the background. First, only plain numerical values are transmitted between the computing module and the parallel development module, so that a high-performance value obtained by the computing module must be operated on many times in the development module; internal value exchange is insufficient, external values are continually input, the computing process is repeated, and the computing time of the system is long. Second, the data are only simply classified before they enter, without considering the computing-resource allocation and application-processing capability of the system, so that when too much data is imported the system cannot process it, which can cause disorder, produce wrong results and affect subsequent work.
The technical scheme adopted by the invention is as follows: a performance monitoring device for high-performance computing application comprises a monitoring device main body. An evaluation module is provided on the left side inside the monitoring device main body, a computing module is provided on one side of the evaluation module, a parallel development module is provided at the bottom end of the computing module, and a fault-tolerant module is provided on the side of the parallel development module away from the evaluation module. A display device is provided on the surface of the monitoring device main body, and a data transmission line is provided between the computing module and the parallel development module.
Preferably, the evaluation module and the fault-tolerant module are symmetrically arranged along a central axis of the calculation module, and the evaluation module is electrically connected with the fault-tolerant module.
A performance monitoring system for high-performance computing application comprises an evaluation module, a computing module, a parallel development module, a fault-tolerant module and a display device;
the evaluation module evaluates the system in three aspects of system scale, system function and system performance, and adopts a typical numerical mode to evaluate the system application;
after finishing the evaluation, the evaluation module sends the high-performance computing data to the computing module for calculation, wherein the computing module comprises a high-performance general-purpose processor, an acceleration processor and a heterogeneous hybrid high-performance processor; a carbon nanotube, a low-temperature superconducting quantum tube and a memristor, which can thoroughly break through the power-consumption wall, memory-access wall or performance wall; and a novel processor built on novel probabilistic and cognitive computing models and on brain-like, nano-optical and superconducting quantum devices;
the parallel development module comprises a memory storage unit and a plurality of programming units; the programming units use a plurality of programming languages for mixed development and programming of the data in different environments; each programming unit obtains one task, and the tasks are stored in parallel in the memory storage unit; the programming units can share the information in the memory storage unit, so that data exchange among the tasks is completed implicitly through the shared data; a flexible task scheduling strategy is provided so that a plurality of programs and data structures can be called at one time; the calculated data are classified in the parallel development module, processed in cooperation with the different programming languages, stored together in the memory storage unit, and displayed through the display device;
the fault-tolerant module provides a reliability guarantee technology and a parallel fault-tolerant method for the digital module and the computing module; when the evaluation module evaluates the system as unqualified, the fault-tolerant module receives an instruction to stop the work of the system; and while the computing module and the parallel development module are running, the fault-tolerant module monitors the accuracy of the computing module and the load rate of the parallel development module in real time.
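The shared-memory data exchange described for the parallel development module can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `MemoryStorageUnit`, `programming_unit` and `run_units`, the use of threads, and the squaring task are all hypothetical choices made for illustration. Each unit publishes one result to a shared store and then reads its peers' results from that store, with no explicit message passing between units.

```python
import threading


class MemoryStorageUnit:
    """Hypothetical shared store that every programming unit reads and writes."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put(self, key, value):
        with self._lock:
            self._data[key] = value

    def snapshot(self):
        # Return a copy so callers can iterate without holding the lock.
        with self._lock:
            return dict(self._data)


def programming_unit(unit_id, task, store, barrier):
    # Each unit obtains one task and stores its result in the shared unit.
    store.put(f"result:{unit_id}", task(unit_id))
    # Wait until every unit has published its result.
    barrier.wait()
    # Implicit data exchange: peers' results are read from shared data.
    results = [v for k, v in store.snapshot().items() if k.startswith("result:")]
    store.put(f"seen_total:{unit_id}", sum(results))


def run_units(n_units):
    store = MemoryStorageUnit()
    barrier = threading.Barrier(n_units)
    threads = [
        threading.Thread(
            target=programming_unit,
            args=(i, lambda x: x * x, store, barrier),
        )
        for i in range(n_units)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store.snapshot()


if __name__ == "__main__":
    print(run_units(4))
```

The barrier is one possible synchronization choice; the source does not say how the programming units coordinate access to the memory storage unit.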
Preferably, the high-performance general-purpose processor adopts a vectorized, continuously widened SIMD design which, while supporting floating-point/fixed-point multiply-add operations, increases the data width of a single functional unit to 512 bits, that is, a vector of 8 long integers or 8 double-precision values.
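The arithmetic behind the 512-bit width can be checked with a short sketch, assuming 64-bit long integers and 64-bit double-precision values (the source gives only the counts). The function names are hypothetical, and the multiply-add below models only the lane-wise semantics of a vector operation in plain Python, not the single rounding step of a hardware fused multiply-add.

```python
VECTOR_BITS = 512   # width of one functional unit, from the text
ELEMENT_BITS = 64   # assumed size of a long integer or a double


def lanes(vector_bits=VECTOR_BITS, element_bits=ELEMENT_BITS):
    """Number of elements that fit in one vector register."""
    return vector_bits // element_bits


def vector_multiply_add(a, b, c):
    """Lane-wise a * b + c over exactly one vector's worth of elements."""
    n = lanes()
    if not (len(a) == len(b) == len(c) == n):
        raise ValueError(f"each operand must hold exactly {n} elements")
    return [x * y + z for x, y, z in zip(a, b, c)]


if __name__ == "__main__":
    print(lanes())
    print(vector_multiply_add([1.0] * 8, [2.0] * 8, [3.0] * 8))
```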
Preferably, the memory storage unit in the parallel development module uses a novel storage medium to form a hybrid multi-level storage structure, and exploits I/O locality to process and respond to I/O access requests at multiple storage levels, so as to provide high-bandwidth, evenly scalable I/O capability; at the same time, a software-defined storage and server platform is adopted to implement storage-oriented function customization and a concurrent I/O performance optimization function, where the I/O performance optimization function means processing and responding to I/O access requests at multiple storage levels by exploiting I/O locality.
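How I/O locality lets a multi-level structure answer requests at different levels can be sketched with a two-tier store. This is an illustrative assumption, not the structure claimed in the patent: the class name `TieredStore`, the two tiers, and the least-recently-used eviction policy are all hypothetical. Recently read keys are served from the small fast tier; other keys fall through to the slow backing tier and are then promoted.

```python
from collections import OrderedDict


class TieredStore:
    """Two-level store: a small fast tier in front of a slow backing tier."""

    def __init__(self, fast_capacity, backing):
        self.fast = OrderedDict()       # fast tier, kept in LRU order
        self.fast_capacity = fast_capacity
        self.backing = backing          # slow tier, here a plain dict
        self.fast_hits = 0
        self.slow_reads = 0

    def read(self, key):
        if key in self.fast:
            # Locality pays off: the request is answered at the fast level.
            self.fast.move_to_end(key)
            self.fast_hits += 1
            return self.fast[key]
        # Miss: answer from the slow level, then promote the key.
        value = self.backing[key]
        self.slow_reads += 1
        self.fast[key] = value
        if len(self.fast) > self.fast_capacity:
            self.fast.popitem(last=False)   # evict least recently used
        return value


if __name__ == "__main__":
    store = TieredStore(2, {"a": 1, "b": 2, "c": 3})
    for key in ["a", "a", "b", "a", "c", "b"]:
        store.read(key)
    print(store.fast_hits, store.slow_reads)
```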
Preferably, the computing module sends the computed data to the parallel development module and works in cooperation with it, so as to eliminate the storage-unit bottleneck: the multi-level storage structure fuses computing and storage at different levels so that data are processed close to where they are stored. To this end, architectures that fuse computing and storage, including fusion at the microscopic level, are studied; based on novel memory device technology, three-dimensional packaging technology and interconnection technology, and according to technical maturity and application requirements, computing and storage are fused at both the macroscopic and microscopic levels and data are processed nearby, thereby completing the data distribution work.
Preferably, the memristor has logic computing capability, allowing a structural design that fuses image processing operations with storage, so that simple image processing operations can be performed at the storage location of the image data; for big data processing, processing operations for data query, sorting and data aggregation are studied and designed.
Preferably, the system scale refers to quantitative indexes that reflect the actual resource scale of the system, such as the theoretical peak computing performance, number of processor cores and storage capacity provided by the high-performance computing system; the system function refers to the functional support that the high-performance computing system provides to multiple applications; the system performance comprises single-item performance and comprehensive performance, wherein single-item performance refers to individual performance indexes, namely the floating-point computing capability of a processor core, the computing capability of a single node, the data exchange capability of the high-performance computing network, and the I/O capability of the system; comprehensive performance refers to the sustained computing capability that a typical numerical mode can obtain on the high-performance computing system, and to whether the software and hardware configuration of the system is reasonable and its operation coordinated, and comprises indexes of scalability, balance, fault tolerance, stability, usability, reliability and site environment support capability.
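The "theoretical peak computing performance" index is conventionally the product of node count, cores per node, clock rate and floating-point operations per cycle per core. The sketch below applies that standard formula; the function names and the example figures (10 nodes, 32 cores, 2 GHz, 16 operations per cycle) are hypothetical and do not come from the source. The efficiency ratio of sustained to peak performance is likewise a common convention rather than something the patent defines.

```python
def theoretical_peak_flops(nodes, cores_per_node, clock_hz, flops_per_cycle):
    """Peak floating-point operations per second for the whole system."""
    return nodes * cores_per_node * clock_hz * flops_per_cycle


def efficiency(sustained_flops, peak_flops):
    """Fraction of the theoretical peak that an application sustains."""
    return sustained_flops / peak_flops


if __name__ == "__main__":
    # Hypothetical cluster: 10 nodes x 32 cores at 2 GHz, with one 8-lane
    # multiply-add per cycle counted as 16 floating-point operations.
    peak = theoretical_peak_flops(10, 32, 2_000_000_000, 16)
    print(peak / 1e12, "TFLOPS peak")
    print(efficiency(5_120_000_000_000, peak))
```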
Preferably, among the application evaluation indexes, the indexes are set according to the service requirements and test purpose of each test problem, and include the minimum system resource configuration and the minimum time required to complete the computing task, given a fixed problem scale and a timeliness requirement that must be met.
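Finding the minimum resource configuration that meets a time limit at fixed problem scale needs some runtime model, which the source does not specify. The sketch below assumes Amdahl's law purely for illustration; the function names, the serial fraction and the example numbers are hypothetical.

```python
def predicted_runtime(t1, serial_fraction, n):
    """Amdahl's-law runtime on n nodes, given single-node runtime t1."""
    return t1 * (serial_fraction + (1.0 - serial_fraction) / n)


def min_nodes_for_deadline(t1, serial_fraction, deadline, max_nodes):
    """Smallest node count whose predicted runtime meets the deadline.

    Returns None when no configuration up to max_nodes is fast enough.
    """
    for n in range(1, max_nodes + 1):
        if predicted_runtime(t1, serial_fraction, n) <= deadline:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical test problem: 100 s on one node, 10% serial, 30 s limit.
    print(min_nodes_for_deadline(100.0, 0.1, 30.0, 64))
```

With these example numbers the predicted runtime is 32.5 s on four nodes and about 28 s on five, so five nodes is the smallest configuration that meets the limit. A 5 s limit cannot be met at any scale, because the serial part alone takes 10 s.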
Preferably, the high-performance computing system is provided with a balancing module at both the underlying hardware architecture and the system software layer, so that I/O performance is fully taken into account and the computing capability and data access capability of the system are balanced.
The invention has the beneficial effects that: compared with the prior art, the invention has the following effects:
1) In the invention, the computing module and the parallel development module together check the rationality of the high-performance computing data, and within a given time a single computing task is completed jointly by a plurality of processors, which improves computing efficiency and shortens computing time. The plurality of programming units in the parallel development module use a plurality of programming languages for mixed development and programming of the data in different environments; each programming unit obtains one task, the tasks are stored in parallel in the memory storage unit, the programming units share the information in the memory storage unit, and data exchange among the tasks is completed implicitly through the shared data. Higher interconnect bandwidth and lower latency are thereby obtained, further improving parallel computing efficiency; mixed development and programming also effectively reduces data duplication, improves scalability, and balances the computing capability and data access capability of the system;
2) Through the arrangement of the fault-tolerant module, the stability of the system is ensured: index requirements are imposed on each module of the whole system, balanced performance and stable operation are ensured, and, in combination with an external software support system, higher computing efficiency and a better usage effect can be obtained;
3) Through the arrangement of the evaluation module, system evaluation mainly considers the theoretical and sustained peak computing performance of the system, whether the overall system performance is balanced, whether the single-core CPU peak performance and the cache/memory ratio are reasonable, and whether the application software and system software meet the requirements. Application evaluation mainly considers the mapping between the sustained computing performance and parallel scalability of each test system and the demand for computing resources, and calculates the configuration scale of the target system by combining the computing capability, I/O capability and high-speed computing network data exchange capability of the test systems. Whether the system matches the application can thus be judged: when the performance attributes of the system and the computing characteristics of the application are considered together, the similarities and differences between the system and the application can be evaluated. The calculation in the system is associated with the typical numerical mode, which provides a basis for evaluating the computation of the typical numerical mode in the computing system; an adequate job management and scheduling system can be provided to meet the application requirement of flexible allocation of computing resources, further improving the stability of the system.
Drawings
FIG. 1 is a cross-sectional view of a performance monitoring device for a high performance computing application of the present invention;
FIG. 2 is a flow diagram of a system for performance monitoring of a high performance computing application in accordance with the present invention;
FIG. 3 is a schematic diagram of the operation of parallel development modules in a performance monitoring system for high performance computing applications of the present invention;
FIG. 4 is a flowchart illustrating the operation of an evaluation module in a performance monitoring system for high performance computing applications, in accordance with the present invention.
Detailed Description
The invention is further described with reference to the following figures and specific embodiments.
Example 1: as shown in fig. 1-4, a performance monitoring device for high performance computing application includes a monitoring device body, an evaluation module is disposed on the left side inside the monitoring device body, a computing module is disposed on one side of the evaluation module, a parallel development module is disposed at the bottom end of the computing module, a fault-tolerant module is disposed on one side of the parallel development module away from the evaluation module, a display device is disposed on the surface of the monitoring device body, a data transmission line is disposed between the computing module and the parallel development module, the evaluation module and the fault-tolerant module are symmetrically disposed along the central axis of the computing module, and the evaluation module and the fault-tolerant module are electrically connected to each other, thereby facilitating data transmission and reducing transmission loss.
Example 2: as shown in fig. 1-4, a performance monitoring system for high-performance computing applications includes an evaluation module, a computation module, a parallel development module, a fault-tolerant module and a display device, where the high-performance computing system is provided with a balancing module in both a bottom hardware architecture and a system software layer to fully consider I/O performance, so that the computing power and data access capability of the system can be balanced;
the evaluation module evaluates the system in three aspects, namely system scale, system function and system performance, wherein the system scale refers to quantitative indexes that reflect the actual resource scale of the system, such as the theoretical peak computing performance, number of processor cores and storage capacity provided by the high-performance computing system; the system function refers to the functional support that the high-performance computing system provides to multiple applications; the system performance comprises single-item performance and comprehensive performance, wherein single-item performance refers to individual performance indexes, namely the floating-point computing capability of a processor core, the computing capability of a single node, the data exchange capability of the high-performance computing network, and the I/O capability of the system; comprehensive performance refers to the sustained computing capability that a typical numerical mode can obtain on the high-performance computing system, and to whether the software and hardware configuration of the system is reasonable and its operation coordinated, and comprises indexes of scalability, balance, fault tolerance, stability, usability, reliability and site environment support capability; the system application is evaluated using the typical numerical mode, and among the application evaluation indexes, the indexes are set according to the service requirements and test purpose of each test problem, and include the minimum system resource configuration and the minimum time required to complete the computing task, given a fixed problem scale and a timeliness requirement that must be met;
the system evaluation mainly considers the theoretical and sustained peak computing performance of the system, whether the overall system performance is balanced, whether the single-core CPU peak performance and the cache/memory ratio are reasonable, and whether the application software and system software meet the requirements; the application evaluation mainly considers the mapping between the sustained computing performance and parallel scalability of each test system and the demand for computing resources, and calculates the configuration scale of the target system by combining the computing capability, I/O capability and high-speed computing network data exchange capability of the test systems; whether the system matches the application can thus be judged, and when the performance attributes of the system and the computing characteristics of the application are considered together, the similarities and differences between the system and the application can be evaluated; the calculation in the system is associated with the typical numerical mode, which provides a basis for evaluating the computation of the typical numerical mode in the computing system; an adequate job management and scheduling system can be provided to meet the application requirement of flexible allocation of computing resources, which further improves the stability of the system and allows resources to be allocated according to the data and the computing and processing capability of the system;
the computing module comprises a high-performance general-purpose processor, an acceleration processor and a heterogeneous hybrid high-performance processor, wherein the high-performance general-purpose processor adopts a vectorized, continuously widened SIMD design which, while supporting floating-point/fixed-point multiply-add operations, increases the data width of a single functional unit to 512 bits, that is, a vector of 8 long integers or 8 double-precision values, effectively improving processor performance while maintaining low power consumption and high reliability; the computing module further comprises a carbon nanotube, a low-temperature superconducting quantum tube and a memristor, which can thoroughly break through the power-consumption wall, memory-access wall or performance wall; the memristor has logic computing capability, allowing a structural design that fuses image processing operations with storage, so that simple image processing operations can be performed at the storage location of the image data, which greatly reduces access operations on the image data and improves image processing performance and efficiency; for big data processing, processing operations for data query, sorting and data aggregation are studied and designed; the computing module also uses a novel processor built on novel probabilistic and cognitive computing models and on brain-like, nano-optical and superconducting quantum devices; and after the evaluation module finishes the evaluation, the high-performance computing data are sent to the computing module for calculation;
the parallel development module comprises a memory storage unit and a plurality of programming units; the programming units use a plurality of programming languages for mixed development and programming of the data in different environments; each programming unit obtains one task, and the tasks are stored in parallel in the memory storage unit; the programming units can share the information in the memory storage unit, so that data exchange among the tasks is completed implicitly through the shared data; a flexible task scheduling strategy is provided so that a plurality of programs and data structures can be called at one time; the calculated data are classified in the parallel development module, processed in cooperation with the different programming languages, stored together in the memory storage unit, and displayed through the display device, so that higher interconnect bandwidth and lower latency are obtained and parallel computing efficiency is further improved; mixed development and programming effectively reduces data duplication, improves scalability, and balances the computing capability and data access capability of the system; the memory storage unit in the parallel development module uses a novel storage medium to form a hybrid multi-level storage structure, and exploits I/O locality to process and respond to I/O access requests at multiple storage levels, so as to provide high-bandwidth, evenly scalable I/O capability; at the same time, a software-defined storage and server platform is adopted to implement storage-oriented function customization and a concurrent I/O performance optimization function;
the computing module sends the computed data to the parallel development module and works in cooperation with it, so that the storage-unit bottleneck can be eliminated; architectures that fuse computing and storage, including fusion at the microscopic level, are studied; based on novel storage device technology, three-dimensional packaging technology and interconnection technology, and according to technical maturity and application requirements, computing and storage are fused at different levels, both macroscopic and microscopic, and data are processed nearby, thereby completing the data distribution work.
The fault-tolerant module provides a reliability guarantee technology and a parallel fault-tolerant method for the digital module and the computing module, ensuring the reliability and stability of parallel computing. When the evaluation module evaluates the system as unqualified, the fault-tolerant module receives an instruction to stop the work of the system; while the computing module and the parallel development module are running, the fault-tolerant module monitors the accuracy of the computing module and the load rate of the parallel development module in real time. This ensures the stability of the system: index requirements are imposed on each module of the whole system, balanced performance and stable operation are ensured, and, in combination with an external software support system, higher computing efficiency and a better usage effect can be obtained.
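The two behaviours described for the fault-tolerant module, stopping the system on an unqualified evaluation and watching accuracy and load (bearing) rate at run time, can be sketched as follows. Everything beyond those two behaviours is an assumption: the class and field names, the threshold values, and the choice to record an alert rather than stop the system when a threshold is crossed are hypothetical, since the source does not say how a breach is handled.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    accuracy: float    # computing-module accuracy, as a fraction (assumed)
    load_rate: float   # parallel-development-module load rate, as a fraction


class FaultTolerantMonitor:
    def __init__(self, min_accuracy=0.99, max_load_rate=0.9):
        # Threshold values are illustrative assumptions, not from the source.
        self.min_accuracy = min_accuracy
        self.max_load_rate = max_load_rate
        self.running = True
        self.events = []

    def on_evaluation(self, qualified):
        # An unqualified evaluation stops the work of the system.
        if not qualified:
            self.running = False
            self.events.append("stop: evaluation unqualified")

    def check(self, sample):
        """Record alerts for one sample; return True if it is within limits."""
        if not self.running:
            return False
        ok = True
        if sample.accuracy < self.min_accuracy:
            self.events.append(f"alert: accuracy {sample.accuracy:.3f}")
            ok = False
        if sample.load_rate > self.max_load_rate:
            self.events.append(f"alert: load rate {sample.load_rate:.3f}")
            ok = False
        return ok


if __name__ == "__main__":
    monitor = FaultTolerantMonitor()
    monitor.on_evaluation(qualified=True)
    print(monitor.check(Sample(accuracy=0.995, load_rate=0.95)))
    print(monitor.events)
```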
The working principle of the invention is as follows: the system is evaluated by the evaluation module, and the calculation in the system is associated with a typical numerical mode, which provides a basis for evaluating the computation of the typical numerical mode in the computing system; an adequate job management and scheduling system can be provided to meet the application requirement of flexible allocation of computing resources, further improving the stability of the system. After the evaluation is finished, instructions are sent to each module and the data are imported into the computing module; once the calculation is finished, the resulting high-performance value is sent to the parallel development module to be stored and displayed. The plurality of programming units in the parallel development module share the information in the memory storage unit, and data exchange among the tasks is completed implicitly through the shared data, so that higher interconnect bandwidth and lower latency are obtained; mixed development and programming effectively reduces data duplication, improves scalability, and balances the computing capability and data access capability of the system. Throughout the process, the fault-tolerant module continuously monitors the data and the system.
The above description is only an embodiment of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present invention, and therefore, the scope of the present invention should be determined by the scope of the claims.

Claims (10)

1. A performance monitoring device for high performance computing applications, characterized by: the device comprises a monitoring device main body, wherein an evaluation module is arranged on the left side inside the monitoring device main body, a calculation module is arranged on one side of the evaluation module, a parallel development module is arranged at the bottom end of the calculation module, a fault-tolerant module is arranged on the side of the parallel development module away from the evaluation module, a display device is arranged on the surface of the monitoring device main body, and a data transmission line is arranged between the calculation module and the parallel development module.
2. The performance monitoring device of claim 1, wherein: the evaluation module and the fault-tolerant module are symmetrically arranged along the central axis of the calculation module and are electrically connected.
3. A system for performance monitoring of high performance computing applications, characterized by: the system comprises an evaluation module, a calculation module, a parallel development module, a fault-tolerant module and a display device;
the evaluation module evaluates the system in three aspects of system scale, system function and system performance, and adopts a typical numerical mode to evaluate the system application;
the computing module comprises a high-performance general-purpose processor, an acceleration processor and a heterogeneous hybrid high-performance processor; a carbon nanotube, a low-temperature superconducting quantum tube and a memristor, which can thoroughly break through the power-consumption wall, memory-access wall or performance wall; and a novel processor built on novel probabilistic and cognitive computing models and on brain-like, nano-optical and superconducting quantum devices; and the high-performance computing data are sent to the computing module for calculation after the evaluation module finishes the evaluation;
the parallel development module comprises a memory storage unit and a plurality of programming units; the programming units use a plurality of programming languages for mixed development and programming of the data in different environments; each programming unit obtains one task, and the tasks are stored in parallel in the memory storage unit; the programming units share the information in the memory storage unit, so that data exchange among the tasks is completed implicitly through the shared data; a flexible task scheduling strategy is provided so that a plurality of programs and data structures can be called at one time; the calculated data are classified in the parallel development module, processed in cooperation with the different programming languages, stored together in the memory storage unit, and displayed through the display device;
the fault-tolerant module provides a reliability guarantee technology and a parallel fault-tolerant method for the digital module and the computing module; when the evaluation module evaluates the system as unqualified, the fault-tolerant module receives an instruction to stop the work of the system; and while the computing module and the parallel development module are running, the fault-tolerant module monitors the accuracy of the computing module and the load rate of the parallel development module in real time.
4. The performance monitoring system of claim 3, wherein: the high-performance general-purpose processor employs a vectorized SIMD design of continuously increasing width that extends the data width of a single functional unit to 512 bits, that is, a vector of 8 long integers or 8 double-precision values, while supporting floating-point/fixed-point multiply-add operations.
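The figure of 8 elements follows from dividing the 512-bit vector width by a 64-bit element width. A minimal sketch of that arithmetic, assuming 64-bit long integers and IEEE 754 double precision (the helper name `simd_lanes` is hypothetical):

```python
def simd_lanes(vector_bits: int, element_bits: int) -> int:
    """Number of elements that fit in one SIMD vector register."""
    if element_bits <= 0 or vector_bits % element_bits != 0:
        raise ValueError("vector width must be a positive multiple of element width")
    return vector_bits // element_bits


# 64-bit long integers or double-precision floats in a 512-bit vector.
lanes_64 = simd_lanes(512, 64)
# For comparison, 32-bit single-precision floats in the same vector.
lanes_32 = simd_lanes(512, 32)
print(lanes_64, lanes_32)
```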
5. The performance monitoring system of claim 3, wherein: the memory storage unit in the parallel development module uses novel storage media to form a hybrid multi-level storage structure and exploits I/O locality to respond to I/O access requests at a plurality of storage levels, thereby providing high-bandwidth, evenly scalable I/O capability; it also adopts software-defined storage on a server platform to realize storage-oriented function customization and a concurrent I/O performance optimization function, where the I/O performance optimization function refers to responding to I/O access requests at the plurality of storage levels by exploiting I/O locality.
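One common way to serve requests at several storage levels by exploiting locality is a small fast tier in front of a larger slow tier, with recently used data promoted to the fast tier. The sketch below assumes a two-level structure with least-recently-used eviction; the class `TieredStore` and its policy are illustrative assumptions, not the structure disclosed in the patent.

```python
from collections import OrderedDict


class TieredStore:
    """Two-level store: a bounded fast tier (LRU) over an unbounded slow tier."""

    def __init__(self, fast_capacity: int):
        self.fast = OrderedDict()   # fast tier, ordered oldest to newest use
        self.slow = {}              # slow tier holds every written item
        self.fast_capacity = fast_capacity
        self.fast_hits = 0
        self.slow_hits = 0

    def write(self, key, value):
        # Write through to the slow tier and keep a copy in the fast tier.
        self.slow[key] = value
        self._promote(key, value)

    def read(self, key):
        if key in self.fast:
            # Served by the fast tier; refresh its recency.
            self.fast.move_to_end(key)
            self.fast_hits += 1
            return self.fast[key]
        value = self.slow[key]      # raises KeyError if the key was never written
        self.slow_hits += 1
        self._promote(key, value)   # recently read data is likely to be read again
        return value

    def _promote(self, key, value):
        self.fast[key] = value
        self.fast.move_to_end(key)
        while len(self.fast) > self.fast_capacity:
            self.fast.popitem(last=False)  # evict the least recently used item


store = TieredStore(fast_capacity=2)
for name in ("a", "b", "c"):
    store.write(name, name.upper())
first = store.read("a")    # "a" was evicted, so this is served by the slow tier
second = store.read("a")   # now served by the fast tier
print(first, second, store.fast_hits, store.slow_hits)
```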
6. The performance monitoring system of claim 3, wherein: the computing module sends the computed data to the parallel development module and works in cooperation with it, so as to eliminate the storage-unit bottleneck, and a macroscopic compute-storage integrated architecture and a microscopic compute-storage integrated architecture are studied; based on novel memory device technology, three-dimensional packaging technology, and interconnection technology, and according to technical maturity and application requirements, computation and storage are fused at both the macroscopic and microscopic levels and data are processed close to where they are stored, thereby completing the data distribution work.
7. The system for performance monitoring of high performance computing applications of claim 3, wherein: the memristor has logic computing capability and supports a structural design that fuses image-processing operations with storage, so that simple image-processing operations can be performed where the image data are stored; for big-data processing, data query, sorting, and data aggregation operations are studied and designed.
8. The system of claim 3 for performance monitoring of high performance computing applications, wherein: the system scale refers to quantitative indexes that reflect the actual resource scale of the system, namely the theoretical peak computing performance, the number of processor cores, and the storage capacity provided by the high-performance computing system; system functionality refers to the functional support provided by the high-performance computing system to multiple applications; the system performance comprises single-item performance and comprehensive performance, wherein single-item performance refers to the individual indexes of the floating-point computing capability of a processor core, the computing capability of a single node, the data exchange capability of the high-performance computing network, and the system I/O capability; comprehensive performance refers to the sustained computing capability obtained by running a typical numerical model on the high-performance computing system, together with whether the system software and hardware configuration is reasonable and whether operation is coordinated, and comprises indexes of scalability, balance, fault tolerance, stability, usability, reliability, and site-environment support capability of the system.
9. The system of claim 3 for performance monitoring of high performance computing applications, wherein: among the application evaluation indexes, the evaluation indexes are set according to the service requirements and test purpose of each test item, and comprise the minimum system resource allocation and the minimum time required to complete the computing task on the premise that the problem scale is fixed and the timeliness requirement is met.
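To illustrate what "minimum system resource allocation under a timeliness requirement" could mean for a fixed problem scale, the sketch below searches for the smallest core count whose predicted runtime meets a deadline. The runtime model is Amdahl's law, which is an assumption introduced here and is not stated in the patent; the function names and the example numbers are likewise hypothetical.

```python
from typing import Optional


def predicted_runtime(t_single: float, serial_fraction: float, cores: int) -> float:
    """Amdahl's-law runtime estimate for a fixed-size problem (assumed model)."""
    return t_single * (serial_fraction + (1.0 - serial_fraction) / cores)


def min_cores_for_deadline(
    t_single: float, serial_fraction: float, deadline: float, max_cores: int
) -> Optional[int]:
    """Smallest core count meeting the deadline, or None if none does."""
    for cores in range(1, max_cores + 1):
        if predicted_runtime(t_single, serial_fraction, cores) <= deadline:
            return cores
    return None


# Hypothetical task: 100 s on one core, 10% serial work, 24 s deadline.
needed = min_cores_for_deadline(100.0, 0.1, 24.0, max_cores=64)
# A deadline below the serial time can never be met, however many cores are used.
impossible = min_cores_for_deadline(100.0, 0.1, 5.0, max_cores=64)
print(needed, impossible)
```

In practice the runtime would come from measurement on the system under test rather than from a model; the search over allocations stays the same.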
10. The system of claim 3 for performance monitoring of high performance computing applications, wherein: the high-performance computing system is provided with a balance module at the underlying hardware architecture layer and the system software layer, which takes I/O performance into account and balances the computing capability against the data access capability of the system.
CN202211504001.0A 2022-11-28 2022-11-28 Performance monitoring device and system for high-performance computing application Pending CN115712551A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211504001.0A CN115712551A (en) 2022-11-28 2022-11-28 Performance monitoring device and system for high-performance computing application

Publications (1)

Publication Number Publication Date
CN115712551A true CN115712551A (en) 2023-02-24

Family

ID=85235193

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211504001.0A Pending CN115712551A (en) 2022-11-28 2022-11-28 Performance monitoring device and system for high-performance computing application

Country Status (1)

Country Link
CN (1) CN115712551A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116074179A (en) * 2023-03-06 2023-05-05 鹏城实验室 High expansion node system based on CPU-NPU cooperation and training method
CN116074179B (en) * 2023-03-06 2023-07-14 鹏城实验室 High expansion node system based on CPU-NPU cooperation and training method

Similar Documents

Publication Publication Date Title
Zheng et al. Real-time big data processing framework: challenges and solutions
CN107391258B (en) Software and hardware integrated portable remote sensing image real-time processing system
CN107329828A (en) A kind of data flow programmed method and system towards CPU/GPU isomeric groups
CN105808358B (en) A kind of data dependence thread packet mapping method for many-core system
CN113821332B (en) Method, device, equipment and medium for optimizing efficiency of automatic machine learning system
CN115712551A (en) Performance monitoring device and system for high-performance computing application
Mousavi Khaneghah et al. A mathematical multi-dimensional mechanism to improve process migration efficiency in peer-to-peer computing environments
CN111190735A (en) Linux-based on-chip CPU/GPU (Central processing Unit/graphics processing Unit) pipelined computing method and computer system
Song et al. Energy efficiency optimization in big data processing platform by improving resources utilization
Haji et al. Performance Monitoring and Controlling of Multicore Shared-Memory Parallel Processing Systems
Zhou et al. Semantic-based discovery method for high-performance computing resources in cyber-physical systems
Kwon et al. Dynamic scheduling method for cooperative resource sharing in mobile cloud computing environments
CN113806606A (en) Three-dimensional scene-based electric power big data rapid visual analysis method and system
Wen et al. The application of artificial intelligence technology in cloud computing environment resources
Zhou et al. Canary: Decentralized distributed deep learning via gradient sketch and partition in multi-interface networks
Zhou et al. Task offloading strategy of 6G heterogeneous edge-cloud computing model considering mass customization mode collaborative manufacturing environment
CN104331336B (en) Be matched with the multilayer nest balancing method of loads of high-performance computer structure
Wang et al. Directive-based hybrid parallel power system dynamic simulation on multi-core cpu and many-core gpu architecture
Yang et al. Study on static task scheduling based on heterogeneous multi-core processor
Fang et al. A Scheduling Strategy for Reduced Power Consumption in Mobile Edge Computing
CN114819367A (en) Public service platform based on industrial internet
CN109446294B (en) Parallel mutual subspace Skyline query method
CN104951369A (en) Hotspot resource competition eliminating method and device
Zhou et al. Optimization Control Strategy of Electricity Information Acquisition System Based on Edge-Cloud Computing Collaboration
Dandamudi et al. Architectures for parallel query processing on networks of workstations

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination