CN114116236A - Construction method and system of heterogeneous computing system - Google Patents


Info

Publication number
CN114116236A
Authority
CN
China
Prior art keywords
model
heterogeneous computing
computing system
task type
file
Prior art date
Legal status
Granted
Application number
CN202210089943.0A
Other languages
Chinese (zh)
Other versions
CN114116236B (en)
Inventor
李波
王滨
黄茗
杨军
张鑫
Current Assignee
CETC 15 Research Institute
Original Assignee
CETC 15 Research Institute
Priority date
Filing date
Publication date
Application filed by CETC 15 Research Institute filed Critical CETC 15 Research Institute
Priority to CN202210089943.0A priority Critical patent/CN114116236B/en
Publication of CN114116236A publication Critical patent/CN114116236A/en
Application granted granted Critical
Publication of CN114116236B publication Critical patent/CN114116236B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F 9/5061 Partitioning or combining of resources
    • G06F 9/5072 Grid computing
    • G06F 15/7803 System on board, i.e. computer system on one or more PCBs, e.g. motherboards, daughterboards or blades
    • G06F 8/41 Compilation
    • G06F 9/547 Remote procedure calls [RPC]; Web services
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G06F 2209/544 Remote (indexing scheme relating to interprogram communication, G06F 9/54)


Abstract

The invention relates to a method and system for building a heterogeneous computing system. The method determines a task type from each business requirement; acquires training data for the task type and constructs a corresponding neural network model; trains the model on a pre-established heterogeneous computing system to obtain a model file for the task type; converts that model file with an XPU acceleration stack to produce a model deployment file and a configuration file; and, on the pre-established heterogeneous computing system, packages the deployment and configuration files into distinct business services and publishes them. This approach effectively guards against monopoly risk while meeting the needs of future intelligent applications for diversity, rapidly changing deployment environments, autonomous controllability of the underlying software and hardware of information systems, rapid system deployment, and dynamic adjustment.

Description

Construction method and system of heterogeneous computing system
Technical Field
The invention relates to the technical field of computers, in particular to a method and a system for constructing a heterogeneous computing system.
Background
With the growth of cloud computing, big data, and artificial intelligence applications, demand for computing power in the intelligent world is increasing roughly tenfold per year, and the computing boundary is extending beyond the data center toward terminals and the edge. For years, mainstream intelligent computing platforms have been controlled by foreign vendors and open-source communities, and most hardware and intelligent computing frameworks are dominated by a few manufacturers, which severely constrains the development and cost control of intelligent computing.
The problem to be solved at present is therefore how to effectively prevent such monopoly while ensuring the diversity of intelligent computing applications, supporting rapidly changing deployment environments, and meeting requirements such as autonomous controllability, rapid system deployment, and dynamic adjustment.
Disclosure of Invention
The invention aims to provide a method and a system for building a heterogeneous computing system that remedy the defects of the prior art; the technical problem addressed by the invention is solved by the following technical scheme.
In a first aspect, an embodiment of the present invention provides a method for building a heterogeneous computing system, where the method includes:
determining a task type corresponding to each service requirement according to different service requirements;
acquiring, according to the task type, training data corresponding to the task type, and constructing a neural network model corresponding to the task type;
training the neural network model with a pre-established heterogeneous computing system and the training data to obtain a model file corresponding to the task type;
converting the model file corresponding to the task type with an XPU acceleration stack to obtain a model deployment file and a configuration file corresponding to the XPU acceleration stack;
and packaging the model deployment file and the configuration file with the pre-established heterogeneous computing system to obtain different business services, and publishing the business services.
Optionally, acquiring training data corresponding to the task type according to the task type includes:
receiving, according to the task type, training data for the task type as input by a user;
or
acquiring training data for the task type by web crawling.
Optionally, constructing a neural network model corresponding to the task type includes:
performing data cleaning, data labeling, and dataset-type classification on the training data to obtain processed training data;
storing the processed training data in a pre-established database;
building a neural network model corresponding to the training data from the processed training data;
and generating a computation graph from the neural network model.
Optionally, the pre-established heterogeneous computing system comprises at least a processor, an accelerator card, an operating system, and a deep learning framework component, each of which may be of a different type, and the system supports at least one of the following task types: image classification, object recognition, recommendation, speech, text, and reinforcement learning.
Optionally, training the neural network model with a pre-established heterogeneous computing system and the training data corresponding to the task to obtain a model file corresponding to the task type includes:
training the neural network model on the pre-established heterogeneous computing system with the training data corresponding to the task to generate a model file for the task type, where the output format of the model file includes at least one of an extensible computation graph module, standard data types, and built-in operators.
Optionally, converting the model file corresponding to the task type with the XPU acceleration stack to obtain a model deployment file and a configuration file corresponding to the XPU acceleration stack includes:
applying the acceleration stack corresponding to each accelerator card to perform inference optimization on the model file according to preset rules, obtaining a processing result corresponding to the neural network model, where the preset rules include at least one of a computation-graph pruning algorithm, an operator fusion algorithm, and an INT8 model quantization algorithm;
and compiling the processing result by static compilation for each heterogeneous computing system to obtain the model deployment file and configuration file corresponding to the XPU acceleration stack.
Optionally, packaging the model deployment file and the configuration file with the pre-established heterogeneous computing system to obtain different business services, and publishing the business services, includes:
for each pre-established heterogeneous computing system, packaging the model deployment file and the configuration file in a containerized deployment mode to obtain the different business services, and publishing them.
Optionally, the method further comprises:
and taking the service as a remote calling port so that a remote control end can call the service to execute corresponding operation.
In a second aspect, an embodiment of the present invention provides a heterogeneous computing system comprising at least a processor, an accelerator card, an operating system, and a deep learning framework component, where the processor and the accelerator card are boards of different types, and the heterogeneous computing system is configured to execute the method for building a heterogeneous computing system of the first aspect.
Optionally, the system further includes a remote control end configured to invoke the various services generated by the heterogeneous computing system.
The embodiment of the invention has the following advantages:
according to the construction method of the heterogeneous computing system and the heterogeneous computing system, the task type corresponding to the business requirement is determined according to different business requirements; according to the task type, acquiring training data corresponding to the task type, and constructing a neural network model corresponding to the task type; training a neural network model by adopting a pre-established heterogeneous computing system and training data to obtain a model file corresponding to the task type; an XPU acceleration stack is adopted to convert the model files corresponding to the task types to obtain model deployment files and configuration files corresponding to the XPU acceleration stack; the method adopts a pre-established heterogeneous computing system to package the model deployment file and the configuration file to obtain different business services, and releases the business services, so that monopolized risks can be effectively prevented, and meanwhile, the requirements of autonomous control, system rapid deployment, dynamic adjustment and the like on the aspects of diversity of future intelligent application, rapid and variable deployment environment, software and hardware facilities of an information system to a bottom layer information foundation and the like are met.
Drawings
FIG. 1 is a flow chart illustrating steps of an embodiment of a method for building a heterogeneous computing system in accordance with the present invention;
FIG. 2 is a flow chart illustrating steps of a method of building a heterogeneous computing system according to an embodiment of the present invention;
FIG. 3 is a block diagram of a heterogeneous computing system in accordance with the present invention.
Detailed Description
It should be noted that, where no conflict arises, the embodiments of the present application and the features within them may be combined with one another. The present invention is described in detail below with reference to the drawings and the embodiments.
Referring to fig. 1, a flowchart of the steps of one embodiment of a method for building a heterogeneous computing system according to the present invention is shown. The method includes the following steps:
S101, determining a task type corresponding to each service requirement according to different service requirements;
Specifically, the heterogeneous computing system comprises hardware devices and the software carried on them, where that software includes operating systems and deep learning framework components of different types, from different manufacturers, and of different models.
On this heterogeneous computing system, a user can submit different service requirements, and the task type corresponding to each requirement is determined accordingly, for example an image processing service or a speech recognition service.
In practice, the application field of the requirement, such as images, natural language, speech, recommendation, or reinforcement learning, is first determined from the user's input; the specific requirement is then analyzed within that field to settle on a well-defined task type, and the subsequent operations are executed on the heterogeneous computing system with the corresponding software and hardware.
S102, acquiring training data corresponding to the task type according to the task type, and constructing a neural network model corresponding to the task type;
Specifically, after the task type required by the user is determined, the heterogeneous computing system acquires training data for that task type and establishes a corresponding neural network model from it: the datasets required by the related task are collected, and a matching network model is designed.
That is, once the task type is determined, the datasets required by the related task are collected and the corresponding neural network model is established.
For example, if the task type is face recognition, training data for that task must be acquired, i.e. a large number of face photos of different people taken from different angles are collected as training data.
S103, training the neural network model with the pre-established heterogeneous computing system and the training data to obtain a model file corresponding to the task type;
Specifically, on the pre-established heterogeneous computing system, the neural network model is trained with the acquired training data, i.e. the dataset corresponding to the service type, to obtain a model file for the task type. The model file is written in a unified model output format with three components: (1) an extensible computation graph model definition, (2) standard data type definitions, and (3) built-in operator definitions.
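The three-part unified output format just described (extensible computation graph, standard data types, built-in operators) resembles framework-neutral exchange formats such as ONNX. The sketch below is a toy illustration of such a container; all class, field, and operator names here are illustrative assumptions, not the patent's actual format:

```python
import json

# Hypothetical unified model file: a computation graph, the declared
# standard data types, and the built-in operator set the graph may use.
BUILTIN_OPERATORS = {"Conv", "Relu", "MatMul", "Add", "Softmax"}
STANDARD_DTYPES = {"float32", "float16", "int8", "int32"}

def make_model_file(graph_nodes):
    """Validate a node list against the built-in operator and dtype sets,
    then serialize it into a unified, framework-neutral model file."""
    for node in graph_nodes:
        if node["op"] not in BUILTIN_OPERATORS:
            raise ValueError(f"unknown operator: {node['op']}")
        if node.get("dtype", "float32") not in STANDARD_DTYPES:
            raise ValueError(f"non-standard dtype: {node['dtype']}")
    return json.dumps({"graph": graph_nodes,
                       "dtypes": sorted(STANDARD_DTYPES),
                       "operators": sorted(BUILTIN_OPERATORS)})

model = make_model_file([
    {"name": "fc1", "op": "MatMul", "dtype": "float32"},
    {"name": "act1", "op": "Relu"},
])
```

A converter for a given XPU acceleration stack would consume such a file and need to understand only the declared operator set, not the framework that produced the model.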
s104, converting the model file corresponding to the task type by adopting an XPU acceleration stack to obtain a model deployment file and a configuration file corresponding to the XPU acceleration stack;
specifically, after a training model is determined by a pre-established heterogeneous computing system, different XPU accelerator cards can be adopted to perform inference acceleration on the generated training model due to the fact that the heterogeneous computing system comprises the different XPU accelerator cards, specifically, inference acceleration modes such as operator fusion, calculation chart key values and model quantization are adopted, model operation model libraries or executable files corresponding to different XPUs are generated after different XPU acceleration stack optimization, and meanwhile, corresponding configuration files are generated;
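Of the inference-acceleration techniques listed (operator fusion, graph pruning, quantization), INT8 quantization is the most self-contained to illustrate. The sketch below shows generic symmetric per-tensor quantization with a single scale factor; it is an illustration of the general technique, not the patent's specific algorithm:

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization: map float weights to
    [-127, 127] with one scale factor, returning (q_weights, scale)."""
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return [0] * len(weights), 1.0
    scale = max_abs / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q_weights, scale):
    """Recover approximate float weights from the INT8 representation."""
    return [qw * scale for qw in q_weights]

q, s = quantize_int8([0.5, -1.27, 0.0, 1.27])
restored = dequantize(q, s)
```

Storing 8-bit integers instead of 32-bit floats shrinks the deployment file roughly fourfold and lets integer arithmetic units on the accelerator do the bulk of the work, at a small accuracy cost.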
and S105, encapsulating the model deployment file and the configuration file by adopting a pre-established heterogeneous computing system to obtain different business services, and issuing the business services.
Specifically, different types of model deployment files and configuration files are respectively encapsulated by a pre-established heterogeneous computing system to obtain different business services, and the business services are released.
Exemplarily, if the service is a face recognition service, the model deployment file and the configuration file corresponding to the face recognition are encapsulated to obtain a face recognition service, so that other users can directly call the service if the face recognition service is to be executed.
For example, if the service is a voice recognition service, the model deployment file and the configuration file corresponding to the voice recognition service are encapsulated to obtain a voice recognition service, so that other users can directly call the voice recognition service when using the voice recognition service.
With the construction method of the heterogeneous computing system described above, a task type is determined for each business requirement; training data for the task type is acquired and a corresponding neural network model is constructed; the model is trained on a pre-established heterogeneous computing system to obtain a model file for the task type; the model file is converted with an XPU acceleration stack into a model deployment file and a configuration file corresponding to that stack; and the deployment and configuration files are packaged into distinct business services, which are then published. This effectively guards against monopoly risk while meeting the needs of future intelligent applications for diversity, rapidly changing deployment environments, autonomous controllability of the underlying software and hardware of information systems, rapid system deployment, and dynamic adjustment.
The present invention further provides a supplementary description of the method for constructing a heterogeneous computing system according to the above embodiment.
Optionally, acquiring training data corresponding to the task type according to the task type includes:
receiving training data for the task type as input by a user;
Specifically, the user may prepare training data for the task type in advance and then input it into the pre-established heterogeneous computing system.
or
acquiring training data for the task type by web crawling.
Specifically, web crawling obtains training data for the task type through a program or script that automatically captures web information according to preset rules.
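The web-crawling collection mode can be sketched with the standard-library HTML parser; the example collects candidate training-image URLs from a fetched page. This is a toy building block under assumed file-name conventions; a real crawler must also handle fetching, robots.txt, and deduplication:

```python
from html.parser import HTMLParser

class ImageLinkCollector(HTMLParser):
    """Collect candidate training-image URLs (.jpg/.png) from one
    crawled HTML page, per a simple preset rule."""
    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src", "")
            if src.endswith((".jpg", ".png")):  # preset filtering rule
                self.images.append(src)

page = ('<html><body><img src="/faces/a.jpg"><img src="logo.svg">'
        '<img src="/faces/b.png"></body></html>')
collector = ImageLinkCollector()
collector.feed(page)
```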
Optionally, constructing a neural network model corresponding to the task type includes:
performing data cleaning, data labeling, and dataset-type classification on the training data to obtain processed training data;
storing the processed training data in a pre-established database;
building a neural network model corresponding to the training data from the processed training data;
and generating a computation graph from the neural network model.
Specifically, a large amount of training data is acquired, whether supplied by the user or gathered by web crawling. This data may contain records in inconsistent formats and must first be cleaned: consistency checks are run on the training data, records with invalid values are deleted, and missing values are identified. After cleaning, the valid data is labeled, for example the key-point positions in face recognition data are annotated. The training data is then classified by type, so that neural network models of different types can be trained later, and each class is stored separately in the database. In other words, a matching neural network model structure is established according to the dataset type and the task type, from which the computation graph is generated.
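The clean-then-classify step described above can be sketched as follows; the record shape and field names such as `task_type` are illustrative assumptions:

```python
def preprocess(records):
    """Data cleaning plus dataset-type classification: drop records
    with missing or invalid fields, then bucket the valid ones by task
    type so each bucket can later train its own model."""
    buckets = {}
    for rec in records:
        # Consistency check: both fields must be present and non-empty.
        if not rec.get("data") or not rec.get("task_type"):
            continue  # invalid or missing value: discard
        buckets.setdefault(rec["task_type"], []).append(rec)
    return buckets

raw = [
    {"data": "img_001.jpg", "task_type": "face_recognition"},
    {"data": "", "task_type": "face_recognition"},  # missing value
    {"data": "utt_17.wav", "task_type": "speech"},
]
clean = preprocess(raw)
```

Each resulting bucket maps naturally to one database table and one candidate network architecture in the subsequent model-construction step.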
The neural network model includes at least one of: a BP network (including multi-layer feedforward networks), an RBF (radial basis function) network, a Hopfield (associative memory) network, a SOM self-organizing feature map, an ART adaptive resonance theory network, or a quantum neural network. The embodiments of the present invention are not limited in this respect.
Optionally, the pre-established heterogeneous computing system comprises at least a processor, an accelerator card, an operating system, and a deep learning framework component, each of which may be of a different type, and the system supports at least one of the following task types: image classification, object recognition, recommendation, speech, text, and reinforcement learning.
The processor may be any of various types of CPUs (central processing units), and the accelerator card is an XPU, where the XPU may be a GPU, TPU, NPU, APU, FPU, HPU, IPU, MPU, RPU, VPU, WPU, XPU, ZPU, etc.; the embodiments of the present invention impose no specific limitation. In the embodiments of the invention, the CPU may be a domestic Chinese chip of any type from any manufacturer, combined with different XPU accelerator cards to execute the scheme of the embodiments.
Optionally, training the neural network model with a pre-established heterogeneous computing system and the training data corresponding to the task to obtain a model file corresponding to the task type includes:
training the neural network model on the pre-established heterogeneous computing system with the training data corresponding to the task to generate a model file for the task type, where the output format of the model file includes at least one of an extensible computation graph module, standard data types, and built-in operators.
Optionally, converting the model file corresponding to the task type with the XPU acceleration stack to obtain a model deployment file and a configuration file corresponding to the XPU acceleration stack includes:
applying the acceleration stack corresponding to each accelerator card to perform inference optimization on the model file according to preset rules, obtaining a processing result corresponding to the neural network model, where the preset rules include at least one of a computation-graph pruning algorithm, an operator fusion algorithm, and an INT8 model quantization algorithm;
specifically, the computation graph pruning algorithm specifically includes: the pruning algorithm prunes some subtrees from the bottom of the 'fully grown' decision tree, making the decision tree smaller (model simpler) and thus enabling more accurate prediction of unknown data. The CART pruning algorithm consists of two steps: firstly, continuously pruning from the bottom end of a decision tree T0 generated by a generation algorithm until a root node of T0 forms a subtree sequence { T0, T1, …, Tn }; and then testing the sub-tree sequences on the independent verification data sets through a cross verification method, and selecting the optimal sub-tree from the sub-tree sequences.
And compiling the processing result by static compilation for each heterogeneous computing system to obtain the model deployment file and configuration file corresponding to the XPU acceleration stack.
Specifically, the XPU acceleration stack converts the pre-trained model to generate the corresponding model deployment file and configuration file, as follows:
Different XPU accelerator cards have different acceleration stacks for converting the pre-trained model. The goal is to optimize the pre-trained model for inference so that it better fits the XPU hardware, using methods such as computation-graph pruning, operator fusion, and INT8 model quantization. Meanwhile, for different deployment platforms, static compilation sidesteps platform differences and enables compiling once for deployment on many platforms. The output is the model deployment file and configuration file for the corresponding XPU hardware.
Optionally, packaging the model deployment file and the configuration file with the pre-established heterogeneous computing system to obtain different business services, and publishing the business services, includes:
for each pre-established heterogeneous computing system, packaging the model deployment file and the configuration file in a containerized deployment mode to obtain the different business services, and publishing them.
Optionally, the method further comprises:
and taking the service as a remote calling port so that the remote control terminal can call the service to execute corresponding operation.
The different intelligent heterogeneous computing systems are packaged into corresponding services and published, with both containerized deployment and remote API (application programming interface) calls supported. The specific method is as follows:
For the different underlying environments of the computing platforms, a containerized deployment scheme is adopted to deploy model applications rapidly and to avoid, as far as possible, runtime errors caused by hardware differences; at the same time, static compilation avoids repeated recompilation for the intelligent heterogeneous computing system and lowers deployment difficulty. The containerized deployment scheme also supports remote API calls.
The generated model file and configuration file are packaged into corresponding services and published according to the particular intelligent heterogeneous computing system, with containerized deployment and remote API calls supported.
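The package-and-publish flow can be sketched as a registry that wraps a model deployment file and its configuration into a named, callable service; all names below are illustrative assumptions standing in for containerized publishing and remote API calls:

```python
class ServiceRegistry:
    """Wrap (model function, config) pairs as named services that a
    remote control end could invoke; a toy stand-in for containerized
    publishing with a remote API."""
    def __init__(self):
        self._services = {}

    def publish(self, name, model_fn, config):
        """Register a packaged deployment under a service name."""
        self._services[name] = (model_fn, config)

    def call(self, name, payload):
        """Invoke a published service, as a remote caller would."""
        model_fn, config = self._services[name]
        return {"service": name, "result": model_fn(payload),
                "config": config}

registry = ServiceRegistry()
registry.publish("face_recognition",
                 lambda img: f"identity_for:{img}",  # stub inference
                 {"xpu": "npu-0", "batch": 1})
response = registry.call("face_recognition", "photo.jpg")
```

In a real deployment each registry entry would correspond to one container image holding the deployment file, its configuration, and the remote API endpoint.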
Fig. 2 is a flowchart of the steps of a method for building a heterogeneous computing system according to another embodiment of the present invention. As shown in fig. 2, the method includes:
S201, collecting user requirements and defining the task type;
S202, collecting the dataset and constructing a deep learning model;
S203, completing model training and saving the model in the unified format on a domestic intelligent heterogeneous computing system;
S204, completing inference acceleration of the model and producing the deployment engine output with the appropriate NPU acceleration stack;
S205, deploying the corresponding model files and the remote API call interface in containers according to the software and hardware configuration of the intelligent heterogeneous computing system.
The embodiment of the invention provides an intelligent heterogeneous computing development system based on domestic software and hardware. It determines the deep learning application field, such as image classification, object recognition, recommendation, speech, text, or reinforcement learning, from the actual task type; collects the datasets required by the related task and designs the corresponding network model; trains the model with a domestic computing framework and the corresponding domestic training XPU, saving the trained model in the prescribed output format; optimizes the model for the different XPU acceleration stacks and CPU platforms, generating the corresponding model deployment file and configuration document; and, on the target deployment platform, deploys the intelligent application model, executes the model inference process, and supports functions such as containerized deployment and remote API access.
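The S201-S205 pipeline of this embodiment can be chained end to end; each stage below is a stub standing in for the real component (training, acceleration-stack conversion, containerized deployment), and all names are illustrative:

```python
def build_system(requirement):
    """Toy end-to-end pipeline mirroring S201-S205: requirement ->
    task type -> dataset/model -> trained model file -> accelerated
    deployment artifact -> published service record."""
    task_type = {"recognize faces": "face_recognition"}.get(
        requirement, "generic")                    # S201: define task type
    dataset = f"dataset[{task_type}]"              # S202: collect dataset
    model_file = f"model[{task_type}]"             # S203: unified-format model
    deploy_file = f"npu_engine({model_file})"      # S204: NPU stack output
    service = {"name": f"{task_type}_service",     # S205: container + API
               "artifact": deploy_file,
               "data": dataset}
    return service

svc = build_system("recognize faces")
```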
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts shown, as some steps may, according to the embodiments of the present invention, occur in other orders or concurrently. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments, and the acts involved are not necessarily required by the present invention.
According to the construction method of the heterogeneous computing system, the task type corresponding to the business requirement is determined according to different business requirements; training data corresponding to the task type is acquired and a neural network model corresponding to the task type is constructed; the neural network model is trained with the pre-established heterogeneous computing system and the training data to obtain a model file corresponding to the task type; the model file is converted by an XPU acceleration stack to obtain a model deployment file and a configuration file corresponding to that acceleration stack; and the model deployment file and the configuration file are packaged by the pre-established heterogeneous computing system to obtain different business services, which are then published. The method effectively guards against the risk of technology monopoly, while meeting the requirements of future intelligent applications for autonomous control of the underlying information infrastructure, rapid system deployment and dynamic adjustment in diverse and rapidly changing deployment environments.
Another embodiment of the present invention provides a heterogeneous computing system, configured to execute the method for constructing a heterogeneous computing system provided in the foregoing embodiment.
Referring to fig. 3, a schematic structural diagram of a heterogeneous computing system according to the present invention is shown, where the heterogeneous computing system includes at least a processor, an accelerator card, an operating system, and a deep learning framework component, where the processor and the accelerator card are respectively different types of boards, and the heterogeneous computing system is configured to execute the above-mentioned method for building the heterogeneous computing system.
The system specifically comprises a model training module, an inference acceleration module and a service publishing module. The model training module comprises a training set, a deep learning framework and an XPU heterogeneous cluster, and trains the deep learning framework on the training set to obtain a neural network model;
the inference acceleration module comprises an XPU acceleration stack module and a model deployment file, and performs accelerated inference on the neural network model through the XPU acceleration stack to generate the model deployment file and a configuration file;
and the service publishing module performs containerized deployment of the model deployment file and the configuration file, and then exposes them as an API service interface for use by a remote terminal.
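One of the acceleration techniques later named in the claims is model INT8 quantization. As a hedged illustration of what the inference acceleration module performs, the sketch below shows a symmetric per-tensor INT8 quantization scheme; the scheme, the function names and the sample weights are all assumptions for the example, not the specific algorithm of the invention.

```python
# Illustrative symmetric per-tensor INT8 quantization; an assumption for
# the example, not the patented acceleration algorithm itself.

def quantize_int8(weights):
    """Map float weights to int8 values with one symmetric scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = [round(w / scale) for w in weights]  # each value lands in [-127, 127]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

w = [0.5, -1.27, 0.0, 1.27]
q, s = quantize_int8(w)
print(q)  # → [50, -127, 0, 127]
```

Storing `q` plus the single `scale` factor replaces 32-bit floats with 8-bit integers, which is the memory and bandwidth saving that makes INT8 inference faster on accelerator cards.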
Illustratively, the heterogeneous computing system provided by the embodiment of the present invention is a domestic intelligent heterogeneous computing platform, that is, an autonomously controllable computing platform based on a domestic CPU, XPU, operating system and deep learning framework component. The CPU part supports Phytium (Feiteng) FT-2000+ and Shenwei CPU types; the XPU part supports the Huawei Atlas series, Baidu Kunlun series, Sophon SC5 series and Cambricon MLU series, and the XPU accelerator cards supporting training include the Atlas 300T, Cambricon MLU290, Baidu Kunlun K200 and the like; the domestic deep learning frameworks include Baidu PaddlePaddle, Huawei MindSpore, Tsinghua Jittor and the like, and functional adaptation and integrity testing has been completed for this software and hardware.
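The adaptation list above can be expressed as a simple compatibility table. The `PLATFORM` dictionary layout and the `is_supported` helper below are assumptions made for illustration; only the hardware and framework names come from the description.

```python
# Illustrative compatibility table for the platform described above;
# the dict layout and helper are assumptions, not part of the patent.

PLATFORM = {
    "cpu": ["Phytium FT-2000+", "Shenwei"],
    "training_xpu": ["Atlas 300T", "Cambricon MLU290", "Baidu Kunlun K200"],
    "framework": ["PaddlePaddle", "MindSpore", "Jittor"],
}

def is_supported(cpu, xpu, framework):
    """Check one CPU/XPU/framework combination against the adaptation list."""
    return (cpu in PLATFORM["cpu"]
            and xpu in PLATFORM["training_xpu"]
            and framework in PLATFORM["framework"])

print(is_supported("Phytium FT-2000+", "Atlas 300T", "MindSpore"))  # → True
```

A table of this shape is what the deployment step would consult before choosing which XPU acceleration stack to convert the model for.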
Optionally, the system further includes a remote control end, and the remote control end is used for calling various services generated by the heterogeneous computing system.
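The publish-then-remote-call flow can be sketched with a minimal in-process registry standing in for a real container runtime and network transport. The `ServicePublisher` class, its method names and the URL scheme are illustrative assumptions, not interfaces defined by the invention.

```python
# Minimal stand-in for the service publishing module and remote control end;
# all names and the URL scheme are assumptions for illustration.

class ServicePublisher:
    def __init__(self):
        self._services = {}

    def publish(self, name, deployment_file, config_file):
        """Package a model deployment file plus its config as one service."""
        self._services[name] = {
            "deployment": deployment_file,
            "config": config_file,
            "api": f"/api/{name}/predict",  # interface the remote end calls
        }
        return self._services[name]["api"]

    def call(self, name, payload):
        """Stand-in for the remote control end invoking the API interface."""
        svc = self._services[name]
        return {"service": name, "api": svc["api"], "input": payload}

publisher = ServicePublisher()
api = publisher.publish("image-classify", "model.engine", "config.yaml")
result = publisher.call("image-classify", {"image": "cat.jpg"})
print(api)  # → /api/image-classify/predict
```

In a real deployment the registry entry would correspond to a running container and `call` to an HTTP request, but the contract is the same: the remote control end only needs the service name and API path.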
According to the heterogeneous computing system of the embodiment, the task type corresponding to the business requirement is determined according to different business requirements; training data corresponding to the task type is acquired and a neural network model corresponding to the task type is constructed; the neural network model is trained with the pre-established heterogeneous computing system and the training data to obtain a model file corresponding to the task type; the model file is converted by an XPU acceleration stack to obtain a model deployment file and a configuration file corresponding to that acceleration stack; and the model deployment file and the configuration file are packaged by the pre-established heterogeneous computing system to obtain different business services, which are then published. The system effectively guards against the risk of technology monopoly, while meeting the requirements of future intelligent applications for autonomous control of the underlying information infrastructure, rapid system deployment and dynamic adjustment in diverse and rapidly changing deployment environments.
It should be noted that the above detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular is intended to include the plural unless the context clearly dictates otherwise. Furthermore, it will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
It should be noted that the terms "first," "second," and the like in this application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in other sequences than those illustrated or otherwise described herein.
Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements explicitly listed, but may include other steps or elements not explicitly listed or inherent to such process, method, article, or apparatus.
Spatially relative terms, such as "over", "above", "on", "upper" and the like, may be used herein for ease of description to describe one device or feature's spatial relationship to another device or feature as illustrated in the figures. It will be understood that the spatially relative terms are intended to encompass different orientations of the device in use or operation in addition to the orientation depicted in the figures. For example, if a device in the figures is turned over, devices described as "above" or "on" other devices or configurations would then be oriented "below" or "under" the other devices or configurations. Thus, the exemplary term "above" can encompass both an orientation of "above" and an orientation of "below". The device may also be oriented in other ways, such as rotated 90 degrees or at other orientations, and the spatially relative descriptors used herein are interpreted accordingly.
In the foregoing detailed description, reference is made to the accompanying drawings, which form a part hereof. In the drawings, like numerals typically identify like components, unless context dictates otherwise. The illustrated embodiments described in the detailed description and drawings are not meant to be limiting. Other embodiments may be utilized, and other changes may be made, without departing from the spirit or scope of the subject matter presented here.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method of building a heterogeneous computing system, the method comprising:
determining a task type corresponding to the service requirement according to different service requirements;
according to the task type, acquiring training data corresponding to the task type, and constructing a neural network model corresponding to the task type;
training the neural network model by adopting a pre-established heterogeneous computing system and the training data to obtain a model file corresponding to the task type;
an XPU acceleration stack is adopted to convert the model file corresponding to the task type to obtain a model deployment file and a configuration file corresponding to the XPU acceleration stack;
and packaging the model deployment file and the configuration file by adopting the pre-established heterogeneous computing system to obtain different business services, and issuing the business services.
2. The method of claim 1, wherein the obtaining training data corresponding to the task type according to the task type comprises:
receiving, according to the task type, training data input by a user and corresponding to the task type;
or
acquiring training data corresponding to the task type by means of web crawling.
3. The method of claim 2, wherein constructing the neural network model corresponding to the task type comprises:
carrying out data cleaning, data labeling and data set type classification on the training data to obtain processed training data;
storing the processed training data in a pre-established database;
establishing a neural network model corresponding to the training data according to the processed training data;
and generating a computation graph according to the neural network model.
4. The method of claim 3, wherein the pre-established heterogeneous computing system comprises at least a processor, an accelerator card, an operating system, and a deep learning framework component, wherein the processor, accelerator card, operating system, and deep learning framework component are of different types, and wherein the task type comprises at least one of image classification, object recognition, recommendation, speech, text and reinforcement learning.
5. The method of claim 3, wherein the training the neural network model using the pre-established heterogeneous computing system and the training data corresponding to the task type to obtain a model file corresponding to the task type comprises:
and training the neural network model by adopting a pre-established heterogeneous computing system and adopting training data corresponding to the task to generate a model file corresponding to the task type, wherein the output format of the model file at least comprises one of an extensible computational graph module, a standard data type or a built-in operator.
6. The method of claim 5, wherein the converting the model file corresponding to the task type by using an XPU acceleration stack to obtain a model deployment file and a configuration file corresponding to the XPU acceleration stack comprises:
adopting acceleration stacks corresponding to different accelerator cards to perform inference under preset rules on the model file corresponding to the task type, to obtain a processing result corresponding to the neural network model, wherein the preset rules comprise at least one of a computation graph pruning algorithm, an operator fusion algorithm, and a model INT8 quantization algorithm;
and compiling the processing result by static compilation for the different heterogeneous computing systems, to obtain the model deployment file and the configuration file corresponding to the XPU acceleration stack.
7. The method according to claim 6, wherein the packaging the model deployment file and the configuration file with the pre-established heterogeneous computing system to obtain different business services and publishing the business services comprises:
and for different pre-established heterogeneous computing systems, adopting a containerization deployment mode, packaging the model deployment file and the configuration to obtain different business servers, and issuing the business services.
8. The method of claim 7, further comprising:
and taking the service as a remote calling port so that a remote control end can call the service to execute corresponding operation.
9. A heterogeneous computing system, comprising at least a processor, an accelerator card, an operating system, and a deep learning framework component, wherein the processor and the accelerator card are respectively different types of boards, and the heterogeneous computing system is configured to perform the method for building the heterogeneous computing system according to any one of claims 1 to 8.
10. The system of claim 9, further comprising a remote control for invoking various services generated by the heterogeneous computing system.
CN202210089943.0A 2022-01-26 2022-01-26 Construction method and system of heterogeneous computing system Active CN114116236B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210089943.0A CN114116236B (en) 2022-01-26 2022-01-26 Construction method and system of heterogeneous computing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210089943.0A CN114116236B (en) 2022-01-26 2022-01-26 Construction method and system of heterogeneous computing system

Publications (2)

Publication Number Publication Date
CN114116236A true CN114116236A (en) 2022-03-01
CN114116236B CN114116236B (en) 2022-04-08

Family

ID=80361491

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210089943.0A Active CN114116236B (en) 2022-01-26 2022-01-26 Construction method and system of heterogeneous computing system

Country Status (1)

Country Link
CN (1) CN114116236B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114282641A (en) * 2022-03-07 2022-04-05 麒麟软件有限公司 Construction method of universal heterogeneous acceleration framework
CN115048177A (en) * 2022-08-15 2022-09-13 成都中科合迅科技有限公司 Dynamic configuration method for completing business scene based on custom container
CN116450486A (en) * 2023-06-16 2023-07-18 浪潮电子信息产业股份有限公司 Modeling method, device, equipment and medium for nodes in multi-element heterogeneous computing system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180129828A1 (en) * 2016-11-04 2018-05-10 Qualcomm Incorporated Exclusive execution environment within a system-on-a-chip computing system
CN111860867A (en) * 2020-07-24 2020-10-30 苏州浪潮智能科技有限公司 Model training method and system for hybrid heterogeneous system and related device
CN112085217A (en) * 2020-09-08 2020-12-15 中国平安人寿保险股份有限公司 Method, device, equipment and computer medium for deploying artificial intelligence service
CN113434261A (en) * 2021-08-27 2021-09-24 阿里云计算有限公司 Heterogeneous computing device virtualization method and system
CN113672374A (en) * 2021-10-21 2021-11-19 深圳致星科技有限公司 Task scheduling method and system for federal learning and privacy computation

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180129828A1 (en) * 2016-11-04 2018-05-10 Qualcomm Incorporated Exclusive execution environment within a system-on-a-chip computing system
CN111860867A (en) * 2020-07-24 2020-10-30 苏州浪潮智能科技有限公司 Model training method and system for hybrid heterogeneous system and related device
CN112085217A (en) * 2020-09-08 2020-12-15 中国平安人寿保险股份有限公司 Method, device, equipment and computer medium for deploying artificial intelligence service
CN113434261A (en) * 2021-08-27 2021-09-24 阿里云计算有限公司 Heterogeneous computing device virtualization method and system
CN113672374A (en) * 2021-10-21 2021-11-19 深圳致星科技有限公司 Task scheduling method and system for federal learning and privacy computation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KEJIE LYU 等: "Attention-Aware Multi-Task Convolutional Neural Networks", 《IEEE TRANSACTIONS ON IMAGE PROCESSING》 *
许浩博 等: "面向多任务处理的神经网络加速器设计", 《高技术通讯》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114282641A (en) * 2022-03-07 2022-04-05 麒麟软件有限公司 Construction method of universal heterogeneous acceleration framework
CN115048177A (en) * 2022-08-15 2022-09-13 成都中科合迅科技有限公司 Dynamic configuration method for completing business scene based on custom container
CN115048177B (en) * 2022-08-15 2022-11-04 成都中科合迅科技有限公司 Dynamic configuration method for completing business scene based on custom container
CN116450486A (en) * 2023-06-16 2023-07-18 浪潮电子信息产业股份有限公司 Modeling method, device, equipment and medium for nodes in multi-element heterogeneous computing system
CN116450486B (en) * 2023-06-16 2023-09-05 浪潮电子信息产业股份有限公司 Modeling method, device, equipment and medium for nodes in multi-element heterogeneous computing system

Also Published As

Publication number Publication date
CN114116236B (en) 2022-04-08

Similar Documents

Publication Publication Date Title
CN114116236B (en) Construction method and system of heterogeneous computing system
Ribeiro et al. Mlaas: Machine learning as a service
CN107317724A (en) Data collecting system and method based on cloud computing technology
CN111191789B (en) Model optimization deployment system, chip, electronic equipment and medium
CN105760272B (en) Monitoring backstage business customizing method and its system based on plug-in unit
CN113778871A (en) Mock testing method, device, equipment and storage medium
CN112099848B (en) Service processing method, device and equipment
Matsubara et al. Split computing for complex object detectors: Challenges and preliminary results
CN115512005A (en) Data processing method and device
CN111813910A (en) Method, system, terminal device and computer storage medium for updating customer service problem
WO2020143236A1 (en) Method, device, and equipment for accelerating convolutional neural network, and storage medium
JP2019128831A (en) Calculation technique determining system, calculation technique determining device, processing device, calculation technique determining method, processing method, calculation technique determining program, and processing program
CN107480115B (en) Method and system for format conversion of caffe frame residual error network configuration file
Xie et al. Energy efficiency enhancement for cnn-based deep mobile sensing
CN104866310A (en) Knowledge data processing method and system
CN113157917A (en) OpenCL-based optimized classification model establishing and optimized classification method and system
CN113627422A (en) Image classification method and related equipment thereof
CN115202868A (en) Autonomous controllable heterogeneous intelligent computing service platform and intelligent scene matching method
CN112099882B (en) Service processing method, device and equipment
KR102188044B1 (en) Framework system for intelligent application development based on neuromorphic architecture
CN116862951A (en) Transformer-based lightweight target identification and tracking system and method
CN117275086A (en) Gesture recognition method, gesture recognition device, computer equipment and storage medium
CN116739154A (en) Fault prediction method and related equipment thereof
CN112748953A (en) Data processing method and device based on neural network model and electronic equipment
CN110674935B (en) Method for transplanting intelligent algorithm to airborne embedded platform and intelligent computing platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant