CN114116236A - Construction method and system of heterogeneous computing system - Google Patents


Info

Publication number
CN114116236A
Authority
CN
China
Prior art keywords
model
heterogeneous computing
computing system
task type
file
Prior art date
Legal status
Granted
Application number
CN202210089943.0A
Other languages
Chinese (zh)
Other versions
CN114116236B (en)
Inventor
李波
王滨
黄茗
杨军
张鑫
Current Assignee
CETC 15 Research Institute
Original Assignee
CETC 15 Research Institute
Priority date
Filing date
Publication date
Application filed by CETC 15 Research Institute filed Critical CETC 15 Research Institute
Priority to CN202210089943.0A priority Critical patent/CN114116236B/en
Publication of CN114116236A publication Critical patent/CN114116236A/en
Application granted granted Critical
Publication of CN114116236B publication Critical patent/CN114116236B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F 9/5061 Partitioning or combining of resources
    • G06F 9/5072 Grid computing
    • G06F 15/7803 System on board, i.e. computer system on one or more PCBs, e.g. motherboards, daughterboards or blades
    • G06F 8/41 Compilation
    • G06F 9/547 Remote procedure calls [RPC]; Web services
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G06F 2209/544 Remote (indexing scheme relating to interprogram communication, G06F 9/54)


Abstract

The invention relates to a method and system for building a heterogeneous computing system. The method determines a task type from each business requirement; acquires training data for the task type and constructs a corresponding neural network model; trains the model on a pre-established heterogeneous computing system to obtain a model file for the task type; converts that model file with an XPU acceleration stack to produce a model deployment file and a configuration file; and, on the pre-established heterogeneous computing system, packages the deployment and configuration files into distinct business services and publishes them. This approach effectively guards against monopoly risk while meeting the needs of future intelligent applications for diversity, rapidly changing deployment environments, autonomous controllability of the underlying software and hardware of information systems, rapid system deployment, and dynamic adjustment.

Description

Construction method and system of heterogeneous computing system
Technical Field
The invention relates to the technical field of computers, in particular to a method and a system for constructing a heterogeneous computing system.
Background
With the growth of cloud computing, big data, and artificial intelligence applications, demand for computing power in the intelligent world is increasing roughly tenfold per year, and the computing boundary is extending beyond the data center toward terminals and the edge. For years, mainstream intelligent computing platforms have been controlled by foreign vendors and open-source communities, and most hardware and intelligent computing frameworks are dominated by a few manufacturers, which severely constrains the development and cost control of intelligent computing.
The problem to be solved at present is therefore how to effectively prevent such monopoly while ensuring the diversity of intelligent computing applications, supporting rapidly changing deployment environments, and meeting requirements such as autonomous controllability, rapid system deployment, and dynamic adjustment.
Disclosure of Invention
The invention aims to provide a method and a system for building a heterogeneous computing system that remedy the defects of the prior art; the technical problem addressed by the invention is solved by the following technical scheme.
In a first aspect, an embodiment of the present invention provides a method for building a heterogeneous computing system, where the method includes:
determining a task type corresponding to each service requirement according to different service requirements;
acquiring, according to the task type, training data corresponding to the task type, and constructing a neural network model corresponding to the task type;
training the neural network model with a pre-established heterogeneous computing system and the training data to obtain a model file corresponding to the task type;
converting the model file corresponding to the task type with an XPU acceleration stack to obtain a model deployment file and a configuration file corresponding to the XPU acceleration stack;
and packaging the model deployment file and the configuration file with the pre-established heterogeneous computing system to obtain different business services, and publishing the business services.
Optionally, acquiring training data corresponding to the task type according to the task type includes:
receiving, according to the task type, training data for the task type as input by a user;
or
acquiring training data for the task type by web crawling.
Optionally, constructing a neural network model corresponding to the task type includes:
performing data cleaning, data labeling, and dataset-type classification on the training data to obtain processed training data;
storing the processed training data in a pre-established database;
building a neural network model corresponding to the training data from the processed training data;
and generating a computation graph from the neural network model.
Optionally, the pre-established heterogeneous computing system comprises at least a processor, an accelerator card, an operating system, and a deep learning framework component, each of which may be of a different type, and the system supports at least one of the following task types: image classification, object recognition, recommendation, speech, text, and reinforcement learning.
Optionally, training the neural network model with a pre-established heterogeneous computing system and the training data corresponding to the task to obtain a model file corresponding to the task type includes:
training the neural network model on the pre-established heterogeneous computing system with the training data corresponding to the task to generate a model file for the task type, where the output format of the model file includes at least one of an extensible computation graph module, standard data types, and built-in operators.
Optionally, converting the model file corresponding to the task type with the XPU acceleration stack to obtain a model deployment file and a configuration file corresponding to the XPU acceleration stack includes:
applying the acceleration stack corresponding to each accelerator card to perform inference optimization on the model file according to preset rules, obtaining a processing result corresponding to the neural network model, where the preset rules include at least one of a computation-graph pruning algorithm, an operator fusion algorithm, and an INT8 model quantization algorithm;
and compiling the processing result by static compilation for each heterogeneous computing system to obtain the model deployment file and configuration file corresponding to the XPU acceleration stack.
Optionally, packaging the model deployment file and the configuration file with the pre-established heterogeneous computing system to obtain different business services, and publishing the business services, includes:
for each pre-established heterogeneous computing system, packaging the model deployment file and the configuration file in a containerized deployment mode to obtain the different business services, and publishing them.
Optionally, the method further comprises:
and taking the service as a remote calling port so that a remote control end can call the service to execute corresponding operation.
In a second aspect, an embodiment of the present invention provides a heterogeneous computing system comprising at least a processor, an accelerator card, an operating system, and a deep learning framework component, where the processor and the accelerator card are boards of different types, and the heterogeneous computing system is configured to execute the method for building a heterogeneous computing system of the first aspect.
Optionally, the system further includes a remote control end configured to invoke the various services generated by the heterogeneous computing system.
The embodiment of the invention has the following advantages:
according to the construction method of the heterogeneous computing system and the heterogeneous computing system, the task type corresponding to the business requirement is determined according to different business requirements; according to the task type, acquiring training data corresponding to the task type, and constructing a neural network model corresponding to the task type; training a neural network model by adopting a pre-established heterogeneous computing system and training data to obtain a model file corresponding to the task type; an XPU acceleration stack is adopted to convert the model files corresponding to the task types to obtain model deployment files and configuration files corresponding to the XPU acceleration stack; the method adopts a pre-established heterogeneous computing system to package the model deployment file and the configuration file to obtain different business services, and releases the business services, so that monopolized risks can be effectively prevented, and meanwhile, the requirements of autonomous control, system rapid deployment, dynamic adjustment and the like on the aspects of diversity of future intelligent application, rapid and variable deployment environment, software and hardware facilities of an information system to a bottom layer information foundation and the like are met.
Drawings
FIG. 1 is a flow chart illustrating steps of an embodiment of a method for building a heterogeneous computing system in accordance with the present invention;
FIG. 2 is a flow chart illustrating steps of a method of building a heterogeneous computing system according to an embodiment of the present invention;
FIG. 3 is a block diagram of a heterogeneous computing system in accordance with the present invention.
Detailed Description
It should be noted that, where no conflict arises, the embodiments of the present application and the features within them may be combined with one another. The present invention is described in detail below with reference to the drawings and the embodiments.
Referring to fig. 1, a flowchart of the steps of one embodiment of a method for building a heterogeneous computing system according to the present invention is shown. The method includes the following steps:
S101, determining a task type corresponding to each service requirement according to different service requirements;
Specifically, the heterogeneous computing system comprises hardware devices and the software carried on them, where that software includes operating systems and deep learning framework components of different types, from different manufacturers, and of different models.
On this heterogeneous computing system, a user can submit different service requirements, and the task type corresponding to each requirement is determined accordingly, for example an image processing service or a speech recognition service.
In practice, the application field of the requirement, such as images, natural language, speech, recommendation, or reinforcement learning, is first determined from the user's input; the specific requirement is then analyzed within that field to settle on a well-defined task type, and the subsequent operations are executed on the heterogeneous computing system with the corresponding software and hardware.
S102, acquiring training data corresponding to the task type according to the task type, and constructing a neural network model corresponding to the task type;
Specifically, after the task type required by the user is determined, the heterogeneous computing system acquires training data for that task type and establishes a corresponding neural network model from it: the datasets required by the related task are collected, and a matching network model is designed.
That is, once the task type is determined, the datasets required by the related task are collected and the corresponding neural network model is established.
For example, if the task type is face recognition, training data for that task must be acquired, i.e. a large number of face photos of different people taken from different angles are collected as training data.
S103, training the neural network model with the pre-established heterogeneous computing system and the training data to obtain a model file corresponding to the task type;
Specifically, on the pre-established heterogeneous computing system, the neural network model is trained with the acquired training data, i.e. the dataset corresponding to the service type, to obtain a model file for the task type. The model file is written in a unified model output format with three components: (1) an extensible computation graph model definition, (2) standard data type definitions, and (3) built-in operator definitions.
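The three-part unified output format just described (extensible computation graph, standard data types, built-in operators) resembles framework-neutral exchange formats such as ONNX. The sketch below is a toy illustration of such a container; all class, field, and operator names here are illustrative assumptions, not the patent's actual format:

```python
import json

# Hypothetical unified model file: a computation graph, the declared
# standard data types, and the built-in operator set the graph may use.
BUILTIN_OPERATORS = {"Conv", "Relu", "MatMul", "Add", "Softmax"}
STANDARD_DTYPES = {"float32", "float16", "int8", "int32"}

def make_model_file(graph_nodes):
    """Validate a node list against the built-in operator and dtype sets,
    then serialize it into a unified, framework-neutral model file."""
    for node in graph_nodes:
        if node["op"] not in BUILTIN_OPERATORS:
            raise ValueError(f"unknown operator: {node['op']}")
        if node.get("dtype", "float32") not in STANDARD_DTYPES:
            raise ValueError(f"non-standard dtype: {node['dtype']}")
    return json.dumps({"graph": graph_nodes,
                       "dtypes": sorted(STANDARD_DTYPES),
                       "operators": sorted(BUILTIN_OPERATORS)})

model = make_model_file([
    {"name": "fc1", "op": "MatMul", "dtype": "float32"},
    {"name": "act1", "op": "Relu"},
])
```

A converter for a given XPU acceleration stack would consume such a file and need to understand only the declared operator set, not the framework that produced the model.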
s104, converting the model file corresponding to the task type by adopting an XPU acceleration stack to obtain a model deployment file and a configuration file corresponding to the XPU acceleration stack;
specifically, after a training model is determined by a pre-established heterogeneous computing system, different XPU accelerator cards can be adopted to perform inference acceleration on the generated training model due to the fact that the heterogeneous computing system comprises the different XPU accelerator cards, specifically, inference acceleration modes such as operator fusion, calculation chart key values and model quantization are adopted, model operation model libraries or executable files corresponding to different XPUs are generated after different XPU acceleration stack optimization, and meanwhile, corresponding configuration files are generated;
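Of the inference-acceleration techniques listed (operator fusion, graph pruning, quantization), INT8 quantization is the most self-contained to illustrate. The sketch below shows generic symmetric per-tensor quantization with a single scale factor; it is an illustration of the general technique, not the patent's specific algorithm:

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization: map float weights to
    [-127, 127] with one scale factor, returning (q_weights, scale)."""
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return [0] * len(weights), 1.0
    scale = max_abs / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q_weights, scale):
    """Recover approximate float weights from the INT8 representation."""
    return [qw * scale for qw in q_weights]

q, s = quantize_int8([0.5, -1.27, 0.0, 1.27])
restored = dequantize(q, s)
```

Storing 8-bit integers instead of 32-bit floats shrinks the deployment file roughly fourfold and lets integer arithmetic units on the accelerator do the bulk of the work, at a small accuracy cost.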
and S105, encapsulating the model deployment file and the configuration file by adopting a pre-established heterogeneous computing system to obtain different business services, and issuing the business services.
Specifically, different types of model deployment files and configuration files are respectively encapsulated by a pre-established heterogeneous computing system to obtain different business services, and the business services are released.
Exemplarily, if the service is a face recognition service, the model deployment file and the configuration file corresponding to the face recognition are encapsulated to obtain a face recognition service, so that other users can directly call the service if the face recognition service is to be executed.
For example, if the service is a voice recognition service, the model deployment file and the configuration file corresponding to the voice recognition service are encapsulated to obtain a voice recognition service, so that other users can directly call the voice recognition service when using the voice recognition service.
With the construction method of the heterogeneous computing system described above, a task type is determined for each business requirement; training data for the task type is acquired and a corresponding neural network model is constructed; the model is trained on a pre-established heterogeneous computing system to obtain a model file for the task type; the model file is converted with an XPU acceleration stack into a model deployment file and a configuration file corresponding to that stack; and the deployment and configuration files are packaged into distinct business services, which are then published. This effectively guards against monopoly risk while meeting the needs of future intelligent applications for diversity, rapidly changing deployment environments, autonomous controllability of the underlying software and hardware of information systems, rapid system deployment, and dynamic adjustment.
The present invention further provides a supplementary description of the method for constructing a heterogeneous computing system according to the above embodiment.
Optionally, acquiring training data corresponding to the task type according to the task type includes:
receiving training data for the task type as input by a user;
Specifically, the user may prepare training data for the task type in advance and then input it into the pre-established heterogeneous computing system.
or
acquiring training data for the task type by web crawling.
Specifically, web crawling obtains training data for the task type through a program or script that automatically captures web information according to preset rules.
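The web-crawling collection mode can be sketched with the standard-library HTML parser; the example collects candidate training-image URLs from a fetched page. This is a toy building block under assumed file-name conventions; a real crawler must also handle fetching, robots.txt, and deduplication:

```python
from html.parser import HTMLParser

class ImageLinkCollector(HTMLParser):
    """Collect candidate training-image URLs (.jpg/.png) from one
    crawled HTML page, per a simple preset rule."""
    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src", "")
            if src.endswith((".jpg", ".png")):  # preset filtering rule
                self.images.append(src)

page = ('<html><body><img src="/faces/a.jpg"><img src="logo.svg">'
        '<img src="/faces/b.png"></body></html>')
collector = ImageLinkCollector()
collector.feed(page)
```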
Optionally, constructing a neural network model corresponding to the task type includes:
performing data cleaning, data labeling, and dataset-type classification on the training data to obtain processed training data;
storing the processed training data in a pre-established database;
building a neural network model corresponding to the training data from the processed training data;
and generating a computation graph from the neural network model.
Specifically, a large amount of training data is acquired, whether supplied by the user or gathered by web crawling. This data may contain records in inconsistent formats and must first be cleaned: consistency checks are run on the training data, records with invalid values are deleted, and missing values are identified. After cleaning, the valid data is labeled, for example the key-point positions in face recognition data are annotated. The training data is then classified by type, so that neural network models of different types can be trained later, and each class is stored separately in the database. In other words, a matching neural network model structure is established according to the dataset type and the task type, from which the computation graph is generated.
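The clean-then-classify step described above can be sketched as follows; the record shape and field names such as `task_type` are illustrative assumptions:

```python
def preprocess(records):
    """Data cleaning plus dataset-type classification: drop records
    with missing or invalid fields, then bucket the valid ones by task
    type so each bucket can later train its own model."""
    buckets = {}
    for rec in records:
        # Consistency check: both fields must be present and non-empty.
        if not rec.get("data") or not rec.get("task_type"):
            continue  # invalid or missing value: discard
        buckets.setdefault(rec["task_type"], []).append(rec)
    return buckets

raw = [
    {"data": "img_001.jpg", "task_type": "face_recognition"},
    {"data": "", "task_type": "face_recognition"},  # missing value
    {"data": "utt_17.wav", "task_type": "speech"},
]
clean = preprocess(raw)
```

Each resulting bucket maps naturally to one database table and one candidate network architecture in the subsequent model-construction step.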
The neural network model includes at least one of: a BP network (including multi-layer feedforward networks), an RBF (radial basis function) network, a Hopfield (associative memory) network, a SOM self-organizing feature map, an ART adaptive resonance theory network, or a quantum neural network. The embodiments of the present invention are not limited in this respect.
Optionally, the pre-established heterogeneous computing system comprises at least a processor, an accelerator card, an operating system, and a deep learning framework component, each of which may be of a different type, and the system supports at least one of the following task types: image classification, object recognition, recommendation, speech, text, and reinforcement learning.
The processor may be any of various types of CPUs (central processing units), and the accelerator card is an XPU, where the XPU may be a GPU, TPU, NPU, APU, FPU, HPU, IPU, MPU, RPU, VPU, WPU, XPU, ZPU, etc.; the embodiments of the present invention impose no specific limitation. In the embodiments of the invention, the CPU may be a domestic Chinese chip of any type from any manufacturer, combined with different XPU accelerator cards to execute the scheme of the embodiments.
Optionally, training the neural network model with a pre-established heterogeneous computing system and the training data corresponding to the task to obtain a model file corresponding to the task type includes:
training the neural network model on the pre-established heterogeneous computing system with the training data corresponding to the task to generate a model file for the task type, where the output format of the model file includes at least one of an extensible computation graph module, standard data types, and built-in operators.
Optionally, converting the model file corresponding to the task type with the XPU acceleration stack to obtain a model deployment file and a configuration file corresponding to the XPU acceleration stack includes:
applying the acceleration stack corresponding to each accelerator card to perform inference optimization on the model file according to preset rules, obtaining a processing result corresponding to the neural network model, where the preset rules include at least one of a computation-graph pruning algorithm, an operator fusion algorithm, and an INT8 model quantization algorithm;
specifically, the computation graph pruning algorithm specifically includes: the pruning algorithm prunes some subtrees from the bottom of the 'fully grown' decision tree, making the decision tree smaller (model simpler) and thus enabling more accurate prediction of unknown data. The CART pruning algorithm consists of two steps: firstly, continuously pruning from the bottom end of a decision tree T0 generated by a generation algorithm until a root node of T0 forms a subtree sequence { T0, T1, …, Tn }; and then testing the sub-tree sequences on the independent verification data sets through a cross verification method, and selecting the optimal sub-tree from the sub-tree sequences.
And compiling the processing result by static compilation for each heterogeneous computing system to obtain the model deployment file and configuration file corresponding to the XPU acceleration stack.
Specifically, the XPU acceleration stack converts the pre-trained model to generate the corresponding model deployment file and configuration file, as follows:
Different XPU accelerator cards have different acceleration stacks for converting the pre-trained model. The goal is to optimize the pre-trained model for inference so that it better fits the XPU hardware, using methods such as computation-graph pruning, operator fusion, and INT8 model quantization. Meanwhile, for different deployment platforms, static compilation sidesteps platform differences and enables compiling once for deployment on many platforms. The output is the model deployment file and configuration file for the corresponding XPU hardware.
Optionally, packaging the model deployment file and the configuration file with the pre-established heterogeneous computing system to obtain different business services, and publishing the business services, includes:
for each pre-established heterogeneous computing system, packaging the model deployment file and the configuration file in a containerized deployment mode to obtain the different business services, and publishing them.
Optionally, the method further comprises:
and taking the service as a remote calling port so that the remote control terminal can call the service to execute corresponding operation.
The different intelligent heterogeneous computing systems are packaged into corresponding services and published, with both containerized deployment and remote API (application programming interface) calls supported. The specific method is as follows:
For the different underlying environments of the computing platforms, a containerized deployment scheme is adopted to deploy model applications rapidly and to avoid, as far as possible, runtime errors caused by hardware differences; at the same time, static compilation avoids repeated recompilation for the intelligent heterogeneous computing system and lowers deployment difficulty. The containerized deployment scheme also supports remote API calls.
The generated model file and configuration file are packaged into corresponding services and published according to the particular intelligent heterogeneous computing system, with containerized deployment and remote API calls supported.
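The package-and-publish flow can be sketched as a registry that wraps a model deployment file and its configuration into a named, callable service; all names below are illustrative assumptions standing in for containerized publishing and remote API calls:

```python
class ServiceRegistry:
    """Wrap (model function, config) pairs as named services that a
    remote control end could invoke; a toy stand-in for containerized
    publishing with a remote API."""
    def __init__(self):
        self._services = {}

    def publish(self, name, model_fn, config):
        """Register a packaged deployment under a service name."""
        self._services[name] = (model_fn, config)

    def call(self, name, payload):
        """Invoke a published service, as a remote caller would."""
        model_fn, config = self._services[name]
        return {"service": name, "result": model_fn(payload),
                "config": config}

registry = ServiceRegistry()
registry.publish("face_recognition",
                 lambda img: f"identity_for:{img}",  # stub inference
                 {"xpu": "npu-0", "batch": 1})
response = registry.call("face_recognition", "photo.jpg")
```

In a real deployment each registry entry would correspond to one container image holding the deployment file, its configuration, and the remote API endpoint.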
Fig. 2 is a flowchart of the steps of a method for building a heterogeneous computing system according to another embodiment of the present invention. As shown in fig. 2, the method includes:
S201, collecting user requirements and defining the task type;
S202, collecting the dataset and constructing a deep learning model;
S203, completing model training and saving the model in the unified format on a domestic intelligent heterogeneous computing system;
S204, completing inference acceleration of the model and producing the deployment engine output with the appropriate NPU acceleration stack;
S205, deploying the corresponding model files and the remote API call interface in containers according to the software and hardware configuration of the intelligent heterogeneous computing system.
The embodiment of the invention provides an intelligent heterogeneous computing development system based on domestic software and hardware. It determines the deep learning application field, such as image classification, object recognition, recommendation, speech, text, or reinforcement learning, from the actual task type; collects the datasets required by the related task and designs the corresponding network model; trains the model with a domestic computing framework and the corresponding domestic training XPU, saving the trained model in the prescribed output format; optimizes the model for the different XPU acceleration stacks and CPU platforms, generating the corresponding model deployment file and configuration document; and, on the target deployment platform, deploys the intelligent application model, executes the model inference process, and supports functions such as containerized deployment and remote API access.
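The S201-S205 pipeline of this embodiment can be chained end to end; each stage below is a stub standing in for the real component (training, acceleration-stack conversion, containerized deployment), and all names are illustrative:

```python
def build_system(requirement):
    """Toy end-to-end pipeline mirroring S201-S205: requirement ->
    task type -> dataset/model -> trained model file -> accelerated
    deployment artifact -> published service record."""
    task_type = {"recognize faces": "face_recognition"}.get(
        requirement, "generic")                    # S201: define task type
    dataset = f"dataset[{task_type}]"              # S202: collect dataset
    model_file = f"model[{task_type}]"             # S203: unified-format model
    deploy_file = f"npu_engine({model_file})"      # S204: NPU stack output
    service = {"name": f"{task_type}_service",     # S205: container + API
               "artifact": deploy_file,
               "data": dataset}
    return service

svc = build_system("recognize faces")
```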
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts shown, as some steps may, according to the embodiments of the present invention, occur in other orders or concurrently. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments, and the acts involved are not necessarily required by the present invention.
According to the construction method of the heterogeneous computing system, the task type corresponding to the business requirement is determined according to different business requirements; training data corresponding to the task type is acquired and a neural network model corresponding to the task type is constructed; the neural network model is trained with the pre-established heterogeneous computing system and the training data to obtain a model file corresponding to the task type; the model file is converted by an XPU acceleration stack to obtain a model deployment file and a configuration file corresponding to that acceleration stack; and the model deployment file and the configuration file are packaged by the pre-established heterogeneous computing system to obtain different business services, which are then published. The method effectively guards against the risk of technology monopoly, while meeting the requirements of future intelligent applications for autonomous control of the underlying information infrastructure, rapid system deployment and dynamic adjustment in diverse and rapidly changing deployment environments.
Another embodiment of the present invention provides a heterogeneous computing system, configured to execute the method for constructing a heterogeneous computing system provided in the foregoing embodiment.
Referring to fig. 3, a schematic structural diagram of a heterogeneous computing system according to the present invention is shown, where the heterogeneous computing system includes at least a processor, an accelerator card, an operating system, and a deep learning framework component, where the processor and the accelerator card are respectively different types of boards, and the heterogeneous computing system is configured to execute the above-mentioned method for building the heterogeneous computing system.
The system specifically comprises a model training module, an inference acceleration module and a service publishing module. The model training module comprises a training set, a deep learning framework and an XPU heterogeneous cluster, and trains the deep learning framework on the training set to obtain a neural network model;
the inference acceleration module comprises an XPU acceleration stack module and a model deployment file, and performs accelerated inference on the neural network model through the XPU acceleration stack to generate the model deployment file and a configuration file;
and the service publishing module performs containerized deployment of the model deployment file and the configuration file, and then exposes them as an API service interface for use by a remote terminal.
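One of the acceleration techniques later named in the claims is model INT8 quantization. As a hedged illustration of what the inference acceleration module performs, the sketch below shows a symmetric per-tensor INT8 quantization scheme; the scheme, the function names and the sample weights are all assumptions for the example, not the specific algorithm of the invention.

```python
# Illustrative symmetric per-tensor INT8 quantization; an assumption for
# the example, not the patented acceleration algorithm itself.

def quantize_int8(weights):
    """Map float weights to int8 values with one symmetric scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = [round(w / scale) for w in weights]  # each value lands in [-127, 127]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

w = [0.5, -1.27, 0.0, 1.27]
q, s = quantize_int8(w)
print(q)  # → [50, -127, 0, 127]
```

Storing `q` plus the single `scale` factor replaces 32-bit floats with 8-bit integers, which is the memory and bandwidth saving that makes INT8 inference faster on accelerator cards.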
Illustratively, the heterogeneous computing system provided by the embodiment of the present invention is a domestic intelligent heterogeneous computing platform, that is, an autonomously controllable computing platform based on a domestic CPU, XPU, operating system and deep learning framework component. The CPU part supports Phytium (Feiteng) FT-2000+ and Shenwei CPU types; the XPU part supports the Huawei Atlas series, Baidu Kunlun series, Sophon SC5 series and Cambricon MLU series, and the XPU accelerator cards supporting training include the Atlas 300T, Cambricon MLU290, Baidu Kunlun K200 and the like; the domestic deep learning frameworks include Baidu PaddlePaddle, Huawei MindSpore, Tsinghua Jittor and the like, and functional adaptation and integrity testing has been completed for this software and hardware.
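The adaptation list above can be expressed as a simple compatibility table. The `PLATFORM` dictionary layout and the `is_supported` helper below are assumptions made for illustration; only the hardware and framework names come from the description.

```python
# Illustrative compatibility table for the platform described above;
# the dict layout and helper are assumptions, not part of the patent.

PLATFORM = {
    "cpu": ["Phytium FT-2000+", "Shenwei"],
    "training_xpu": ["Atlas 300T", "Cambricon MLU290", "Baidu Kunlun K200"],
    "framework": ["PaddlePaddle", "MindSpore", "Jittor"],
}

def is_supported(cpu, xpu, framework):
    """Check one CPU/XPU/framework combination against the adaptation list."""
    return (cpu in PLATFORM["cpu"]
            and xpu in PLATFORM["training_xpu"]
            and framework in PLATFORM["framework"])

print(is_supported("Phytium FT-2000+", "Atlas 300T", "MindSpore"))  # → True
```

A table of this shape is what the deployment step would consult before choosing which XPU acceleration stack to convert the model for.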
Optionally, the system further includes a remote control end, and the remote control end is used for calling various services generated by the heterogeneous computing system.
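The publish-then-remote-call flow can be sketched with a minimal in-process registry standing in for a real container runtime and network transport. The `ServicePublisher` class, its method names and the URL scheme are illustrative assumptions, not interfaces defined by the invention.

```python
# Minimal stand-in for the service publishing module and remote control end;
# all names and the URL scheme are assumptions for illustration.

class ServicePublisher:
    def __init__(self):
        self._services = {}

    def publish(self, name, deployment_file, config_file):
        """Package a model deployment file plus its config as one service."""
        self._services[name] = {
            "deployment": deployment_file,
            "config": config_file,
            "api": f"/api/{name}/predict",  # interface the remote end calls
        }
        return self._services[name]["api"]

    def call(self, name, payload):
        """Stand-in for the remote control end invoking the API interface."""
        svc = self._services[name]
        return {"service": name, "api": svc["api"], "input": payload}

publisher = ServicePublisher()
api = publisher.publish("image-classify", "model.engine", "config.yaml")
result = publisher.call("image-classify", {"image": "cat.jpg"})
print(api)  # → /api/image-classify/predict
```

In a real deployment the registry entry would correspond to a running container and `call` to an HTTP request, but the contract is the same: the remote control end only needs the service name and API path.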
According to the heterogeneous computing system of the embodiment, the task type corresponding to the business requirement is determined according to different business requirements; training data corresponding to the task type is acquired and a neural network model corresponding to the task type is constructed; the neural network model is trained with the pre-established heterogeneous computing system and the training data to obtain a model file corresponding to the task type; the model file is converted by an XPU acceleration stack to obtain a model deployment file and a configuration file corresponding to that acceleration stack; and the model deployment file and the configuration file are packaged by the pre-established heterogeneous computing system to obtain different business services, which are then published. The system effectively guards against the risk of technology monopoly, while meeting the requirements of future intelligent applications for autonomous control of the underlying information infrastructure, rapid system deployment and dynamic adjustment in diverse and rapidly changing deployment environments.
It should be noted that the above detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular is intended to include the plural unless the context clearly dictates otherwise. Furthermore, it will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
It should be noted that the terms "first," "second," and the like in this application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in other sequences than those illustrated or otherwise described herein.
Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements explicitly listed, but may include other steps or elements not explicitly listed or inherent to such process, method, article, or apparatus.
Spatially relative terms, such as "over", "above", "on", "upper" and the like, may be used herein for ease of description to describe one device or feature's spatial relationship to another device or feature as illustrated in the figures. It will be understood that the spatially relative terms are intended to encompass different orientations of the device in use or operation in addition to the orientation depicted in the figures. For example, if a device in the figures is turned over, devices described as "above" or "on" other devices or configurations would then be oriented "below" or "under" the other devices or configurations. Thus, the exemplary term "above" can encompass both an orientation of "above" and an orientation of "below". The device may also be oriented in other ways, such as rotated 90 degrees or at other orientations, and the spatially relative descriptors used herein are interpreted accordingly.
In the foregoing detailed description, reference is made to the accompanying drawings, which form a part hereof. In the drawings, like numerals typically identify like components, unless context dictates otherwise. The illustrated embodiments described in the detailed description and drawings are not meant to be limiting. Other embodiments may be utilized, and other changes may be made, without departing from the spirit or scope of the subject matter presented here.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method of building a heterogeneous computing system, the method comprising:
determining a task type corresponding to the service requirement according to different service requirements;
according to the task type, acquiring training data corresponding to the task type, and constructing a neural network model corresponding to the task type;
training the neural network model by adopting a pre-established heterogeneous computing system and the training data to obtain a model file corresponding to the task type;
an XPU acceleration stack is adopted to convert the model file corresponding to the task type to obtain a model deployment file and a configuration file corresponding to the XPU acceleration stack;
and packaging the model deployment file and the configuration file by adopting the pre-established heterogeneous computing system to obtain different business services, and issuing the business services.
2. The method of claim 1, wherein the obtaining training data corresponding to the task type according to the task type comprises:
receiving, according to the task type, training data input by a user and corresponding to the task type;
or
acquiring training data corresponding to the task type by means of web crawling.
3. The method of claim 2, wherein constructing the neural network model corresponding to the task type comprises:
carrying out data cleaning, data labeling and data set type classification on the training data to obtain processed training data;
storing the processed training data in a pre-established database;
establishing a neural network model corresponding to the training data according to the processed training data;
and generating a computation graph according to the neural network model.
4. The method of claim 3, wherein the pre-established heterogeneous computing system comprises at least a processor, an accelerator card, an operating system, and a deep learning framework component, wherein the processor, accelerator card, operating system, and deep learning framework component are of different types, and wherein the task type comprises at least one of image classification, object recognition, recommendation, speech, text and reinforcement learning.
5. The method of claim 3, wherein the training the neural network model using the pre-established heterogeneous computing system and the training data corresponding to the task type to obtain a model file corresponding to the task type comprises:
and training the neural network model by adopting a pre-established heterogeneous computing system and adopting training data corresponding to the task to generate a model file corresponding to the task type, wherein the output format of the model file at least comprises one of an extensible computational graph module, a standard data type or a built-in operator.
6. The method of claim 5, wherein the converting the model file corresponding to the task type by using an XPU acceleration stack to obtain a model deployment file and a configuration file corresponding to the XPU acceleration stack comprises:
adopting acceleration stacks corresponding to different accelerator cards to perform inference under preset rules on the model file corresponding to the task type, to obtain a processing result corresponding to the neural network model, wherein the preset rules comprise at least one of a computation graph pruning algorithm, an operator fusion algorithm, and a model INT8 quantization algorithm;
and compiling the processing result by static compilation for the different heterogeneous computing systems, to obtain the model deployment file and the configuration file corresponding to the XPU acceleration stack.
7. The method according to claim 6, wherein the packaging the model deployment file and the configuration file with the pre-established heterogeneous computing system to obtain different business services and publishing the business services comprises:
and for different pre-established heterogeneous computing systems, adopting a containerization deployment mode, packaging the model deployment file and the configuration to obtain different business servers, and issuing the business services.
8. The method of claim 7, further comprising:
and taking the service as a remote calling port so that a remote control end can call the service to execute corresponding operation.
9. A heterogeneous computing system, comprising at least a processor, an accelerator card, an operating system, and a deep learning framework component, wherein the processor and the accelerator card are respectively different types of boards, and the heterogeneous computing system is configured to perform the method for building the heterogeneous computing system according to any one of claims 1 to 8.
10. The system of claim 9, further comprising a remote control for invoking various services generated by the heterogeneous computing system.
CN202210089943.0A 2022-01-26 2022-01-26 Construction method and system of heterogeneous computing system Active CN114116236B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210089943.0A CN114116236B (en) 2022-01-26 2022-01-26 Construction method and system of heterogeneous computing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210089943.0A CN114116236B (en) 2022-01-26 2022-01-26 Construction method and system of heterogeneous computing system

Publications (2)

Publication Number Publication Date
CN114116236A true CN114116236A (en) 2022-03-01
CN114116236B CN114116236B (en) 2022-04-08

Family

ID=80361491

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210089943.0A Active CN114116236B (en) 2022-01-26 2022-01-26 Construction method and system of heterogeneous computing system

Country Status (1)

Country Link
CN (1) CN114116236B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114282641A (en) * 2022-03-07 2022-04-05 麒麟软件有限公司 Construction method of universal heterogeneous acceleration framework
CN115048177A (en) * 2022-08-15 2022-09-13 成都中科合迅科技有限公司 Dynamic configuration method for completing business scene based on custom container
CN116450486A (en) * 2023-06-16 2023-07-18 浪潮电子信息产业股份有限公司 Modeling method, device, equipment and medium for nodes in multi-element heterogeneous computing system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180129828A1 (en) * 2016-11-04 2018-05-10 Qualcomm Incorporated Exclusive execution environment within a system-on-a-chip computing system
CN111860867A (en) * 2020-07-24 2020-10-30 苏州浪潮智能科技有限公司 Model training method and system for hybrid heterogeneous system and related device
CN112085217A (en) * 2020-09-08 2020-12-15 中国平安人寿保险股份有限公司 Method, device, equipment and computer medium for deploying artificial intelligence service
CN113434261A (en) * 2021-08-27 2021-09-24 阿里云计算有限公司 Heterogeneous computing device virtualization method and system
CN113672374A (en) * 2021-10-21 2021-11-19 深圳致星科技有限公司 Task scheduling method and system for federal learning and privacy computation

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180129828A1 (en) * 2016-11-04 2018-05-10 Qualcomm Incorporated Exclusive execution environment within a system-on-a-chip computing system
CN111860867A (en) * 2020-07-24 2020-10-30 苏州浪潮智能科技有限公司 Model training method and system for hybrid heterogeneous system and related device
CN112085217A (en) * 2020-09-08 2020-12-15 中国平安人寿保险股份有限公司 Method, device, equipment and computer medium for deploying artificial intelligence service
CN113434261A (en) * 2021-08-27 2021-09-24 阿里云计算有限公司 Heterogeneous computing device virtualization method and system
CN113672374A (en) * 2021-10-21 2021-11-19 深圳致星科技有限公司 Task scheduling method and system for federal learning and privacy computation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KEJIE LYU 等: "Attention-Aware Multi-Task Convolutional Neural Networks", 《IEEE TRANSACTIONS ON IMAGE PROCESSING》 *
许浩博 等: "面向多任务处理的神经网络加速器设计", 《高技术通讯》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114282641A (en) * 2022-03-07 2022-04-05 麒麟软件有限公司 Construction method of universal heterogeneous acceleration framework
CN115048177A (en) * 2022-08-15 2022-09-13 成都中科合迅科技有限公司 Dynamic configuration method for completing business scene based on custom container
CN115048177B (en) * 2022-08-15 2022-11-04 成都中科合迅科技有限公司 Dynamic configuration method for completing business scene based on custom container
CN116450486A (en) * 2023-06-16 2023-07-18 浪潮电子信息产业股份有限公司 Modeling method, device, equipment and medium for nodes in multi-element heterogeneous computing system
CN116450486B (en) * 2023-06-16 2023-09-05 浪潮电子信息产业股份有限公司 Modeling method, device, equipment and medium for nodes in multi-element heterogeneous computing system

Also Published As

Publication number Publication date
CN114116236B (en) 2022-04-08

Similar Documents

Publication Publication Date Title
CN114116236B (en) Construction method and system of heterogeneous computing system
Ribeiro et al. Mlaas: Machine learning as a service
CN107317724A (en) Data collecting system and method based on cloud computing technology
CN111191789B (en) Model optimization deployment system, chip, electronic equipment and medium
CN105760272B (en) Monitoring backstage business customizing method and its system based on plug-in unit
CN113778871A (en) Mock testing method, device, equipment and storage medium
CN112099848B (en) Service processing method, device and equipment
Matsubara et al. Split computing for complex object detectors: Challenges and preliminary results
CN115512005A (en) Data processing method and device
CN111813910A (en) Method, system, terminal device and computer storage medium for updating customer service problem
WO2020143236A1 (en) Method, device, and equipment for accelerating convolutional neural network, and storage medium
JP2019128831A (en) Calculation technique determining system, calculation technique determining device, processing device, calculation technique determining method, processing method, calculation technique determining program, and processing program
CN107480115B (en) Method and system for format conversion of caffe frame residual error network configuration file
Xie et al. Energy efficiency enhancement for cnn-based deep mobile sensing
CN104866310A (en) Knowledge data processing method and system
CN113157917A (en) OpenCL-based optimized classification model establishing and optimized classification method and system
CN113627422A (en) Image classification method and related equipment thereof
CN115202868A (en) Autonomous controllable heterogeneous intelligent computing service platform and intelligent scene matching method
CN112099882B (en) Service processing method, device and equipment
KR102188044B1 (en) Framework system for intelligent application development based on neuromorphic architecture
CN116862951A (en) Transformer-based lightweight target identification and tracking system and method
CN117275086A (en) Gesture recognition method, gesture recognition device, computer equipment and storage medium
CN116739154A (en) Fault prediction method and related equipment thereof
CN112748953A (en) Data processing method and device based on neural network model and electronic equipment
CN110674935B (en) Method for transplanting intelligent algorithm to airborne embedded platform and intelligent computing platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant