WO2022012123A1 - Data processing method and apparatus, electronic device, and storage medium - Google Patents

Data processing method and apparatus, electronic device, and storage medium

Info

Publication number
WO2022012123A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
algorithm
run
input data
target
Prior art date
Application number
PCT/CN2021/092448
Other languages
French (fr)
Chinese (zh)
Inventor
钟卫东
张晓帆
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司 filed Critical Oppo广东移动通信有限公司
Publication of WO2022012123A1 publication Critical patent/WO2022012123A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005: Allocation of resources to service a request
    • G06F 9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/10: Interfaces, programming languages or software development kits, e.g. for simulating neural networks
    • G06F 2209/00: Indexing scheme relating to G06F 9/00
    • G06F 2209/50: Indexing scheme relating to G06F 9/50
    • G06F 2209/501: Performance criteria
    • G06F 2209/5017: Task decomposition

Definitions

  • the present application relates to the field of computer technology, and more particularly, to a data processing method, apparatus, electronic device, and storage medium.
  • Algorithmic models such as neural network models, are complex network systems formed by extensive interconnection of a large number of simple processing units (called neurons). Some algorithmic models have massive parallelism, distributed storage and processing, self-organization, self-adaptation, and self-learning capabilities. However, in the process of running the neural network model in the related electronic equipment, there is still a problem that the running performance needs to be improved.
  • the present application proposes a data processing method, apparatus, electronic device and storage medium to improve the above problems.
  • the present application provides a data processing method, the method including: acquiring model parameters of a model to be run; determining a target algorithm from a plurality of algorithms according to the model parameters; and loading the to-be-run model into a corresponding processing unit based on the target algorithm to run the to-be-run model.
  • the present application provides a data processing device, the device including: a parameter acquisition unit for acquiring model parameters of a model to be run; an algorithm determination unit for determining a target algorithm from a plurality of algorithms according to the model parameters; and a model running unit configured to load the to-be-run model into a corresponding processing unit based on the target algorithm, so as to run the to-be-run model.
  • the present application provides an electronic device including a processor and a memory; one or more programs are stored in the memory and configured to be executed by the processor to implement the above method.
  • the present application provides a computer-readable storage medium having program code stored therein, wherein the above-mentioned method is executed when the program code is run by a startup controller.
  • According to the solutions provided by the present application, model parameters of a model to be run are obtained, a target algorithm is then determined from a plurality of algorithms according to the model parameters, and the to-be-run model is then loaded into the corresponding processing unit based on the target algorithm to run the to-be-run model.
  • FIG. 1 shows a flowchart of a data processing method proposed by an embodiment of the present application
  • FIG. 2 shows a flowchart of a data processing method proposed by another embodiment of the present application
  • FIG. 3 shows a flowchart of a data processing method proposed by still another embodiment of the present application.
  • FIG. 4 shows a flowchart of a data processing method proposed by another embodiment of the present application.
  • FIG. 5 shows a structural block diagram of a data processing apparatus proposed by an embodiment of the present application
  • FIG. 6 shows a structural block diagram of a data processing apparatus proposed by another embodiment of the present application.
  • FIG. 7 shows a structural block diagram of an electronic device of the present application for executing the data processing method according to an embodiment of the present application
  • FIG. 8 shows a storage unit, according to an embodiment of the present application, for storing or carrying program code that implements the data processing method of the embodiments of the present application.
  • Neural networks (Neural Networks, NN) are complex network systems formed by the extensive interconnection of a large number of simple processing units (called neurons). Neural networks have massive parallelism, distributed storage and processing, self-organization, self-adaptation, and self-learning capabilities. A large number of operators are usually included in a neural network model. Among them, it can be understood that an operator can be regarded as part of the algorithmic process in a neural network model; an operator can map a function to a function, or map a function to a number.
  • electronic devices run based on certain algorithms in the process of running a neural network model.
  • However, related electronic devices all run the neural network model based on a fixed algorithm, so that no matter what the model parameters of the neural network model currently running on the electronic device are, it runs in a fixed manner. This in turn causes the electronic device to perform poorly when running the neural network model, and also limits the performance of the neural network model itself.
  • Therefore, the inventor proposes, in the present application, a data processing method, device, electronic device, and storage medium that can improve on the above problems.
  • By acquiring the model parameters of the model to be run, then determining the target algorithm from multiple algorithms according to the model parameters, and then loading the to-be-run model into a corresponding processing unit based on the target algorithm, the to-be-run model is run. In this way, after the model to be run is determined, which algorithm the model is run with can be chosen by determining the model parameters, so that the running of the model better matches the parameters of the model to be run, improving the performance of the model while it runs.
  • a data processing method provided by an embodiment of the present application includes:
  • the model to be run in this embodiment is a model that will be loaded into the processing unit for running later.
  • the model to be run may be a neural network model called by the application.
  • the application may need to process some data during the running process, and the application can process the data by calling the neural network during this process.
  • an image processing application may need to perform image recognition, and then the image processing application can process the image by invoking the neural network model used for image recognition.
  • the electronic device may periodically perform a designated task.
  • the neural network model invoked by the electronic device during the execution of the specified task can be determined as the model to be run.
  • the specified task may be a task of predicting an application program that the electronic device will run in the future, a task of performing video processing, a task of predicting user preferences of the electronic device, or a task of predicting the remaining power of the electronic device.
  • the model parameters in this embodiment may include one or more parameters such as input data splitting parameters, input data size, the number of layers whose number of included operators exceeds the operator threshold, and the number of layers of the model.
  • the input data splitting parameter represents whether the model supports splitting the input data.
  • the output of the model is also a picture; even if two output pictures are obtained after splitting the input picture used as input data, the two output pictures can still be spliced into one, so image enhancement models can support splitting the input data.
  • the input data size characterizes the size of the storage space occupied by the input data that will be input to the model. For example, if the size of the image to be input to the model to be run is 1000*1000*3Byte, then the input data size is determined to be 1000*1000*3Byte, where 1000*1000 is the image resolution and 3 Byte is the size of each pixel.
  • the number of layers whose number of included operators exceeds the operator threshold represents how many layers in the model contain more operators than the operator threshold.
  • a neural network model usually includes multiple layers, and each layer in turn includes operators.
  • a neural network model can include an input layer, a convolutional layer, and an output layer.
  • the number of layers of the model represents how many layers the model to be run has. For example, for the aforementioned neural network model including an input layer, a convolution layer, and an output layer, the number of layers of the model is 3.
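  • To make the parameters above concrete, they can be gathered into a simple record. The sketch below is only illustrative; the field names are our own and not taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class ModelParams:
    """The four model parameters described above (field names are illustrative)."""
    supports_input_split: bool  # input data splitting parameter
    input_data_size: int        # bytes occupied by the input data
    heavy_layer_count: int      # layers whose operator count exceeds the operator threshold
    layer_count: int            # total number of layers of the model

# Example: a 3-layer image-enhancement model fed a 1000*1000*3-byte image.
params = ModelParams(
    supports_input_split=True,
    input_data_size=1000 * 1000 * 3,
    heavy_layer_count=1,
    layer_count=3,
)
```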
  • S120 Determine a target algorithm from a plurality of algorithms according to the model parameters.
  • model parameters of different models may be different, and different models may require different running modes to be run, so that the run models can have higher performance. Then, after acquiring the model parameters of the model to be run, the electronic device can determine an appropriate running algorithm as the target algorithm according to the model parameters.
  • the correspondence between the model parameters and the algorithm may be established in advance, and then the electronic device may determine the target algorithm corresponding to the model parameter of the current model to be run by querying the correspondence.
  • Exemplarily, the model parameters may include the input data splitting parameter, the input data size, and the number of layers of the model. The electronic device may then be configured such that input data splitting parameter A, input data size A, and model layer number A correspond to algorithm a; input data splitting parameter B, input data size B, and model layer number B correspond to algorithm b; and input data splitting parameter A, input data size C, and model layer number C correspond to algorithm c. If the model parameters of the model to be run include input data splitting parameter A, input data size A, and model layer number A, algorithm a will be determined as the target algorithm from among algorithm a, algorithm b, and algorithm c. If the model parameters of the model to be run include input data splitting parameter A, input data size C, and model layer number C, algorithm c will be determined as the target algorithm from among algorithm a, algorithm b, and algorithm c. A minimal sketch of such a correspondence follows.
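```python
# Hypothetical correspondence table: keys are tuples of
# (input data splitting parameter, input data size class, model layer class).
CORRESPONDENCE = {
    ("A", "A", "A"): "algorithm a",
    ("B", "B", "B"): "algorithm b",
    ("A", "C", "C"): "algorithm c",
}

def target_algorithm(split_param: str, size_class: str, layer_class: str) -> str:
    """Determine the target algorithm by querying the pre-established correspondence."""
    return CORRESPONDENCE[(split_param, size_class, layer_class)]

assert target_algorithm("A", "A", "A") == "algorithm a"
assert target_algorithm("A", "C", "C") == "algorithm c"
```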
  • S130 Load the to-be-run model into a corresponding processing unit based on the target algorithm to run the to-be-run model.
  • the processing unit included in the electronic device may be one or more of a CPU, a GPU (Graphics Processing Unit), a DSP (Digital Signal Processor), and an NPU (Neural-network Processing Unit).
  • the loading methods corresponding to different algorithms may be different.
  • the loading method corresponding to some target algorithms may be to load the entire model to be run into the same processing unit for running, while the loading method corresponding to other target algorithms may be to split the to-be-run model into multiple parts and load different parts into different processing units for running. In this way, it is beneficial to select a suitable running mode for each model, thereby improving the performance of the electronic device in running models.
  • It should be noted that, in the embodiments of the present application, the performance of the electronic device in running a model can be understood as the time taken to run the model; correspondingly, if that performance is improved, the time taken to run the model is relatively shortened.
  • In the data processing method provided by the present application, model parameters of a model to be run are obtained, a target algorithm is determined from a plurality of algorithms according to the model parameters, and the model to be run is loaded into the corresponding processing unit based on the target algorithm to run the model to be run. In this way, after the model to be run is determined, which algorithm the model is run with can be chosen by determining the model parameters, so that the running of the model better matches the parameters of the model to be run, thereby improving the performance of the model while it runs.
  • a data processing method provided by an embodiment of the present application includes:
  • a configuration file may be correspondingly configured for each model, and the configuration file may store the model parameters of the static class among the model parameters of the model.
  • the model parameters of the static class can be understood as parameters inherent in the model itself, or can be understood as parameters that are not dynamically changed due to changes in input data.
  • the input data splitting parameters, the number of layers whose number of included operators exceeds the operator threshold, and the number of layers of the model listed in the foregoing embodiments are inherent parameters of the model itself.
  • the input data split parameters, the number of layers with the number of included operators exceeding the operator threshold, and the number of layers of the model can be stored in the configuration file.
  • As for the input data size parameter among the model parameters, because it changes dynamically with the input data, it is recognized as a dynamic-class parameter. After the model to be run is determined, the static-class model parameters can be obtained from the configuration file corresponding to the model to be run, the dynamic-class model parameter of input data size can be obtained from the actual input data, and the static-class and dynamic-class model parameters together form the complete model parameters.
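  • One plausible realization is a small per-model configuration file holding the static-class parameters, merged at run time with the dynamically measured input size. The JSON layout and key names below are assumptions for illustration, not part of the patent.

```python
import json

# Hypothetical configuration file contents for one model (static-class parameters).
CONFIG_TEXT = """
{
    "supports_input_split": true,
    "heavy_layer_count": 1,
    "layer_count": 3
}
"""

def full_model_params(config_text: str, input_data: bytes) -> dict:
    """Merge the static-class parameters from the configuration file with the
    dynamic-class parameter (input data size) measured from the actual input."""
    model_params = json.loads(config_text)             # static class: inherent to the model
    model_params["input_data_size"] = len(input_data)  # dynamic class: depends on the input
    return model_params

print(full_model_params(CONFIG_TEXT, b"\x00" * (1000 * 1000 * 3)))
```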
  • the storage space in the electronic device can include two types of storage space: disk and memory.
  • the disk can be used to store data for a longer time, but the rate at which the electronic device obtains data from the memory is faster than the rate at which data is fetched from the disk.
  • the electronic device can pre-load the static-class model parameters in the configuration file into the memory, so that the subsequent judgment process can obtain the required model parameters faster, further improving the running performance of the model.
  • the model parameters may correspond to parameter values, and the electronic device may determine the content specifically represented by the model parameters through the parameter values corresponding to the model parameters.
  • the parameter value corresponding to the input data splitting parameter may be 1 or 0. If the parameter value corresponding to the input data splitting parameter is 1, it indicates that input data splitting is supported; if the parameter value corresponding to the input data splitting parameter is 0, it indicates that input data splitting is not supported.
  • Data Parallelism can be understood as running the same function in parallel with different data inputs. Parallel processing on separate threads ensures that the task can be distributed among the available processing units.
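  • As a rough illustration of data parallelism, the same function can be run on separate threads, each fed one split of the input, with the partial outputs spliced back together afterwards. The model stub here stands in for the real network.

```python
from concurrent.futures import ThreadPoolExecutor

def run_model(chunk):
    # Stand-in for running the model on one split of the input data.
    return [x * 2 for x in chunk]

def data_parallel_run(input_data, n_parts):
    """Split the input, run the same function on each split in parallel,
    then splice the partial outputs back together (assumes an evenly
    divisible input for brevity)."""
    step = len(input_data) // n_parts
    chunks = [input_data[i * step:(i + 1) * step] for i in range(n_parts)]
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partial_outputs = list(pool.map(run_model, chunks))
    return [y for part in partial_outputs for y in part]

print(data_parallel_run(list(range(8)), n_parts=2))  # [0, 2, 4, ..., 14]
```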
  • S221: If the size of the input data input to the to-be-run model is not greater than the first specified threshold, detect whether the number of layers whose number of included operators exceeds the operator threshold is greater than the second specified threshold; likewise, if the input data splitting parameter indicates that input data splitting is not supported, detect whether the number of layers whose number of included operators exceeds the operator threshold is greater than the second specified threshold.
  • the second specified threshold may be 20% to 30% of the total number of layers of the model; that is, if the model has M layers in total, the second specified threshold may be M × 20% to M × 30%.
  • Operator Parallelism can be understood as loading multiple fully parallel operators in the same layer of the model into one or more of the multiple processing units for parallel operation.
  • the third specified threshold may be 2, or may be an integer larger than 2.
  • the layer pipeline algorithm (Layer Pipelining) can be understood as loading multiple layers of the model into one or more of the multiple processing units respectively for parallel operation.
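  • Putting steps S220 to S240 together, the selection logic can be sketched as a chain of threshold checks, reusing the illustrative ModelParams record and params instance from earlier. The threshold values are arbitrary placeholders; the patent does not fix them beyond the ranges discussed above.

```python
def choose_algorithm(p: ModelParams,
                     first_threshold: int = 1_000_000,  # input-size threshold (assumed)
                     second_threshold: int = 1,         # heavy-layer-count threshold (assumed)
                     third_threshold: int = 2) -> str:  # model-layer-count threshold (example value)
    if p.supports_input_split and p.input_data_size > first_threshold:
        return "data parallelization"
    if p.heavy_layer_count > second_threshold:
        return "operator parallelization"
    if p.layer_count > third_threshold:
        return "layer pipelining"
    return "non-parallelization"

print(choose_algorithm(params))  # the 3,000,000-byte example above -> "data parallelization"
```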
  • S250 Load the to-be-run model into a corresponding processing unit based on the target algorithm to run the to-be-run model.
  • Loading the model to be run into the corresponding processing unit based on the target algorithm to run the model to be run includes: splitting the to-be-run model based on the target algorithm to obtain a plurality of subsections, wherein the splitting rules corresponding to different target algorithms are different; and loading the subsections into their corresponding processing units for running.
  • the neural network model it will include a plurality of operators, and then the data processing flow of the neural network model is completed by sequentially performing data processing through the plurality of operators. Then for different target algorithms, there can be different splitting rules. For example, for a data parallelization algorithm, the model can be split into multiple sub-sections with the same structure, and then the input data is also split and input into the multiple sub-sections for data parallelization processing. Among them, the same structure can be understood as the same type of layer structure included in the model.
  • the model to be run includes an input layer, a convolution layer, and an output layer. Among them, the input layer includes 4 operators, the convolution layer includes 8 operators, and the output layer also includes 4 operators.
  • the model is split based on the splitting rules corresponding to the data parallelization algorithm.
  • the sub-parts obtained by splitting will also include the input layer, the convolutional layer and the output layer, so as to achieve the same type of layer structure as the original model to be run. Only the number of operators included in each layer in the subsection will be less than the number of operators in each layer in the original model to be run.
  • the input layer of each sub-part may only include 2 operators, the convolution layer only includes 4 operators, and the output layer also includes only 2 operators.
  • For the operator parallelization algorithm, the operators in the same layer can be split; in this case, the operators in the same layer are distributed into different subsections, and each subsection obtained by splitting can include partial operators from different layers.
  • For the layer pipeline algorithm, the multi-layer structure included in the model to be run can be split in units of layers, so that each subsection obtained by splitting includes some layers of the model to be run. For example, if the model to be run includes an input layer, a convolution layer, and an output layer, the input layer can be split into one subsection, the convolution layer can be split into another subsection, and the output layer can be split into a third subsection. The two splitting rules are contrasted in the sketch below.
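```python
# Layers are modeled as lists of operator names, a simplification of a real graph.
model = {
    "input":  ["op1", "op2", "op3", "op4"],
    "conv":   ["op5", "op6", "op7", "op8", "op9", "op10", "op11", "op12"],
    "output": ["op13", "op14", "op15", "op16"],
}

def split_data_parallel(model, n_parts):
    """Every subsection keeps the same layer types, with fewer operators per layer."""
    return [{layer: ops[i::n_parts] for layer, ops in model.items()}
            for i in range(n_parts)]

def split_layer_pipeline(model):
    """Each subsection is one whole layer of the model."""
    return [{layer: ops} for layer, ops in model.items()]

print(split_data_parallel(model, 2))  # two sub-models with 2 + 4 + 2 operators each
print(split_layer_pipeline(model))    # three subsections: input, conv, output
```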
  • each sub-part can be loaded into the corresponding processing unit for execution.
  • For example, if the processing units include a CPU and a GPU, sub-part A can be loaded into the CPU to run, and sub-part B can be loaded into the GPU to run.
  • It should also be noted that different operators may be adapted to different processing units.
  • the processing unit adapted by the Conv2D operator may be a GPU or a dedicated AI acceleration chip.
  • the processing unit adapted by the ResizeBilinear operator may be a CPU. In this way, the operators included in the subsection can be identified, and then the processing unit adapted to the operator in the subsection can be used as the processing unit corresponding to the subsection.
  • In the case where a subsection includes multiple operators, the processing unit with the shortest total time consumption for running the multiple operators can be used as the processing unit corresponding to that subsection, so that the overall model running speed can be improved.
  • For example, if a subsection includes operator a, operator b, and operator c, where the processing unit adapted for operator a is the CPU, the processing unit adapted for operator b is the GPU, and the processing unit adapted for operator c is a dedicated AI acceleration chip, then the total time t1 for the CPU to run operator a, operator b, and operator c, the total time t2 for the GPU to run operator a, operator b, and operator c, and the total time t3 for the dedicated AI acceleration chip to run operator a, operator b, and operator c can each be obtained. If t1 is the shortest of the three, the CPU can be used as the processing unit corresponding to the subsection that includes operator a, operator b, and operator c.
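  • A sketch of the rule just described: given per-operator time estimates on each candidate processing unit, pick the unit with the smallest total. The timing numbers are made up for illustration.

```python
# Hypothetical per-operator run times (in ms) on each candidate processing unit.
RUN_TIME_MS = {
    "CPU":     {"a": 4.0, "b": 9.0, "c": 3.0},
    "GPU":     {"a": 6.0, "b": 2.0, "c": 9.0},
    "AI_CHIP": {"a": 7.0, "b": 3.0, "c": 7.0},
}

def best_unit_for(subsection_ops):
    """Return the processing unit with the shortest total time over all operators."""
    totals = {unit: sum(times[op] for op in subsection_ops)
              for unit, times in RUN_TIME_MS.items()}
    return min(totals, key=totals.get)

print(best_unit_for(["a", "b", "c"]))  # "CPU": t1 = 16.0 ms is the shortest total here
```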
  • In the data processing method provided by this embodiment, model parameters of a model to be run are obtained, a target algorithm is determined from a plurality of algorithms according to the model parameters, and the model to be run is loaded into the corresponding processing unit based on the target algorithm to run the model to be run.
  • Moreover, the model parameters in this embodiment may include the input data splitting parameter, the input data size, the number of layers whose number of included operators exceeds the operator threshold, and the number of layers of the model; these specific parameters make it possible to determine more accurately a running algorithm better suited to the current model to be run, further improving the running performance of the electronic device in the process of running the neural network model.
  • a data processing method provided by an embodiment of the present application includes:
  • S330 Split the to-be-run model based on the target algorithm to obtain a plurality of sub-parts, wherein different target algorithms correspond to different splitting rules.
  • the first target condition includes: an average data communication duration between the plurality of processing units is not greater than a duration threshold.
  • the average data communication duration T2 can be calculated based on the following formula: T2 = (Σ T2ij) / n, where T2ij is the data communication time between processing unit i and processing unit j, and n is the number of communications.
  • the duration threshold may be the product of the average time consumption of multiple processing units and 0.05.
  • the time consumption may be inference time.
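  • The first target condition can then be checked as below. The T2ij values would come from measurement; treating the duration threshold as 5% of the units' average inference time follows the example above and is not a fixed rule.

```python
def meets_first_target_condition(comm_times_ms, unit_times_ms):
    """First target condition: the average data communication duration between
    processing units must not exceed the duration threshold."""
    avg_comm = sum(comm_times_ms) / len(comm_times_ms)  # T2 = (sum of T2ij) / n
    avg_unit = sum(unit_times_ms) / len(unit_times_ms)  # average time consumption
    duration_threshold = avg_unit * 0.05                # threshold from the example above
    return avg_comm <= duration_threshold

# Made-up measurements: three communications, two processing units.
print(meets_first_target_condition([0.4, 0.6, 0.5], [12.0, 14.0]))  # True: 0.5 <= 0.65
```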
  • an algorithm other than the current target algorithm may be randomly selected as a new target algorithm, and then S330 and S340 are performed based on the new target algorithm.
  • For example, if the multiple algorithms include a data parallelization algorithm, an operator parallelization algorithm, an interlayer pipeline algorithm, and a non-parallelization algorithm, and the currently determined target algorithm is the interlayer pipeline algorithm, one of the data parallelization algorithm, the operator parallelization algorithm, and the non-parallelization algorithm can be selected as the new target algorithm.
  • the selection order of a plurality of algorithms may be pre-configured, and when a target algorithm is re-selected, a new target algorithm is determined based on the selection order.
  • For example, the configured selection order may be, in sequence, the data parallelization algorithm, the operator parallelization algorithm, the interlayer pipeline algorithm, and the non-parallelization algorithm; then, if the current target algorithm is the operator parallelization algorithm and the target algorithm needs to be re-selected, the interlayer pipeline algorithm, which is next in the selection order after the operator parallelization algorithm, can be used as the new target algorithm.
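  • Re-selection by a pre-configured order might look like the following; the order list mirrors the example in the text, and the wrap-around at the end of the list is our own assumption, since the patent does not say what follows the last entry.

```python
SELECTION_ORDER = [
    "data parallelization",
    "operator parallelization",
    "layer pipelining",
    "non-parallelization",
]

def next_target_algorithm(current: str) -> str:
    """Pick the algorithm next in the pre-configured selection order."""
    i = SELECTION_ORDER.index(current)
    return SELECTION_ORDER[(i + 1) % len(SELECTION_ORDER)]  # wrap-around is assumed

print(next_target_algorithm("operator parallelization"))  # "layer pipelining"
```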
  • In the data processing method provided by the present application, after the model to be run is determined, which algorithm the model is run with can be chosen by determining the model parameters, so that the running of the model better matches the parameters of the model to be run, improving the performance of the model during operation.
  • the target algorithm can be re-determined according to the real-time running situation during the running of the model, so that the running of the model can be more closely adapted to the current actual situation.
  • a data processing method provided by an embodiment of the present application includes:
  • S430 Split the to-be-run model based on the target algorithm to obtain a plurality of subsections, wherein different target algorithms have different splitting rules.
  • the second target condition includes: the standard deviation of the respective running times corresponding to the plurality of processing units is not greater than a standard deviation threshold.
  • the standard deviation can be calculated based on the following formula: σ = sqrt((1/N) × Σi (T1i − T1)²), where N is the number of processing units, T1 is the average time consumption of the multiple processing units, and T1i is the time consumption of processing unit i.
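  • The second target condition can be checked in the same spirit; statistics.pstdev computes the population standard deviation over the units' running times.

```python
import statistics

def meets_second_target_condition(unit_times_ms, stdev_threshold_ms):
    """Second target condition: the standard deviation of the processing units'
    running times must not exceed the standard deviation threshold."""
    return statistics.pstdev(unit_times_ms) <= stdev_threshold_ms

# Made-up running times: a well-balanced split vs. an unbalanced one.
print(meets_second_target_condition([10.0, 10.5, 9.8], 1.0))  # True  (pstdev ~ 0.29)
print(meets_second_target_condition([4.0, 18.0, 9.0], 1.0))   # False (pstdev ~ 5.8)
```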
  • each subsection may include some operators in the model to be run.
  • the ratio of each part of the multiple sub-parts can be understood as the ratio of the operators included in each of the multiple sub-parts.
  • it can be understood as adjusting the number of operators included in at least some of the subsections, so as to adjust the running duration of the processing units corresponding to each subsection.
  • For example, before adjustment, subsection A includes 3 operators, subsection B includes 6 operators, and subsection C includes 3 operators; after adjustment, subsection A may include 4 operators, subsection B may include 5 operators, and subsection C may still include 3 operators.
  • It should also be noted that, for different splitting methods, the units in which the adjustment is made may be different. For example, if the model to be run is directly divided into multiple sub-parts in units of operators, then the proportion of each sub-part will be adjusted in units of operators; if the model to be run is directly divided into multiple sub-parts in units of layers, then the proportion of each sub-part will be adjusted in units of layers.
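  • A toy version of the re-split, in units of operators: move one operator from the subsection on the slowest processing unit to the one on the fastest, so the running durations even out. Moving exactly one operator is a heuristic of ours; the patent only requires that the new proportions differ from the old ones.

```python
def rebalance(subsections, unit_times_ms):
    """Move one operator from the subsection on the slowest processing unit
    to the one on the fastest, changing the proportion of each part."""
    slow = max(range(len(unit_times_ms)), key=unit_times_ms.__getitem__)
    fast = min(range(len(unit_times_ms)), key=unit_times_ms.__getitem__)
    subsections[fast].append(subsections[slow].pop())
    return subsections

subs = [["op1", "op2", "op3"],
        ["op4", "op5", "op6", "op7", "op8", "op9"],
        ["op10", "op11", "op12"]]
print(rebalance(subs, [8.0, 20.0, 10.0]))  # A grows to 4 operators, B shrinks to 5
```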
  • In the data processing method provided by the present application, after the model to be run is determined, which algorithm the model is run with can be chosen by determining the model parameters, so that the running of the model better matches the parameters of the model to be run, improving the performance of the model during operation.
  • the model to be run can also be split based on the currently determined target algorithm to obtain multiple new sub-parts, thereby enabling the model to run more closely at the current level. adapted to the actual situation.
  • a data processing apparatus 500 provided by an embodiment of the present application, the apparatus 500 includes:
  • the parameter obtaining unit 510 is configured to obtain model parameters of the model to be run.
  • the algorithm determining unit 520 is configured to determine a target algorithm from a plurality of algorithms according to the model parameters.
  • the model running unit 530 is configured to load the to-be-run model into a corresponding processing unit based on the target algorithm, so as to run the to-be-run model.
  • the model parameters include input data splitting parameters and input data size.
  • the algorithm determination unit 520 is specifically configured to, if the input data splitting parameter indicates that input data splitting is supported and the size of the input data input to the to-be-run model is greater than the first specified threshold, determine the data parallelization algorithm from the multiple algorithms as the target algorithm.
  • the model parameters include input data splitting parameters, input data size, and the number of layers whose number of included operators exceeds the operator threshold.
  • the algorithm determination unit 520 is specifically configured to, if the input data splitting parameter indicates that input data splitting is not supported and the number of layers whose number of included operators exceeds the operator threshold is greater than the second specified threshold, determine the operator parallelization algorithm from the multiple algorithms as the target algorithm; or, if the input data splitting parameter indicates that input data splitting is supported, the size of the input data input to the to-be-run model is not greater than the first specified threshold, and the number of layers whose number of included operators exceeds the operator threshold is greater than the second specified threshold, determine the operator parallelization algorithm from the multiple algorithms as the target algorithm.
  • the model parameters include input data splitting parameters, input data size, the number of layers whose number of included operators exceeds the operator threshold, and the number of layers of the model.
  • the algorithm determination unit 520 is specifically configured to, if the input data splitting parameter indicates that input data splitting is not supported, the number of layers whose number of included operators exceeds the operator threshold is not greater than the second specified threshold, and the number of layers of the model is greater than the third specified threshold, determine the interlayer pipeline algorithm from the multiple algorithms as the target algorithm; or, if the input data splitting parameter indicates that input data splitting is supported, the size of the input data input to the to-be-run model is not greater than the first specified threshold, the number of layers whose number of included operators exceeds the operator threshold is not greater than the second specified threshold, and the number of layers of the model is greater than the third specified threshold, determine the interlayer pipeline algorithm from the multiple algorithms as the target algorithm.
  • the algorithm determination unit 520 is also specifically configured to, if the input data splitting parameter indicates that input data splitting is not supported, the number of layers whose number of included operators exceeds the operator threshold is not greater than the second specified threshold, and the number of layers of the model is not greater than the third specified threshold, determine a non-parallelized algorithm from the multiple algorithms as the target algorithm.
  • the model running unit 530 is specifically configured to split the to-be-run model based on the target algorithm to obtain multiple subsections, wherein the splitting rules corresponding to different target algorithms are different, and to load each subsection into the corresponding processing unit for execution.
  • the device further includes:
  • a performance evaluation unit 540 configured to obtain the operational performance parameters corresponding to the model to be run; if the operational performance parameters do not meet the first target condition, reselect the target algorithm; if the operational performance parameters do not meet the second target condition, re-split the to-be-running model based on the current target algorithm to obtain multiple new sub-sections, and the proportions of each of the new multiple sub-sections are different from the proportions of each of the multiple sub-sections.
  • the first target condition includes: the standard deviation of the respective running times corresponding to the plurality of processing units is not greater than a standard deviation threshold.
  • the second target condition includes: the average data communication duration between the plurality of processing units is not greater than a duration threshold.
  • The data processing device acquires model parameters of a model to be run, then determines a target algorithm from a plurality of algorithms according to the model parameters, and then loads the model to be run into the corresponding processing unit based on the target algorithm to run the model to be run. In this way, after the model to be run is determined, which algorithm the model is run with can be chosen by determining the model parameters, so that the operation of the model better matches the parameters of the model to be run, improving the performance of the model while it runs.
  • an embodiment of the present application further provides another electronic device 200 that can execute the above-mentioned data processing method.
  • the electronic device 200 includes one or more processors 102 (only one is shown in the figure), a memory 104, and a network module 106 that are coupled to each other.
  • the memory 104 stores a program that can execute the content in the foregoing embodiments, and the processor 102 can execute the program stored in the memory 104 .
  • the processor 102 may include one or more cores for processing data.
  • the processor 102 uses various interfaces and lines to connect the various parts of the entire electronic device 200, and executes various functions and processes data by running or executing the instructions, programs, code sets or instruction sets stored in the memory 104 and calling the data stored in the memory 104.
  • the processor 102 may be implemented in at least one hardware form of digital signal processing (Digital Signal Processing, DSP), field-programmable gate array (Field-Programmable Gate Array, FPGA), and programmable logic array (Programmable Logic Array, PLA).
  • the processor 102 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem, and the like.
  • the CPU mainly handles the operating system, user interface, application programs, and the like; the GPU is used for rendering and drawing display content; and the modem is used to handle wireless communication. It can be understood that the above-mentioned modem may also not be integrated into the processor 102 and may instead be implemented by a communication chip alone.
  • the memory 104 may include random access memory (Random Access Memory, RAM), or may include read-only memory (Read-Only Memory, ROM). The memory 104 may be used to store instructions, programs, code, code sets, or instruction sets.
  • the memory 104 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing the operating system, instructions for implementing at least one function (such as a touch function, a sound playback function, an image playback function, etc.) , instructions for implementing the following method embodiments, and the like.
  • the storage data area may also store data created by the terminal 100 during use (such as phone book, audio and video data, chat record data) and the like.
  • the memory 104 stores an apparatus, for example, the apparatus may be the aforementioned apparatus 500 .
  • the network module 106 is used for receiving and sending electromagnetic waves, realizing mutual conversion between electromagnetic waves and electrical signals, so as to communicate with a communication network or other devices, for example, communicate with an audio playback device.
  • the network module 106 may include various existing circuit elements for performing these functions, e.g., antennas, radio frequency transceivers, digital signal processors, encryption/decryption chips, subscriber identity module (SIM) cards, memory, etc.
  • the network module 106 can communicate with various networks such as the Internet, an intranet, a wireless network, or communicate with other devices through a wireless network.
  • the aforementioned wireless network may include a cellular telephone network, a wireless local area network, or a metropolitan area network.
  • the network module 106 may exchange information with the base station.
  • FIG. 8 shows a structural block diagram of a computer-readable storage medium provided by an embodiment of the present application.
  • the computer-readable medium 1100 stores program codes, and the program codes can be invoked by the processor to execute the methods described in the above method embodiments.
  • the computer-readable storage medium 1100 may be an electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), EPROM, hard disk, or ROM.
  • the computer-readable storage medium 1100 includes a non-transitory computer-readable storage medium.
  • the computer-readable storage medium 1100 has storage space for program code 1110 that performs any of the method steps in the above-described methods. These program codes can be read from or written to one or more computer program products. The program code 1110 may, for example, be compressed in a suitable form.
  • The data processing method, device, electronic device, and storage medium provided by the present application obtain model parameters of a model to be run, then determine a target algorithm from a plurality of algorithms according to the model parameters, and then load the to-be-run model into the corresponding processing unit based on the target algorithm to run the to-be-run model.

Abstract

Disclosed are a data processing method and apparatus, an electronic device, and a storage medium. The method comprises: acquiring a model parameter of a model to be run; determining a target algorithm from among a plurality of algorithms according to the model parameter; and loading said model into a corresponding processing unit on the basis of the target algorithm, so as to run said model. In this way, after a model to be run is determined, which algorithm the model is run with can be determined by means of the model parameter, such that the running of the model better matches the parameter of said model, thereby improving performance during model running.

Description

Data processing method, device, electronic device and storage medium
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to Chinese Application No. 202010693821.3, filed on Jul. 17, 2020, which is hereby incorporated by reference in its entirety for all purposes.
TECHNICAL FIELD
The present application relates to the field of computer technology, and more particularly, to a data processing method, apparatus, electronic device, and storage medium.
BACKGROUND
Algorithmic models, such as neural network models, are complex network systems formed by the extensive interconnection of a large number of simple processing units (called neurons). Some algorithmic models have massive parallelism, distributed storage and processing, self-organization, self-adaptation, and self-learning capabilities. However, related electronic equipment, in the process of running a neural network model, still has the problem that running performance needs to be improved.
SUMMARY OF THE INVENTION
In view of the above problems, the present application proposes a data processing method, apparatus, electronic device, and storage medium to improve on the above problems.
In a first aspect, the present application provides a data processing method, the method including: acquiring model parameters of a model to be run; determining a target algorithm from a plurality of algorithms according to the model parameters; and loading the to-be-run model into a corresponding processing unit based on the target algorithm to run the to-be-run model.
In a second aspect, the present application provides a data processing device, the device including: a parameter acquisition unit for acquiring model parameters of a model to be run; an algorithm determination unit for determining a target algorithm from a plurality of algorithms according to the model parameters; and a model running unit configured to load the to-be-run model into a corresponding processing unit based on the target algorithm, so as to run the to-be-run model.
In a third aspect, the present application provides an electronic device including a processor and a memory; one or more programs are stored in the memory and configured to be executed by the processor to implement the above method.
In a fourth aspect, the present application provides a computer-readable storage medium having program code stored therein, wherein the above-mentioned method is executed when the program code is run by a startup controller.
In the data processing method, device, electronic device, and storage medium provided by the present application, model parameters of a model to be run are obtained, a target algorithm is then determined from a plurality of algorithms according to the model parameters, and the to-be-run model is then loaded into the corresponding processing unit based on the target algorithm to run the to-be-run model. In this way, after the model to be run is determined, which algorithm the model is run with can be chosen by determining the model parameters, so that the running of the model better matches the parameters of the model to be run, thereby improving the performance of the model while it runs.
DESCRIPTION OF DRAWINGS
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the drawings used in the description of the embodiments. Obviously, the drawings in the following description are only some embodiments of the present application; for those skilled in the art, other drawings can also be obtained from these drawings without creative effort.
FIG. 1 shows a flowchart of a data processing method proposed by an embodiment of the present application;
FIG. 2 shows a flowchart of a data processing method proposed by another embodiment of the present application;
FIG. 3 shows a flowchart of a data processing method proposed by still another embodiment of the present application;
FIG. 4 shows a flowchart of a data processing method proposed by yet another embodiment of the present application;
FIG. 5 shows a structural block diagram of a data processing apparatus proposed by an embodiment of the present application;
FIG. 6 shows a structural block diagram of a data processing apparatus proposed by another embodiment of the present application;
FIG. 7 shows a structural block diagram of an electronic device of the present application for executing the data processing method according to an embodiment of the present application;
FIG. 8 shows a storage unit, according to an embodiment of the present application, for storing or carrying program code that implements the data processing method of the embodiments of the present application.
DETAILED DESCRIPTION
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, rather than all of them. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present application.
Algorithmic models such as neural networks (Neural Networks, NN) are complex network systems formed by the extensive interconnection of a large number of simple processing units (called neurons). Neural networks have massive parallelism, distributed storage and processing, self-organization, self-adaptation, and self-learning capabilities. A large number of operators are usually included in a neural network model. Among them, it can be understood that an operator can be regarded as part of the algorithmic process in a neural network model; an operator can map a function to a function, or map a function to a number.
However, the inventor found in research that related electronic equipment, in the process of running a neural network model, still has the problem that running performance needs to be improved. For example, electronic devices run based on certain algorithms in the process of running a neural network model. However, related electronic devices all run the neural network model based on a fixed algorithm, so that no matter what the model parameters of the neural network model currently running on the electronic device are, it runs in a fixed manner, which in turn causes the electronic device to perform poorly when running the neural network model and also limits the performance of the neural network model itself.
Therefore, the inventor proposes, in the present application, a data processing method, device, electronic device, and storage medium that can improve on the above problems: by acquiring the model parameters of the model to be run, then determining the target algorithm from multiple algorithms according to the model parameters, and then loading the to-be-run model into a corresponding processing unit based on the target algorithm so as to run the to-be-run model. In this way, after the model to be run is determined, which algorithm the model is run with can be chosen by determining the model parameters, so that the running of the model better matches the parameters of the model to be run, thereby improving the performance of the model while it runs.
The embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Referring to FIG. 1, a data processing method provided by an embodiment of the present application includes:
S110: Obtain model parameters of the model to be run.
The model to be run in this embodiment is a model that will subsequently be loaded into a processing unit for running. In this embodiment, there may be various ways of determining the model to be run.
As one approach, the model to be run may be a neural network model called by an application. It should be noted that an application may need to process some data while running, and in this process the application can process the data by calling a neural network. For example, an image processing application may need to perform image recognition, and that application can then process images by invoking the neural network model used for image recognition.
Alternatively, the electronic device may periodically perform a designated task. In this way, the neural network model invoked by the electronic device during the execution of the designated task can be determined as the model to be run. Optionally, the designated task may be a task of predicting an application program that the electronic device will run in the future, a task of performing video processing, a task of predicting user preferences of the electronic device, or a task of predicting the remaining power of the electronic device.
After the model to be run is determined in the foregoing manner, the model parameters of the model to be run can be obtained. The model parameters in this embodiment may include one or more of the following: input data splitting parameters, input data size, the number of layers whose number of included operators exceeds the operator threshold, and the number of layers of the model.
Among them, the input data splitting parameter represents whether the model supports splitting the input data. For example, for an image classification model, splitting the input image into two parts is likely to produce two different classification results, so an image classification model cannot support splitting the input data. For another example, for an image enhancement model, the output of the model is also a picture; even if two output pictures are obtained after splitting the input picture used as input data, the two output pictures can still be spliced into one, so an image enhancement model can support splitting the input data.
The input data size characterizes the size of the storage space occupied by the input data to be input to the model. For example, if the size of the image to be input to the model to be run is 1000*1000*3Byte, the input data size is determined to be 1000*1000*3Byte, where 1000*1000 is the image resolution.
The number of layers whose number of included operators exceeds the operator threshold represents how many layers in the model contain more operators than the operator threshold. It should be noted that a neural network model usually includes multiple layers, and each layer in turn includes operators. For example, a neural network model can include an input layer, a convolutional layer, and an output layer. Similarly, the number of layers of the model represents how many layers the model to be run has; for example, for the aforementioned neural network model including an input layer, a convolution layer, and an output layer, the number of layers of the model is 3.
S120:根据所述模型参数从多个算法中确定目标算法。S120: Determine a target algorithm from a plurality of algorithms according to the model parameters.
在本实施例中,不同的模型的模型参数可能是不同的,进而不同的模型则可能需要不同的运行方式来进行运行,以便可以使得所运行的模型能够有较高的性能体现。那么电子设备在获取到待运行模型的模型参数后,就可以根据模型参数来确定合适的进行运行的算法来作为目标算法。In this embodiment, model parameters of different models may be different, and different models may require different running modes to be run, so that the run models can have higher performance. Then, after acquiring the model parameters of the model to be run, the electronic device can determine an appropriate running algorithm as the target algorithm according to the model parameters.
作为一种方式,在本实施例中,可以预先建立模型参数与算法之间的对应关系,进而电子设备可以通过查询该对应关系来确定当前待运行模型的模型参数所对应的目标算法。示例性的,模型参数可以包括有输入数据拆分参数、输入数据大小以及模型的层数,则在电子设备中可以配置有输入数据拆分参数A、输入数据大小A以及模型的层数A对应与算法a,输入数据拆分参数B、输入数据大小B以及模型的层数B对应与算法b,输入数据拆分参数A、输入数据大小C以及模型的层数C对应与算法c的情况下,若获取到待运行模型的模型参数包括输入数据拆分参数A、输入数据大小A以及模型的层数A,会则会从算法a、算法b以及算法c中确定算法a为目标算法。若获取到待运行模型的模型参数包括输入数据拆分参数A、输入数据大小C以及模型的层数C,会则会从算法a、算法b以及算法c中确定算法c为目标算法。As a way, in this embodiment, the correspondence between the model parameters and the algorithm may be established in advance, and then the electronic device may determine the target algorithm corresponding to the model parameter of the current model to be run by querying the correspondence. Exemplarily, the model parameters may include input data splitting parameters, input data size and the number of layers of the model, then the electronic device may be configured with input data splitting parameters A, input data size A and model layers A corresponding to Algorithm a, input data splitting parameter B, input data size B and model layer number B correspond to algorithm b, input data splitting parameter A, input data size C and model layer number C correspond to algorithm c , if the model parameters of the model to be run are obtained including the input data splitting parameter A, the input data size A, and the number of layers of the model A, the meeting will determine algorithm a as the target algorithm from algorithm a, algorithm b, and algorithm c. If the model parameters of the model to be run are obtained, including the input data split parameter A, the input data size C, and the number of layers C of the model, the meeting will determine algorithm c as the target algorithm from algorithm a, algorithm b, and algorithm c.
S130: Load the to-be-run model into a corresponding processing unit based on the target algorithm, so as to run the to-be-run model.
It should be noted that, in this embodiment, the processing units included in the electronic device may be one or more of a CPU, a GPU (Graphics Processing Unit), a DSP (Digital Signal Processor), and an NPU (Neural-network Processing Unit), and the loading manners corresponding to different algorithms may differ. Exemplarily, the loading manner corresponding to some target algorithms may be to load the entire to-be-run model into a single processing unit for running, while the loading manner corresponding to other target algorithms may be to split the to-be-run model into multiple parts and load different parts into different processing units for running. In this way, an adapted running manner can be selected for each model, thereby improving the performance with which the electronic device runs the model.
It should be noted that, in the embodiments of the present application, the performance with which the electronic device runs a model can be understood as the time consumed in running the model; correspondingly, if that performance is improved, the time consumed in running the model is correspondingly shortened.
In the data processing method provided by the present application, model parameters of a to-be-run model are acquired, a target algorithm is then determined from a plurality of algorithms according to the model parameters, and the to-be-run model is loaded into a corresponding processing unit based on the target algorithm so as to run the to-be-run model. In this way, after the to-be-run model is determined, the algorithm on which the running of the model is based can be selected by determining the model parameters, so that the running of the model better matches the parameters of the to-be-run model, thereby improving performance while the model is running.
Referring to FIG. 2, an embodiment of the present application provides a data processing method, the method including:
S210: Acquire model parameters of a to-be-run model.
In the embodiments of the present application, as one approach, each model may be provided with a corresponding configuration file, and the configuration file may store the static-class model parameters among the model parameters of the model. Static-class model parameters can be understood as parameters inherent to the model itself, or as parameters that do not change dynamically when the input data changes.
For example, among the model parameters listed in the foregoing embodiments, the input data split parameter, the number of layers in which the number of included operators exceeds the operator threshold, and the number of model layers are parameters inherent to the model itself: these three parameters remain unchanged even when the input data changes, and can therefore be stored in the configuration file. The input data size, by contrast, changes dynamically with the input data and is therefore identified as a dynamic-class parameter. After the to-be-run model is determined, the static-class model parameters can be acquired from the configuration file corresponding to the to-be-run model, the dynamic-class model parameter (the input data size) can be acquired from the actual input data, and the static-class and dynamic-class model parameters together form the complete set of model parameters.
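As a minimal sketch of this arrangement, the static-class parameters could live in a per-model configuration file and be combined with the dynamically measured input data size at load time. The field names and the JSON format below are illustrative assumptions, not a format defined by this application:

```python
import json

# Hypothetical per-model configuration file holding the static-class
# model parameters (the field names are illustrative assumptions).
CONFIG_TEXT = """
{
  "supports_input_split": 1,
  "layers_over_operator_threshold": 5,
  "num_layers": 28
}
"""

def load_model_parameters(input_data_size_bytes):
    # Static parameters come from the configuration file (and can be
    # preloaded into memory); the dynamic parameter, the input data
    # size, is taken from the actual input data.
    static_params = json.loads(CONFIG_TEXT)
    return {**static_params, "input_data_size": input_data_size_bytes}

params = load_model_parameters(1000 * 1000 * 3)  # e.g. a 1000x1000 RGB image
```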
It should be noted that the storage space in the electronic device may include two kinds of storage: disk and memory. The disk can be used to store data for a longer time, but the electronic device fetches data from memory faster than from disk. In this case, after acquiring the configuration file of the to-be-run model, the electronic device may preload all the static-class model parameters in the configuration file into memory, so that the required model parameters can be obtained more quickly in the subsequent decision process, further improving model running performance.
S211: Detect whether the input data split parameter indicates that input data splitting is supported.
It should be noted that, in this embodiment, each model parameter may have a corresponding parameter value, and the electronic device can determine what a model parameter specifically characterizes from that value. Exemplarily, the parameter value corresponding to the input data split parameter may be 1 or 0: if the value is 1, it indicates that input data splitting is supported; if the value is 0, it indicates that input data splitting is not supported.
S212: If the input data split parameter indicates that input data splitting is supported, detect whether the size of the input data to be input to the to-be-run model is greater than a first specified threshold.
It should be noted that the first specified threshold may be, for example, 1024*1024*3 Byte = 3 MByte.
S213: If the size of the input data to be input to the to-be-run model is greater than the first specified threshold, determine a data parallelization algorithm from the plurality of algorithms as the target algorithm.
A data parallelization algorithm (data parallelism) can be understood as running the same function in parallel on different data inputs. Under a data parallelization algorithm, a task can be decomposed into discrete units that can be processed in parallel on separate threads, ensuring that the task can be distributed among the available processing units.
S221: If the size of the input data to be input to the to-be-run model is not greater than the first specified threshold, detect whether the number of layers in which the number of included operators exceeds the operator threshold is greater than a second specified threshold; or, if the input data split parameter indicates that input data splitting is not supported, detect whether the number of layers in which the number of included operators exceeds the operator threshold is greater than the second specified threshold.
As one approach, the second specified threshold may be 20% to 30% of the total number of layers of the model. Exemplarily, if the total number of layers is M, the second specified threshold may be between M×20% and M×30%.
S222: If the number of layers in which the number of included operators exceeds the operator threshold is greater than the second specified threshold, determine an operator parallelization algorithm from the plurality of algorithms as the target algorithm.
It should be noted that an operator parallelization algorithm (operator parallelism) can be understood as loading multiple fully parallelizable operators in the same layer of the model into one or more of the plurality of processing units for parallel running.
S231: If the number of layers in which the number of included operators exceeds the operator threshold is not greater than the second specified threshold, detect whether the number of layers of the model is greater than a third specified threshold.
Optionally, in this embodiment, the third specified threshold may be 2, or may be an integer greater than 2.
S232: If the number of layers of the model is greater than the third specified threshold, determine an inter-layer pipeline algorithm from the plurality of algorithms as the target algorithm.
It should be noted that an inter-layer pipeline algorithm (layer pipelining) can be understood as loading multiple layers of the model respectively into one or more of the plurality of processing units for parallel running.
S241: If the number of layers of the model is not greater than the third specified threshold, determine a non-parallelized algorithm from the plurality of algorithms as the target algorithm.
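Taken together, steps S211 to S241 describe a simple decision chain. The sketch below is one hedged reading of that chain; the concrete threshold values and parameter names are assumptions chosen to match the examples above:

```python
FIRST_THRESHOLD = 1024 * 1024 * 3      # e.g. 3 MByte, see S212
SECOND_THRESHOLD_RATIO = 0.25          # e.g. 20%-30% of the total layers
THIRD_THRESHOLD = 2                    # see S231

def select_target_algorithm(params):
    total_layers = params["num_layers"]
    second_threshold = total_layers * SECOND_THRESHOLD_RATIO

    # S211/S212/S213: data parallelism when splitting is supported
    # and the input data is large enough.
    if params["supports_input_split"] and params["input_data_size"] > FIRST_THRESHOLD:
        return "data_parallelism"
    # S221/S222: operator parallelism when enough layers contain
    # more operators than the operator threshold.
    if params["layers_over_operator_threshold"] > second_threshold:
        return "operator_parallelism"
    # S231/S232: layer pipelining when the model is deep enough.
    if total_layers > THIRD_THRESHOLD:
        return "layer_pipelining"
    # S241: otherwise run without parallelization.
    return "non_parallelized"
```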
S250: Load the to-be-run model into a corresponding processing unit based on the target algorithm, so as to run the to-be-run model.
As one approach, in this embodiment, loading the to-be-run model into a corresponding processing unit based on the target algorithm so as to run the to-be-run model includes: splitting the to-be-run model based on the target algorithm to obtain a plurality of subsections, where different target algorithms correspond to different splitting rules; and loading the plurality of subsections into corresponding processing units respectively for running.
It should be noted that a neural network model includes a plurality of operators, and its data processing flow is completed by performing data processing through these operators in sequence. Different target algorithms can then correspond to different splitting rules. For example, for the data parallelization algorithm, the model can be split into multiple subsections with the same structure, and the input data is likewise split and fed into these subsections for data-parallel processing. Here, "the same structure" can be understood as meaning that the subsections include the same kinds of layer structures as the model. Exemplarily, suppose the to-be-run model includes an input layer, a convolution layer, and an output layer, where the input layer includes 4 operators, the convolution layer includes 8 operators, and the output layer includes 4 operators. When the model is split based on the splitting rule corresponding to the data parallelization algorithm, each resulting subsection also includes an input layer, a convolution layer, and an output layer, so that it has the same kinds of layer structures as the original to-be-run model; only the number of operators included in each layer of a subsection is smaller than the number of operators in the corresponding layer of the original model. Taking a split into two subsections as an example, the input layer of each subsection may include only 2 operators, the convolution layer only 4 operators, and the output layer only 2 operators.
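A minimal sketch of this structure-preserving split, assuming the model is represented as a mapping from layer names to operator lists, might look as follows (the representation is an assumption made for illustration):

```python
def split_data_parallel(model, num_parts=2):
    # model: {layer_name: [operators]}. Each subsection keeps every layer
    # type of the original model but only a 1/num_parts share of its
    # operators, so the layer structure is preserved.
    parts = [{} for _ in range(num_parts)]
    for layer, ops in model.items():
        for i in range(num_parts):
            parts[i][layer] = ops[i::num_parts]
    return parts

parts = split_data_parallel(
    {"input": list(range(4)), "conv": list(range(8)), "output": list(range(4))})
# Each part has 2 input, 4 conv, and 2 output operators, as in the example.
```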
When the operator parallelization algorithm is the target algorithm, the operators within the same layer can be split. In this case, the operators of the same layer are distributed across different subsections, and each resulting subsection may include some of the operators from different layers.
When the inter-layer pipeline algorithm is the target algorithm, the multi-layer structure of the to-be-run model can be split in units of layers. In this case, each of the resulting subsections includes some of the layers of the to-be-run model. Exemplarily, if the to-be-run model includes an input layer, a convolution layer, and an output layer, the input layer can be split into one subsection, the convolution layer into another subsection, and the output layer into a third subsection.
After the to-be-run model has been split into a plurality of subsections in the foregoing manner, each subsection can be loaded into its corresponding processing unit for running. Exemplarily, take the inter-layer pipeline algorithm as the target algorithm. Where the processing units include a CPU and a GPU, if the to-be-run model is split into subsection A and subsection B, then, as one approach, subsection A can be loaded into the CPU for running while subsection B is loaded into the GPU for running.
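For illustration, a layer-wise split and a naive placement onto processing units could be sketched as below. The round-robin assignment is an assumption of this example only, since the affinity-based selection described next would normally drive the placement:

```python
def split_by_layers(model_layers):
    # Layer pipelining: each layer becomes one subsection.
    return [[layer] for layer in model_layers]

def load_subsections(subsections, processing_units):
    # Assign subsections to processing units round-robin; a real scheduler
    # would use the operator-affinity rules described below instead.
    placement = {}
    for idx, subsection in enumerate(subsections):
        unit = processing_units[idx % len(processing_units)]
        placement.setdefault(unit, []).append(subsection)
    return placement

placement = load_subsections(split_by_layers(["input", "conv", "output"]),
                             ["CPU", "GPU"])
# e.g. {"CPU": [["input"], ["output"]], "GPU": [["conv"]]}
```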
It should be noted that the inventors found in research that different operators may be adapted to different processing units. For example, the Conv2D operator performs neural-network matrix operations, so the processing unit adapted to the Conv2D operator may be a GPU or a dedicated AI acceleration chip. As another example, the ResizeBilinear operator performs image operations, so the processing unit adapted to the ResizeBilinear operator may be a CPU. In this approach, the operators included in a subsection can be identified, and the processing unit adapted to those operators can then be taken as the processing unit corresponding to the subsection.
Optionally, when a subsection includes multiple operators and these operators are adapted to different processing units, the processing unit with the shortest total time for running these operators is taken as the processing unit corresponding to the subsection, so that the overall model running speed can be improved. Exemplarily, suppose a subsection includes operator a, operator b, and operator c, where the processing unit adapted to operator a is the CPU, the processing unit adapted to operator b is the GPU, and the processing unit adapted to operator c is a dedicated AI acceleration chip. The total time t1 for the CPU to run operators a, b, and c, the total time t2 for the GPU to run operators a, b, and c, and the total time t3 for the dedicated AI acceleration chip to run operators a, b, and c can then be obtained. If t1 is the smallest, the CPU can be taken as the processing unit corresponding to the subsection that includes operators a, b, and c.
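A hedged sketch of this minimum-total-time selection follows; the per-operator latency table is assumed to be available from profiling or vendor data, which the application does not specify:

```python
def choose_unit_for_subsection(operators, latency):
    # latency[(op, unit)] -> estimated run time of `op` on `unit`
    # (a complete latency table is an assumption of this sketch).
    units = {unit for _, unit in latency}
    totals = {
        unit: sum(latency[(op, unit)] for op in operators)
        for unit in units
    }
    # Pick the unit with the smallest total time over all operators,
    # e.g. t1 for the CPU in the example above.
    return min(totals, key=totals.get)
```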
In the data processing method provided by the present application, model parameters of a to-be-run model are acquired, a target algorithm is then determined from a plurality of algorithms according to the model parameters, and the to-be-run model is loaded into a corresponding processing unit based on the target algorithm so as to run the to-be-run model. In this way, after the to-be-run model is determined, the algorithm on which the running of the model is based can be selected by determining the model parameters, so that the running of the model better matches the parameters of the to-be-run model, thereby improving performance while the model is running. Moreover, the model parameters in this embodiment may include the input data split parameter, the input data size, the number of layers in which the number of included operators exceeds the operator threshold, and the number of model layers, and through these specific parameters a running algorithm better adapted to the current to-be-run model can be determined more accurately, further improving the running performance of the electronic device in running the neural network model.
Referring to FIG. 3, an embodiment of the present application provides a data processing method, the method including:
S310: Acquire model parameters of a to-be-run model.
S320: Determine a target algorithm from a plurality of algorithms according to the model parameters.
S330: Split the to-be-run model based on the target algorithm to obtain a plurality of subsections, where different target algorithms correspond to different splitting rules.
S340: Load the plurality of subsections into corresponding processing units respectively for running.
S350: Acquire running performance parameters corresponding to the to-be-run model.
S360: If the running performance parameters do not satisfy a first target condition, reselect the target algorithm.
Optionally, the first target condition includes: the average data communication duration between the plurality of processing units is not greater than a duration threshold. Optionally, the average data communication duration T_2 may be calculated based on the following formula:

T_2 = (1/n) · Σ_{i,j} T_{2ij}

where T_{2ij} is the data communication time between processing unit i and processing unit j, and n is the number of communications. Optionally, the duration threshold may be the product of the average time consumption of the plurality of processing units and 0.05, where the time consumption may be the inference time.
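For illustration, the first target condition could be checked as in the following sketch, which assumes the per-communication durations and per-unit inference times have already been measured:

```python
def first_target_condition(comm_times, unit_times):
    # comm_times: durations T2ij of the n communications between units;
    # unit_times: per-unit time consumption (e.g. inference time).
    avg_comm = sum(comm_times) / len(comm_times)             # T2
    duration_threshold = 0.05 * (sum(unit_times) / len(unit_times))
    return avg_comm <= duration_threshold
```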
In this embodiment, there may be multiple ways of reselecting the target algorithm. As one approach, an algorithm other than the current target algorithm may be selected at random as the new target algorithm, and S330 and S340 are then performed based on the new target algorithm. Exemplarily, where the plurality of algorithms includes a data parallelization algorithm, an operator parallelization algorithm, an inter-layer pipeline algorithm, and a non-parallelized algorithm, and the currently determined target algorithm is the inter-layer pipeline algorithm, one of the data parallelization algorithm, the operator parallelization algorithm, and the non-parallelized algorithm can be selected as the new target algorithm.
As another approach, a selection order over the plurality of algorithms may be configured in advance, and when the target algorithm is reselected, the new target algorithm is determined based on this selection order. Exemplarily, the configured selection order may be: data parallelization algorithm, operator parallelization algorithm, inter-layer pipeline algorithm, and non-parallelized algorithm. Then, if the current target algorithm is the operator parallelization algorithm and the target algorithm needs to be reselected, the inter-layer pipeline algorithm, which is next in the selection order after the operator parallelization algorithm, can be taken as the new target algorithm.
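A minimal sketch of order-based reselection follows; wrapping around at the end of the list is an assumption of this example, since the application does not state what happens after the last algorithm in the order:

```python
SELECTION_ORDER = ["data_parallelism", "operator_parallelism",
                   "layer_pipelining", "non_parallelized"]

def reselect_target_algorithm(current):
    # Take the next algorithm in the pre-configured selection order;
    # the wrap-around past the end of the list is an assumption here.
    idx = SELECTION_ORDER.index(current)
    return SELECTION_ORDER[(idx + 1) % len(SELECTION_ORDER)]
```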
In the data processing method provided by the present application, after the to-be-run model is determined, the algorithm on which the running of the model is based can be selected by determining the model parameters, so that the running of the model better matches the parameters of the to-be-run model, thereby improving performance while the model is running. Moreover, in this embodiment, the target algorithm can be redetermined according to the real-time running situation while the model is running, so that the running of the model can be adapted more closely to the current actual situation.
Referring to FIG. 4, an embodiment of the present application provides a data processing method, the method including:
S410: Acquire model parameters of a to-be-run model.
S420: Determine a target algorithm from a plurality of algorithms according to the model parameters.
S430: Split the to-be-run model based on the target algorithm to obtain a plurality of subsections, where different target algorithms correspond to different splitting rules.
S440: Load the plurality of subsections into corresponding processing units respectively for running.
S450: Acquire running performance parameters corresponding to the to-be-run model.
S460: If the running performance parameters do not satisfy a second target condition, split the to-be-run model again based on the current target algorithm to obtain a plurality of new subsections, where the proportions of the parts in the new plurality of subsections differ from the proportions of the parts in the original plurality of subsections.
Optionally, the second target condition includes: the standard deviation of the running times respectively corresponding to the plurality of processing units is not greater than a standard deviation threshold. Optionally, the standard deviation σ may be calculated based on the following formula:

σ = √( (1/N) · Σ_{i=1..N} (T_{1i} − T_1)² )

where T_1 is the average time consumption of the N processing units, and T_{1i} is the time consumption of processing unit i.
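For illustration, the second target condition could be checked as in the following sketch, which applies the formula above to measured per-unit run times:

```python
import math

def second_target_condition(unit_times, std_threshold):
    # unit_times: run time T1i of each of the N processing units.
    t1 = sum(unit_times) / len(unit_times)                   # T1
    std = math.sqrt(sum((t - t1) ** 2 for t in unit_times) / len(unit_times))
    return std <= std_threshold
```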
As can be seen from the foregoing, among the plurality of subsections obtained by splitting the to-be-run model, each subsection may include some of the operators of the model. The proportions of the parts of the plurality of subsections can be understood as the proportions of the operators included in each subsection. Splitting the to-be-run model again based on the current target algorithm can then be understood as adjusting the number of operators included in at least some of the subsections, so as to adjust the running duration of the processing unit corresponding to each subsection. Exemplarily, if subsection A includes 3 operators, subsection B includes 6 operators, and subsection C includes 3 operators, then after re-splitting, subsection A may include 4 operators, subsection B 5 operators, and subsection C still 3 operators.
Where the target algorithms differ, the unit of adjustment may differ. For example, when the operator parallelization algorithm is the target algorithm, the to-be-run model is split into subsections directly in units of operators, so the proportions of the subsections are adjusted in units of operators. As another example, when the inter-layer pipeline algorithm is the target algorithm, the to-be-run model is split into subsections directly in units of layers, so the proportions of the subsections are adjusted in units of layers.
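One hedged sketch of such a proportion adjustment follows; moving a single unit of work per adjustment step is an assumption of this example, not a rule stated by the application:

```python
def rebalance(subsections, unit_times):
    # Move one unit of work (an operator, or a layer under layer
    # pipelining) from the slowest subsection to the fastest, changing
    # the proportions between subsections as described above.
    slow = max(range(len(unit_times)), key=unit_times.__getitem__)
    fast = min(range(len(unit_times)), key=unit_times.__getitem__)
    if subsections[slow]:
        subsections[fast].append(subsections[slow].pop())
    return subsections
```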
In the data processing method provided by the present application, after the to-be-run model is determined, the algorithm on which the running of the model is based can be selected by determining the model parameters, so that the running of the model better matches the parameters of the to-be-run model, thereby improving performance while the model is running. Moreover, in this embodiment, while the model is running, the to-be-run model can be split again based on the currently determined target algorithm to obtain a plurality of new subsections, so that the running of the model can be adapted more closely to the current actual situation.
Referring to FIG. 5, an embodiment of the present application provides a data processing apparatus 500, the apparatus 500 including:
a parameter acquisition unit 510, configured to acquire model parameters of a to-be-run model;
an algorithm determination unit 520, configured to determine a target algorithm from a plurality of algorithms according to the model parameters;
a model running unit 530, configured to load the to-be-run model into a corresponding processing unit based on the target algorithm, so as to run the to-be-run model.
As one approach, the model parameters include an input data split parameter and an input data size. In this approach, the algorithm determination unit 520 is specifically configured to: if the input data split parameter indicates that input data splitting is supported and the size of the input data to be input to the to-be-run model is greater than a first specified threshold, determine a data parallelization algorithm from the plurality of algorithms as the target algorithm.
As one approach, the model parameters include an input data split parameter, an input data size, and a number of layers in which the number of included operators exceeds an operator threshold. In this approach, the algorithm determination unit 520 is specifically configured to: if the input data split parameter indicates that input data splitting is not supported and the number of layers in which the number of included operators exceeds the operator threshold is greater than a second specified threshold, determine an operator parallelization algorithm from the plurality of algorithms as the target algorithm; or, if the input data split parameter indicates that input data splitting is supported, the size of the input data to be input to the to-be-run model is not greater than the first specified threshold, and the number of layers in which the number of included operators exceeds the operator threshold is greater than the second specified threshold, determine an operator parallelization algorithm from the plurality of algorithms as the target algorithm.
As one approach, the model parameters include an input data split parameter, an input data size, a number of layers in which the number of included operators exceeds an operator threshold, and a number of model layers. In this approach, the algorithm determination unit 520 is specifically configured to: if the input data split parameter indicates that input data splitting is not supported, the number of layers in which the number of included operators exceeds the operator threshold is not greater than the second specified threshold, and the number of layers of the model is greater than a third specified threshold, determine an inter-layer pipeline algorithm from the plurality of algorithms as the target algorithm; or, if the input data split parameter indicates that input data splitting is supported, the size of the input data to be input to the to-be-run model is not greater than the first specified threshold, the number of layers in which the number of included operators exceeds the operator threshold is not greater than the second specified threshold, and the number of layers of the model is greater than the third specified threshold, determine an inter-layer pipeline algorithm from the plurality of algorithms as the target algorithm.
The algorithm determination unit 520 is further specifically configured to: if the input data split parameter indicates that input data splitting is not supported, the number of layers in which the number of included operators exceeds the operator threshold is not greater than the second specified threshold, and the number of layers of the model is not greater than the third specified threshold, determine a non-parallelized algorithm from the plurality of algorithms as the target algorithm.
As one approach, the model running unit 530 is specifically configured to split the to-be-run model based on the target algorithm to obtain a plurality of subsections, where different target algorithms correspond to different splitting rules, and to load the plurality of subsections into corresponding processing units respectively for running.
As one approach, as shown in FIG. 6, the apparatus further includes:
a performance evaluation unit 540, configured to acquire running performance parameters corresponding to the to-be-run model; if the running performance parameters do not satisfy a first target condition, reselect the target algorithm; and if the running performance parameters do not satisfy a second target condition, split the to-be-run model again based on the current target algorithm to obtain a plurality of new subsections, where the proportions of the parts in the new plurality of subsections differ from the proportions of the parts in the original plurality of subsections.
Optionally, consistent with the method embodiments above, the first target condition includes: the average data communication duration between the plurality of processing units is not greater than a duration threshold. The second target condition includes: the standard deviation of the running times respectively corresponding to the plurality of processing units is not greater than a standard deviation threshold.
In the data processing apparatus provided by the present application, model parameters of a to-be-run model are acquired, a target algorithm is then determined from a plurality of algorithms according to the model parameters, and the to-be-run model is loaded into a corresponding processing unit based on the target algorithm so as to run the to-be-run model. In this way, after the to-be-run model is determined, the algorithm on which the running of the model is based can be selected by determining the model parameters, so that the running of the model better matches the parameters of the to-be-run model, thereby improving performance while the model is running.
It should be noted that the apparatus embodiments in the present application correspond to the foregoing method embodiments; for the specific principles of the apparatus embodiments, reference may be made to the content of the foregoing method embodiments, which will not be repeated here.
An electronic device provided by the present application is described below with reference to FIG. 7.
Referring to FIG. 7, based on the foregoing data processing method and apparatus, an embodiment of the present application further provides an electronic device 200 capable of performing the foregoing data processing method. The electronic device 200 includes one or more (only one is shown in the figure) processors 102, a memory 104, and a network module 106, which are coupled to one another. The memory 104 stores a program capable of executing the content of the foregoing embodiments, and the processor 102 can execute the program stored in the memory 104.
The processor 102 may include one or more cores for processing data. The processor 102 connects the various parts of the electronic device 200 using various interfaces and lines, and performs the various functions of the electronic device 200 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 104 and by invoking data stored in the memory 104. Optionally, the processor 102 may be implemented in at least one hardware form among Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 102 may integrate one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is responsible for rendering and drawing display content; and the modem handles wireless communication. It can be understood that the modem may alternatively not be integrated into the processor 102 and may instead be implemented by a separate communication chip.
The memory 104 may include Random Access Memory (RAM) or Read-Only Memory (ROM). The memory 104 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 104 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing the operating system, instructions for implementing at least one function (such as a touch function, a sound playback function, or an image playback function), instructions for implementing the following method embodiments, and the like. The data storage area may also store data created by the terminal 100 during use (such as a phone book, audio and video data, or chat record data). The memory 104 stores an apparatus; for example, the apparatus may be the aforementioned apparatus 500.
The network module 106 is configured to receive and transmit electromagnetic waves and to perform mutual conversion between electromagnetic waves and electrical signals, so as to communicate with a communication network or other devices, for example with an audio playback device. The network module 106 may include various existing circuit elements for performing these functions, for example an antenna, a radio frequency transceiver, a digital signal processor, an encryption/decryption chip, a subscriber identity module (SIM) card, memory, and so on. The network module 106 may communicate with various networks such as the Internet, an intranet, or a wireless network, or communicate with other devices through a wireless network. The aforementioned wireless network may include a cellular telephone network, a wireless local area network, or a metropolitan area network. For example, the network module 106 may exchange information with a base station.
Please refer to FIG. 8, which shows a structural block diagram of a computer-readable storage medium provided by an embodiment of the present application. The computer-readable medium 1100 stores program code, and the program code can be invoked by a processor to execute the methods described in the foregoing method embodiments.
The computer-readable storage medium 1100 may be an electronic memory such as a flash memory, an EEPROM (Electrically Erasable Programmable Read-Only Memory), an EPROM, a hard disk, or a ROM. Optionally, the computer-readable storage medium 1100 includes a non-transitory computer-readable storage medium. The computer-readable storage medium 1100 has storage space for program code 1110 that performs any of the method steps in the foregoing methods. The program code can be read from or written to one or more computer program products. The program code 1110 may, for example, be compressed in a suitable form.
To sum up, in the data processing method, apparatus, electronic device, and storage medium provided by the present application, model parameters of a to-be-run model are acquired, a target algorithm is then determined from a plurality of algorithms according to the model parameters, and the to-be-run model is loaded into a corresponding processing unit based on the target algorithm so as to run the to-be-run model. In this way, after the to-be-run model is determined, the algorithm on which the running of the model is based can be selected by determining the model parameters, so that the running of the model better matches the parameters of the to-be-run model, thereby improving performance while the model is running.
Finally, it should be noted that the foregoing embodiments are merely used to illustrate the technical solutions of the present application and do not limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments or make equivalent replacements for some of the technical features therein, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (20)

  1. A data processing method, characterized in that the method comprises:
    acquiring model parameters of a to-be-run model;
    determining a target algorithm from a plurality of algorithms according to the model parameters;
    loading the to-be-run model into a corresponding processing unit based on the target algorithm, so as to run the to-be-run model.
  2. The method according to claim 1, characterized in that the model parameters comprise an input data split parameter and an input data size, and the determining a target algorithm from a plurality of algorithms according to the model parameters comprises:
    if the input data split parameter indicates that input data splitting is supported and the size of the input data to be input to the to-be-run model is greater than a first specified threshold, determining a data parallelization algorithm from the plurality of algorithms as the target algorithm.
  3. The method according to claim 2, characterized in that the model parameters further comprise a number of layers in which the number of included operators exceeds an operator threshold, and the determining a target algorithm from a plurality of algorithms according to the model parameters further comprises:
    if the input data split parameter indicates that input data splitting is not supported, and the number of layers in which the number of included operators exceeds the operator threshold is greater than a second specified threshold, determining an operator parallelization algorithm from the plurality of algorithms as the target algorithm;
    or, if the input data split parameter indicates that input data splitting is supported, the size of the input data to be input to the to-be-run model is not greater than the first specified threshold, and the number of layers in which the number of included operators exceeds the operator threshold is greater than the second specified threshold, determining an operator parallelization algorithm from the plurality of algorithms as the target algorithm.
  4. The method according to claim 3, characterized in that the model parameters further comprise a number of model layers, and the determining a target algorithm from a plurality of algorithms according to the model parameters further comprises:
    if the input data split parameter indicates that input data splitting is not supported, the number of layers in which the number of included operators exceeds the operator threshold is not greater than the second specified threshold, and the number of layers of the model is greater than a third specified threshold, determining an inter-layer pipeline algorithm from the plurality of algorithms as the target algorithm;
    or, if the input data split parameter indicates that input data splitting is supported, the size of the input data to be input to the to-be-run model is not greater than the first specified threshold, the number of layers in which the number of included operators exceeds the operator threshold is not greater than the second specified threshold, and the number of layers of the model is greater than the third specified threshold, determining an inter-layer pipeline algorithm from the plurality of algorithms as the target algorithm.
  5. The method according to claim 4, characterized in that the determining a target algorithm from a plurality of algorithms according to the model parameters further comprises:
    if the input data split parameter indicates that input data splitting is not supported, the number of layers in which the number of included operators exceeds the operator threshold is not greater than the second specified threshold, and the number of layers of the model is not greater than the third specified threshold, determining a non-parallelized algorithm from the plurality of algorithms as the target algorithm.
  6. The method according to any one of claims 3-5, characterized in that the second specified threshold is 20% to 30% of the total number of layers of the model.
  7. The method according to claim 1, characterized in that the determining a target algorithm from a plurality of algorithms according to the model parameters comprises:
    acquiring a pre-established correspondence between model parameters and algorithms;
    determining, according to the model parameters and the correspondence, the target algorithm corresponding to the model parameters of the to-be-run model.
  8. The method according to any one of claims 1-7, characterized in that the loading the to-be-run model into a corresponding processing unit based on the target algorithm, so as to run the to-be-run model, comprises:
    splitting the to-be-run model based on the target algorithm to obtain a plurality of subsections, wherein different target algorithms correspond to different splitting rules;
    loading the plurality of subsections into corresponding processing units respectively for running.
  9. The method according to claim 8, characterized in that, after the loading the plurality of subsections into corresponding processing units respectively for running, the method further comprises:
    acquiring running performance parameters corresponding to the to-be-run model;
    if the running performance parameters do not satisfy a first target condition, reselecting the target algorithm;
    if the running performance parameters do not satisfy a second target condition, splitting the to-be-run model again based on the current target algorithm to obtain a plurality of new subsections, wherein the proportions of the parts in the new plurality of subsections differ from the proportions of the parts in the plurality of subsections.
  10. The method according to claim 9, characterized in that the reselecting the target algorithm comprises:
    randomly selecting an algorithm other than the current target algorithm as the new target algorithm.
  11. The method according to claim 9, characterized in that the first target condition comprises: an average data communication duration between the plurality of processing units is not greater than a duration threshold.
  12. The method according to claim 11, characterized in that the average data communication duration is calculated as:
    T_2 = (1/n) · Σ_{i,j} T_{2ij}
    wherein T_{2ij} is the data communication time between processing unit i and processing unit j, and n is the number of communications.
  13. The method according to claim 9, characterized in that the second target condition comprises: a standard deviation of the running times respectively corresponding to the plurality of processing units is not greater than a standard deviation threshold.
  14. The method according to claim 13, characterized in that the method further comprises calculating the standard deviation by the following formula:
    σ = √( (1/N) · Σ_{i=1..N} (T_{1i} − T_1)² )
    wherein T_1 is the average time consumption of the N processing units, and T_{1i} is the time consumption of processing unit i.
  15. The method according to any one of claims 1-14, characterized in that the processing unit may be a CPU, a GPU, a DSP, an NPU, or a dedicated AI acceleration chip.
  16. A data processing apparatus, characterized in that the apparatus comprises:
    a parameter acquisition unit, configured to acquire model parameters of a to-be-run model;
    an algorithm determination unit, configured to determine a target algorithm from a plurality of algorithms according to the model parameters;
    a model running unit, configured to load the to-be-run model into a corresponding processing unit based on the target algorithm, so as to run the to-be-run model.
  17. The apparatus according to claim 16, characterized in that the model parameters comprise an input data split parameter and an input data size, and the algorithm determination unit is specifically configured to: if the input data split parameter indicates that input data splitting is supported and the size of the input data to be input to the to-be-run model is greater than a first specified threshold, determine a data parallelization algorithm from the plurality of algorithms as the target algorithm.
  18. The apparatus according to claim 16, characterized in that the model parameters further comprise a number of layers in which the number of included operators exceeds an operator threshold, and the algorithm determination unit is specifically configured to: if the input data split parameter indicates that input data splitting is not supported, and the number of layers in which the number of included operators exceeds the operator threshold is greater than a second specified threshold, determine an operator parallelization algorithm from the plurality of algorithms as the target algorithm;
    or, if the input data split parameter indicates that input data splitting is supported, the size of the input data to be input to the to-be-run model is not greater than the first specified threshold, and the number of layers in which the number of included operators exceeds the operator threshold is greater than the second specified threshold, determine an operator parallelization algorithm from the plurality of algorithms as the target algorithm.
  19. An electronic device, characterized by comprising a processor and a memory;
    one or more programs being stored in the memory and configured to be executed by the processor to implement the method according to any one of claims 1-15.
  20. A computer-readable storage medium, characterized in that program code is stored in the computer-readable storage medium, wherein the method according to any one of claims 1-15 is performed when the program code is run by a processor.
PCT/CN2021/092448 2020-07-17 2021-05-08 Data processing method and apparatus, electronic device, and storage medium WO2022012123A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010693821.3 2020-07-17
CN202010693821.3A CN111782402A (en) 2020-07-17 2020-07-17 Data processing method and device and electronic equipment

Publications (1)

Publication Number Publication Date
WO2022012123A1 (en) 2022-01-20

Family

ID=72763525

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/092448 WO2022012123A1 (en) 2020-07-17 2021-05-08 Data processing method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN111782402A (en)
WO (1) WO2022012123A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111782402A (en) * 2020-07-17 2020-10-16 Oppo广东移动通信有限公司 Data processing method and device and electronic equipment
CN111782403B (en) * 2020-07-17 2022-04-19 Oppo广东移动通信有限公司 Data processing method and device and electronic equipment
CN113157538B (en) * 2021-02-02 2023-04-18 西安天和防务技术股份有限公司 Spark operation parameter determination method, device, equipment and storage medium
CN114492737B (en) * 2021-12-31 2022-12-09 北京百度网讯科技有限公司 Data processing method, data processing device, electronic equipment, storage medium and program product
CN117349034B (en) * 2023-12-05 2024-02-23 创意信息技术股份有限公司 Hierarchical loading method and device for large language model

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030167148A1 (en) * 2002-03-01 2003-09-04 Anastasio Thomas J. Method for determination of spatial target probability using a model of multisensory processing by the brain
US20060224533A1 (en) * 2005-03-14 2006-10-05 Thaler Stephen L Neural network development and data analysis tool
CN102253919A (en) * 2011-05-25 2011-11-23 中国石油集团川庆钻探工程有限公司 Concurrent numerical simulation method and system based on GPU and CPU cooperative computing
CN107798382A (en) * 2017-11-21 2018-03-13 北京地平线信息技术有限公司 For the method and apparatus for the characteristic being adapted in convolutional neural networks
CN110163367A (en) * 2018-09-29 2019-08-23 腾讯科技(深圳)有限公司 A kind of model compression method and apparatus
CN110807044A (en) * 2019-10-30 2020-02-18 东莞市盟大塑化科技有限公司 Model dimension management method based on artificial intelligence technology
CN111782402A (en) * 2020-07-17 2020-10-16 Oppo广东移动通信有限公司 Data processing method and device and electronic equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115001623A (en) * 2022-05-07 2022-09-02 通号城市轨道交通技术有限公司 Vehicle-mounted electronic map data verification method and device
CN115001623B (en) * 2022-05-07 2024-04-19 通号城市轨道交通技术有限公司 Method and device for checking vehicle-mounted electronic map data

Also Published As

Publication number Publication date
CN111782402A (en) 2020-10-16

Similar Documents

Publication Publication Date Title
WO2022012123A1 (en) Data processing method and apparatus, electronic device, and storage medium
CN111368893B (en) Image recognition method, device, electronic equipment and storage medium
WO2022012118A1 (en) Data processing method and apparatus, electronic device, and storage medium
WO2022012119A1 (en) Data processing method and apparatus, electronic device, and storage medium
CN110674936A (en) Neural network processing method and device, computer equipment and storage medium
CN110458294B (en) Model operation method, device, terminal and storage medium
US11740941B2 (en) Method of accelerating execution of machine learning based application tasks in a computing device
CN109598250B (en) Feature extraction method, device, electronic equipment and computer readable medium
US10031947B2 (en) Method and apparatus for performing a search operation on heterogeneous computing systems
WO2019001323A1 (en) Signal processing system and method
CN111124173A (en) Working state switching method and device of touch screen, mobile terminal and storage medium
US20210073566A1 (en) Rotation and scaling for optical character recognition using end-to-end deep learning
WO2015152876A1 (en) Hash table construction for utilization in recognition of target object in image
US10212291B2 (en) System, method, and non-transitory computer readable storage medium for image recognition based on convolutional neural networks
WO2021000411A1 (en) Neural network-based document classification method and apparatus, and device and storage medium
WO2022121701A1 (en) Image processing method and apparatus, electronic device, and storage medium
US20210241068A1 (en) Convolutional neural network
CN111813529B (en) Data processing method, device, electronic equipment and storage medium
CN114692745A (en) Data processing method and device, integrated chip, electronic equipment and storage medium
CN110837419B (en) Reasoning engine system and method based on elastic batch processing and electronic equipment
CN112329889A (en) Image processing method and device and electronic equipment
CN112070144A (en) Image clustering method and device, electronic equipment and storage medium
US20240005075A1 (en) Graphic neural network acceleration solution with customized board for solid-state drives
CN113536840A (en) Video classification method, device, equipment and storage medium
TWI775084B (en) Image recognition method, device, computer device and storage media

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21843066

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 21843066

Country of ref document: EP

Kind code of ref document: A1