WO2022012123A1 - Data processing method and apparatus, electronic device, and storage medium - Google Patents

Data processing method and apparatus, electronic device, and storage medium

Info

Publication number
WO2022012123A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
algorithm
run
input data
target
Prior art date
Application number
PCT/CN2021/092448
Other languages
French (fr)
Chinese (zh)
Inventor
钟卫东
张晓帆
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司 filed Critical Oppo广东移动通信有限公司
Publication of WO2022012123A1 publication Critical patent/WO2022012123A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005: Allocation of resources to service a request
    • G06F 9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/10: Interfaces, programming languages or software development kits, e.g. for simulating neural networks
    • G06F 2209/00: Indexing scheme relating to G06F 9/00
    • G06F 2209/50: Indexing scheme relating to G06F 9/50
    • G06F 2209/501: Performance criteria
    • G06F 2209/5017: Task decomposition

Definitions

  • the present application relates to the field of computer technology, and more particularly, to a data processing method, apparatus, electronic device, and storage medium.
  • Algorithmic models such as neural network models, are complex network systems formed by extensive interconnection of a large number of simple processing units (called neurons). Some algorithmic models have massive parallelism, distributed storage and processing, self-organization, self-adaptation, and self-learning capabilities. However, in the process of running the neural network model in the related electronic equipment, there is still a problem that the running performance needs to be improved.
  • the present application proposes a data processing method, apparatus, electronic device and storage medium to improve the above problems.
  • the present application provides a data processing method, the method including: acquiring model parameters of a model to be run; determining a target algorithm from a plurality of algorithms according to the model parameters; and loading the to-be-run model into a corresponding processing unit based on the target algorithm to run the to-be-run model.
  • the present application provides a data processing device, the device including: a parameter acquisition unit for acquiring model parameters of a model to be run; an algorithm determination unit for determining a target algorithm from a plurality of algorithms according to the model parameters; and a model running unit configured to load the to-be-run model into a corresponding processing unit based on the target algorithm, so as to run the to-be-run model.
  • the present application provides an electronic device including a processor and a memory; one or more programs are stored in the memory and configured to be executed by the processor to implement the above method.
  • the present application provides a computer-readable storage medium having program code stored therein, wherein the above-mentioned method is executed when the program code is run by a startup controller.
  • According to the solutions provided by the present application, model parameters of a model to be run are obtained, a target algorithm is then determined from a plurality of algorithms according to the model parameters, and the to-be-run model is then loaded into the corresponding processing unit based on the target algorithm to run the to-be-run model.
  • FIG. 1 shows a flowchart of a data processing method proposed by an embodiment of the present application
  • FIG. 2 shows a flowchart of a data processing method proposed by another embodiment of the present application
  • FIG. 3 shows a flowchart of a data processing method proposed by still another embodiment of the present application.
  • FIG. 4 shows a flowchart of a data processing method proposed by another embodiment of the present application.
  • FIG. 5 shows a structural block diagram of a data processing apparatus proposed by an embodiment of the present application
  • FIG. 6 shows a structural block diagram of a data processing apparatus proposed by another embodiment of the present application.
  • FIG. 7 shows a structural block diagram of an electronic device of the present application for executing the data processing method according to an embodiment of the present application
  • FIG. 8 shows a storage unit, according to an embodiment of the present application, for storing or carrying program code that implements the data processing method of the embodiments of the present application.
  • Neural networks (Neural Networks, NN) are complex network systems formed by the extensive interconnection of a large number of simple processing units (called neurons). Neural networks have massive parallelism, distributed storage and processing, self-organization, self-adaptation, and self-learning capabilities. A large number of operators are usually included in a neural network model. Among them, it can be understood that an operator can be regarded as part of the algorithmic process in a neural network model; an operator can map a function to a function, or map a function to a number.
  • electronic devices run based on certain algorithms in the process of running a neural network model.
  • However, related electronic devices all run the neural network model based on a fixed algorithm, so that no matter what the model parameters of the neural network model currently running on the electronic device are, it runs in a fixed manner. This in turn causes the electronic device to perform poorly when running the neural network model, and also limits the performance of the neural network model itself.
  • Therefore, the inventor proposes, in the present application, a data processing method, device, electronic device, and storage medium that can improve on the above problems.
  • By acquiring the model parameters of the model to be run, then determining the target algorithm from multiple algorithms according to the model parameters, and then loading the to-be-run model into a corresponding processing unit based on the target algorithm, the to-be-run model is run. In this way, after the model to be run is determined, which algorithm the model is run with can be chosen by determining the model parameters, so that the running of the model better matches the parameters of the model to be run, improving the performance of the model while it runs.
  • a data processing method provided by an embodiment of the present application includes:
  • the model to be run in this embodiment is a model that will be loaded into the processing unit for running later.
  • the model to be run may be a neural network model called by the application.
  • the application may need to process some data during the running process, and the application can process the data by calling the neural network during this process.
  • an image processing application may need to perform image recognition, and then the image processing application can process the image by invoking the neural network model used for image recognition.
  • the electronic device may periodically perform a designated task.
  • the neural network model invoked by the electronic device during the execution of the specified task can be determined as the model to be run.
  • the specified task may be a task of predicting an application program that the electronic device will run in the future, a task of performing video processing, a task of predicting user preferences of the electronic device, or a task of predicting the remaining power of the electronic device.
  • the model parameters in this embodiment may include one or more parameters such as input data splitting parameters, input data size, the number of layers whose number of included operators exceeds the operator threshold, and the number of layers of the model.
  • the input data splitting parameter represents whether the model supports splitting the input data.
  • the output of the model is also a picture; even if two output pictures are obtained after splitting the input picture used as input data, the two output pictures can still be spliced into one, so image enhancement models can support splitting the input data.
  • the input data size characterizes the size of the storage space occupied by the input data that will be input to the model. For example, if the size of the image to be input to the model to be run is 1000*1000*3Byte, then the input data size is determined to be 1000*1000*3Byte, where 1000*1000 is the image resolution and 3 Byte is the size of each pixel.
  • the number of layers whose number of included operators exceeds the operator threshold represents how many layers in the model contain more operators than the operator threshold.
  • a neural network model usually includes multiple layers, and each layer in turn includes operators.
  • a neural network model can include an input layer, a convolutional layer, and an output layer.
  • the number of layers of the model represents how many layers the model to be run has. For example, for the aforementioned neural network model including an input layer, a convolution layer, and an output layer, the number of layers of the model is 3.
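  • To make the parameters above concrete, they can be gathered into a simple record. The sketch below is only illustrative; the field names are our own and not taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class ModelParams:
    """The four model parameters described above (field names are illustrative)."""
    supports_input_split: bool  # input data splitting parameter
    input_data_size: int        # bytes occupied by the input data
    heavy_layer_count: int      # layers whose operator count exceeds the operator threshold
    layer_count: int            # total number of layers of the model

# Example: a 3-layer image-enhancement model fed a 1000*1000*3-byte image.
params = ModelParams(
    supports_input_split=True,
    input_data_size=1000 * 1000 * 3,
    heavy_layer_count=1,
    layer_count=3,
)
```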
  • S120 Determine a target algorithm from a plurality of algorithms according to the model parameters.
  • model parameters of different models may be different, and different models may require different running modes to be run, so that the run models can have higher performance. Then, after acquiring the model parameters of the model to be run, the electronic device can determine an appropriate running algorithm as the target algorithm according to the model parameters.
  • the correspondence between the model parameters and the algorithm may be established in advance, and then the electronic device may determine the target algorithm corresponding to the model parameter of the current model to be run by querying the correspondence.
  • Exemplarily, the model parameters may include the input data splitting parameter, the input data size, and the number of layers of the model. The electronic device may then be configured such that input data splitting parameter A, input data size A, and model layer number A correspond to algorithm a; input data splitting parameter B, input data size B, and model layer number B correspond to algorithm b; and input data splitting parameter A, input data size C, and model layer number C correspond to algorithm c. If the model parameters of the model to be run include input data splitting parameter A, input data size A, and model layer number A, algorithm a will be determined as the target algorithm from among algorithm a, algorithm b, and algorithm c. If the model parameters of the model to be run include input data splitting parameter A, input data size C, and model layer number C, algorithm c will be determined as the target algorithm from among algorithm a, algorithm b, and algorithm c. A minimal sketch of such a correspondence follows.
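```python
# Hypothetical correspondence table: keys are tuples of
# (input data splitting parameter, input data size class, model layer class).
CORRESPONDENCE = {
    ("A", "A", "A"): "algorithm a",
    ("B", "B", "B"): "algorithm b",
    ("A", "C", "C"): "algorithm c",
}

def target_algorithm(split_param: str, size_class: str, layer_class: str) -> str:
    """Determine the target algorithm by querying the pre-established correspondence."""
    return CORRESPONDENCE[(split_param, size_class, layer_class)]

assert target_algorithm("A", "A", "A") == "algorithm a"
assert target_algorithm("A", "C", "C") == "algorithm c"
```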
  • S130 Load the to-be-run model into a corresponding processing unit based on the target algorithm to run the to-be-run model.
  • the processing unit included in the electronic device may be one or more of a CPU, a GPU (Graphics Processing Unit), a DSP (Digital Signal Processor), and an NPU (Neural-network Processing Unit).
  • the loading methods corresponding to different algorithms may be different.
  • the loading method corresponding to some target algorithms may be to load the entire model to be run into the same processing unit for running, while the loading method corresponding to other target algorithms may be to split the to-be-run model into multiple parts and load different parts into different processing units for running. In this way, it is beneficial to select a suitable running mode for each model, thereby improving the performance of the electronic device in running models.
  • It should be noted that, in the embodiments of the present application, the performance of the electronic device in running a model can be understood as the time taken to run the model; correspondingly, if that performance is improved, the time taken to run the model is relatively shortened.
  • In the data processing method provided by the present application, model parameters of a model to be run are obtained, a target algorithm is determined from a plurality of algorithms according to the model parameters, and the model to be run is loaded into the corresponding processing unit based on the target algorithm to run the model to be run. In this way, after the model to be run is determined, which algorithm the model is run with can be chosen by determining the model parameters, so that the running of the model better matches the parameters of the model to be run, thereby improving the performance of the model while it runs.
  • a data processing method provided by an embodiment of the present application includes:
  • a configuration file may be correspondingly configured for each model, and the configuration file may store the model parameters of the static class among the model parameters of the model.
  • the model parameters of the static class can be understood as parameters inherent in the model itself, or can be understood as parameters that are not dynamically changed due to changes in input data.
  • the input data splitting parameters, the number of layers whose number of included operators exceeds the operator threshold, and the number of layers of the model listed in the foregoing embodiments are inherent parameters of the model itself.
  • the input data split parameters, the number of layers with the number of included operators exceeding the operator threshold, and the number of layers of the model can be stored in the configuration file.
  • As for the input data size parameter among the model parameters, because it changes dynamically with the input data, it is recognized as a dynamic-class parameter. After the model to be run is determined, the static-class model parameters can be obtained from the configuration file corresponding to the model to be run, the dynamic-class model parameter of input data size can be obtained from the actual input data, and the static-class and dynamic-class model parameters together form the complete model parameters.
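  • One plausible realization is a small per-model configuration file holding the static-class parameters, merged at run time with the dynamically measured input size. The JSON layout and key names below are assumptions for illustration, not part of the patent.

```python
import json

# Hypothetical configuration file contents for one model (static-class parameters).
CONFIG_TEXT = """
{
    "supports_input_split": true,
    "heavy_layer_count": 1,
    "layer_count": 3
}
"""

def full_model_params(config_text: str, input_data: bytes) -> dict:
    """Merge the static-class parameters from the configuration file with the
    dynamic-class parameter (input data size) measured from the actual input."""
    model_params = json.loads(config_text)             # static class: inherent to the model
    model_params["input_data_size"] = len(input_data)  # dynamic class: depends on the input
    return model_params

print(full_model_params(CONFIG_TEXT, b"\x00" * (1000 * 1000 * 3)))
```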
  • the storage space in the electronic device can include two types of storage space: disk and memory.
  • the disk can be used to store data for a longer time, but the rate at which the electronic device obtains data from the memory is faster than the rate at which data is fetched from the disk.
  • the electronic device can pre-load the static-class model parameters in the configuration file into the memory, so that the subsequent judgment process can obtain the required model parameters faster, further improving the running performance of the model.
  • the model parameters may correspond to parameter values, and the electronic device may determine the content specifically represented by the model parameters through the parameter values corresponding to the model parameters.
  • the parameter value corresponding to the input data splitting parameter may be 1 or 0. If the parameter value corresponding to the input data splitting parameter is 1, it indicates that input data splitting is supported; if the parameter value corresponding to the input data splitting parameter is 0, it indicates that input data splitting is not supported.
  • Data Parallelism can be understood as running the same function in parallel with different data inputs. Parallel processing on separate threads ensures that the task can be distributed among the available processing units.
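  • As a rough illustration of data parallelism, the same function can be run on separate threads, each fed one split of the input, with the partial outputs spliced back together afterwards. The model stub here stands in for the real network.

```python
from concurrent.futures import ThreadPoolExecutor

def run_model(chunk):
    # Stand-in for running the model on one split of the input data.
    return [x * 2 for x in chunk]

def data_parallel_run(input_data, n_parts):
    """Split the input, run the same function on each split in parallel,
    then splice the partial outputs back together (assumes an evenly
    divisible input for brevity)."""
    step = len(input_data) // n_parts
    chunks = [input_data[i * step:(i + 1) * step] for i in range(n_parts)]
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partial_outputs = list(pool.map(run_model, chunks))
    return [y for part in partial_outputs for y in part]

print(data_parallel_run(list(range(8)), n_parts=2))  # [0, 2, 4, ..., 14]
```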
  • S221: If the size of the input data input to the to-be-run model is not greater than the first specified threshold, detect whether the number of layers whose number of included operators exceeds the operator threshold is greater than the second specified threshold; likewise, if the input data splitting parameter indicates that input data splitting is not supported, detect whether the number of layers whose number of included operators exceeds the operator threshold is greater than the second specified threshold.
  • the second specified threshold may be 20% to 30% of the total number of layers of the model; that is, if the model has M layers in total, the second specified threshold may be M × 20% to M × 30%.
  • Operator Parallelism can be understood as loading multiple fully parallel operators in the same layer of the model into one or more of the multiple processing units for parallel operation.
  • the third specified threshold may be 2, or may be an integer larger than 2.
  • the layer pipeline algorithm (Layer Pipelining) can be understood as loading multiple layers of the model into one or more of the multiple processing units respectively for parallel operation.
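  • Putting steps S220 to S240 together, the selection logic can be sketched as a chain of threshold checks, reusing the illustrative ModelParams record and params instance from earlier. The threshold values are arbitrary placeholders; the patent does not fix them beyond the ranges discussed above.

```python
def choose_algorithm(p: ModelParams,
                     first_threshold: int = 1_000_000,  # input-size threshold (assumed)
                     second_threshold: int = 1,         # heavy-layer-count threshold (assumed)
                     third_threshold: int = 2) -> str:  # model-layer-count threshold (example value)
    if p.supports_input_split and p.input_data_size > first_threshold:
        return "data parallelization"
    if p.heavy_layer_count > second_threshold:
        return "operator parallelization"
    if p.layer_count > third_threshold:
        return "layer pipelining"
    return "non-parallelization"

print(choose_algorithm(params))  # the 3,000,000-byte example above -> "data parallelization"
```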
  • S250 Load the to-be-run model into a corresponding processing unit based on the target algorithm to run the to-be-run model.
  • Loading the model to be run into the corresponding processing unit based on the target algorithm to run the model to be run includes: splitting the to-be-run model based on the target algorithm to obtain a plurality of subsections, wherein the splitting rules corresponding to different target algorithms are different; and loading the subsections into their corresponding processing units for running.
  • the neural network model it will include a plurality of operators, and then the data processing flow of the neural network model is completed by sequentially performing data processing through the plurality of operators. Then for different target algorithms, there can be different splitting rules. For example, for a data parallelization algorithm, the model can be split into multiple sub-sections with the same structure, and then the input data is also split and input into the multiple sub-sections for data parallelization processing. Among them, the same structure can be understood as the same type of layer structure included in the model.
  • the model to be run includes an input layer, a convolution layer, and an output layer. Among them, the input layer includes 4 operators, the convolution layer includes 8 operators, and the output layer also includes 4 operators.
  • the model is split based on the splitting rules corresponding to the data parallelization algorithm.
  • the sub-parts obtained by splitting will also include the input layer, the convolutional layer and the output layer, so as to achieve the same type of layer structure as the original model to be run. Only the number of operators included in each layer in the subsection will be less than the number of operators in each layer in the original model to be run.
  • the input layer of each sub-part may only include 2 operators, the convolution layer only includes 4 operators, and the output layer also includes only 2 operators.
  • For the operator parallelization algorithm, the operators in the same layer can be split; in this case, the operators in the same layer are distributed into different subsections, and each subsection obtained by splitting can include partial operators from different layers.
  • For the layer pipeline algorithm, the multi-layer structure included in the model to be run can be split in units of layers, so that each subsection obtained by splitting includes some layers of the model to be run. For example, if the model to be run includes an input layer, a convolution layer, and an output layer, the input layer can be split into one subsection, the convolution layer can be split into another subsection, and the output layer can be split into a third subsection. The two splitting rules are contrasted in the sketch below.
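```python
# Layers are modeled as lists of operator names, a simplification of a real graph.
model = {
    "input":  ["op1", "op2", "op3", "op4"],
    "conv":   ["op5", "op6", "op7", "op8", "op9", "op10", "op11", "op12"],
    "output": ["op13", "op14", "op15", "op16"],
}

def split_data_parallel(model, n_parts):
    """Every subsection keeps the same layer types, with fewer operators per layer."""
    return [{layer: ops[i::n_parts] for layer, ops in model.items()}
            for i in range(n_parts)]

def split_layer_pipeline(model):
    """Each subsection is one whole layer of the model."""
    return [{layer: ops} for layer, ops in model.items()]

print(split_data_parallel(model, 2))  # two sub-models with 2 + 4 + 2 operators each
print(split_layer_pipeline(model))    # three subsections: input, conv, output
```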
  • each sub-part can be loaded into the corresponding processing unit for execution.
  • For example, if the processing units include a CPU and a GPU, sub-part A can be loaded into the CPU to run, and sub-part B can be loaded into the GPU to run.
  • It should also be noted that different operators may be adapted to different processing units.
  • the processing unit adapted by the Conv2D operator may be a GPU or a dedicated AI acceleration chip.
  • the processing unit adapted by the ResizeBilinear operator may be a CPU. In this way, the operators included in the subsection can be identified, and then the processing unit adapted to the operator in the subsection can be used as the processing unit corresponding to the subsection.
  • In the case where a subsection includes multiple operators, the processing unit with the shortest total time consumption for running the multiple operators can be used as the processing unit corresponding to that subsection, so that the overall model running speed can be improved.
  • For example, if a subsection includes operator a, operator b, and operator c, where the processing unit adapted for operator a is the CPU, the processing unit adapted for operator b is the GPU, and the processing unit adapted for operator c is a dedicated AI acceleration chip, then the total time t1 for the CPU to run operator a, operator b, and operator c, the total time t2 for the GPU to run operator a, operator b, and operator c, and the total time t3 for the dedicated AI acceleration chip to run operator a, operator b, and operator c can each be obtained. If t1 is the shortest of the three, the CPU can be used as the processing unit corresponding to the subsection that includes operator a, operator b, and operator c.
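  • A sketch of the rule just described: given per-operator time estimates on each candidate processing unit, pick the unit with the smallest total. The timing numbers are made up for illustration.

```python
# Hypothetical per-operator run times (in ms) on each candidate processing unit.
RUN_TIME_MS = {
    "CPU":     {"a": 4.0, "b": 9.0, "c": 3.0},
    "GPU":     {"a": 6.0, "b": 2.0, "c": 9.0},
    "AI_CHIP": {"a": 7.0, "b": 3.0, "c": 7.0},
}

def best_unit_for(subsection_ops):
    """Return the processing unit with the shortest total time over all operators."""
    totals = {unit: sum(times[op] for op in subsection_ops)
              for unit, times in RUN_TIME_MS.items()}
    return min(totals, key=totals.get)

print(best_unit_for(["a", "b", "c"]))  # "CPU": t1 = 16.0 ms is the shortest total here
```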
  • In the data processing method provided by this embodiment, model parameters of a model to be run are obtained, a target algorithm is determined from a plurality of algorithms according to the model parameters, and the model to be run is loaded into the corresponding processing unit based on the target algorithm to run the model to be run.
  • Moreover, the model parameters in this embodiment may include the input data splitting parameter, the input data size, the number of layers whose number of included operators exceeds the operator threshold, and the number of layers of the model; these specific parameters make it possible to determine more accurately a running algorithm better suited to the current model to be run, further improving the running performance of the electronic device in the process of running the neural network model.
  • a data processing method provided by an embodiment of the present application includes:
  • S330 Split the to-be-run model based on the target algorithm to obtain a plurality of sub-parts, wherein different target algorithms correspond to different splitting rules.
  • the first target condition includes: an average data communication duration between the plurality of processing units is not greater than a duration threshold.
  • the average data communication duration T2 can be calculated based on the following formula: T2 = (Σ T2ij) / n, where T2ij is the data communication time between processing unit i and processing unit j, and n is the number of communications.
  • the duration threshold may be the product of the average time consumption of multiple processing units and 0.05.
  • the time consumption may be inference time.
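  • The first target condition can then be checked as below. The T2ij values would come from measurement; treating the duration threshold as 5% of the units' average inference time follows the example above and is not a fixed rule.

```python
def meets_first_target_condition(comm_times_ms, unit_times_ms):
    """First target condition: the average data communication duration between
    processing units must not exceed the duration threshold."""
    avg_comm = sum(comm_times_ms) / len(comm_times_ms)  # T2 = (sum of T2ij) / n
    avg_unit = sum(unit_times_ms) / len(unit_times_ms)  # average time consumption
    duration_threshold = avg_unit * 0.05                # threshold from the example above
    return avg_comm <= duration_threshold

# Made-up measurements: three communications, two processing units.
print(meets_first_target_condition([0.4, 0.6, 0.5], [12.0, 14.0]))  # True: 0.5 <= 0.65
```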
  • an algorithm other than the current target algorithm may be randomly selected as a new target algorithm, and then S330 and S340 are performed based on the new target algorithm.
  • For example, if the multiple algorithms include a data parallelization algorithm, an operator parallelization algorithm, an interlayer pipeline algorithm, and a non-parallelization algorithm, and the currently determined target algorithm is the interlayer pipeline algorithm, one of the data parallelization algorithm, the operator parallelization algorithm, and the non-parallelization algorithm can be selected as the new target algorithm.
  • the selection order of a plurality of algorithms may be pre-configured, and when a target algorithm is re-selected, a new target algorithm is determined based on the selection order.
  • For example, the configured selection order may be, in sequence, the data parallelization algorithm, the operator parallelization algorithm, the interlayer pipeline algorithm, and the non-parallelization algorithm; then, if the current target algorithm is the operator parallelization algorithm and the target algorithm needs to be re-selected, the interlayer pipeline algorithm, which is next in the selection order after the operator parallelization algorithm, can be used as the new target algorithm.
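  • Re-selection by a pre-configured order might look like the following; the order list mirrors the example in the text, and the wrap-around at the end of the list is our own assumption, since the patent does not say what follows the last entry.

```python
SELECTION_ORDER = [
    "data parallelization",
    "operator parallelization",
    "layer pipelining",
    "non-parallelization",
]

def next_target_algorithm(current: str) -> str:
    """Pick the algorithm next in the pre-configured selection order."""
    i = SELECTION_ORDER.index(current)
    return SELECTION_ORDER[(i + 1) % len(SELECTION_ORDER)]  # wrap-around is assumed

print(next_target_algorithm("operator parallelization"))  # "layer pipelining"
```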
  • In the data processing method provided by the present application, after the model to be run is determined, which algorithm the model is run with can be chosen by determining the model parameters, so that the running of the model better matches the parameters of the model to be run, improving the performance of the model during operation.
  • the target algorithm can be re-determined according to the real-time running situation during the running of the model, so that the running of the model can be more closely adapted to the current actual situation.
  • a data processing method provided by an embodiment of the present application includes:
  • S430 Split the to-be-run model based on the target algorithm to obtain a plurality of subsections, wherein different target algorithms have different splitting rules.
  • the second target condition includes: the standard deviation of the respective running times corresponding to the plurality of processing units is not greater than a standard deviation threshold.
  • the standard deviation can be calculated based on the following formula: σ = sqrt((1/N) × Σi (T1i − T1)²), where N is the number of processing units, T1 is the average time consumption of the multiple processing units, and T1i is the time consumption of processing unit i.
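  • The second target condition can be checked in the same spirit; statistics.pstdev computes the population standard deviation over the units' running times.

```python
import statistics

def meets_second_target_condition(unit_times_ms, stdev_threshold_ms):
    """Second target condition: the standard deviation of the processing units'
    running times must not exceed the standard deviation threshold."""
    return statistics.pstdev(unit_times_ms) <= stdev_threshold_ms

# Made-up running times: a well-balanced split vs. an unbalanced one.
print(meets_second_target_condition([10.0, 10.5, 9.8], 1.0))  # True  (pstdev ~ 0.29)
print(meets_second_target_condition([4.0, 18.0, 9.0], 1.0))   # False (pstdev ~ 5.8)
```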
  • each subsection may include some operators in the model to be run.
  • the ratio of each part of the multiple sub-parts can be understood as the ratio of the operators included in each of the multiple sub-parts.
  • it can be understood as adjusting the number of operators included in at least some of the subsections, so as to adjust the running duration of the processing units corresponding to each subsection.
  • For example, before adjustment, subsection A includes 3 operators, subsection B includes 6 operators, and subsection C includes 3 operators; after adjustment, subsection A may include 4 operators, subsection B may include 5 operators, and subsection C may still include 3 operators.
  • It should also be noted that, for different splitting methods, the units in which the adjustment is made may be different. For example, if the model to be run is directly divided into multiple sub-parts in units of operators, then the proportion of each sub-part will be adjusted in units of operators; if the model to be run is directly divided into multiple sub-parts in units of layers, then the proportion of each sub-part will be adjusted in units of layers.
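  • A toy version of the re-split, in units of operators: move one operator from the subsection on the slowest processing unit to the one on the fastest, so the running durations even out. Moving exactly one operator is a heuristic of ours; the patent only requires that the new proportions differ from the old ones.

```python
def rebalance(subsections, unit_times_ms):
    """Move one operator from the subsection on the slowest processing unit
    to the one on the fastest, changing the proportion of each part."""
    slow = max(range(len(unit_times_ms)), key=unit_times_ms.__getitem__)
    fast = min(range(len(unit_times_ms)), key=unit_times_ms.__getitem__)
    subsections[fast].append(subsections[slow].pop())
    return subsections

subs = [["op1", "op2", "op3"],
        ["op4", "op5", "op6", "op7", "op8", "op9"],
        ["op10", "op11", "op12"]]
print(rebalance(subs, [8.0, 20.0, 10.0]))  # A grows to 4 operators, B shrinks to 5
```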
  • In the data processing method provided by the present application, after the model to be run is determined, which algorithm the model is run with can be chosen by determining the model parameters, so that the running of the model better matches the parameters of the model to be run, improving the performance of the model during operation.
  • the model to be run can also be split based on the currently determined target algorithm to obtain multiple new sub-parts, thereby enabling the model to run more closely at the current level. adapted to the actual situation.
  • a data processing apparatus 500 provided by an embodiment of the present application, the apparatus 500 includes:
  • the parameter obtaining unit 510 is configured to obtain model parameters of the model to be run.
  • the algorithm determining unit 520 is configured to determine a target algorithm from a plurality of algorithms according to the model parameters.
  • the model running unit 530 is configured to load the to-be-run model into a corresponding processing unit based on the target algorithm, so as to run the to-be-run model.
  • the model parameters include input data splitting parameters and input data size.
  • the algorithm determination unit 520 is specifically configured to, if the input data splitting parameter indicates that input data splitting is supported and the size of the input data input to the to-be-run model is greater than the first specified threshold, determine the data parallelization algorithm from the multiple algorithms as the target algorithm.
  • the model parameters include input data splitting parameters, input data size, and the number of layers whose number of included operators exceeds the operator threshold.
  • the algorithm determination unit 520 is specifically configured to, if the input data splitting parameter indicates that input data splitting is not supported and the number of layers whose number of included operators exceeds the operator threshold is greater than the second specified threshold, determine the operator parallelization algorithm from the multiple algorithms as the target algorithm; or, if the input data splitting parameter indicates that input data splitting is supported, the size of the input data input to the to-be-run model is not greater than the first specified threshold, and the number of layers whose number of included operators exceeds the operator threshold is greater than the second specified threshold, determine the operator parallelization algorithm from the multiple algorithms as the target algorithm.
  • the model parameters include input data splitting parameters, input data size, the number of layers whose number of included operators exceeds the operator threshold, and the number of layers of the model.
  • the algorithm determination unit 520 is specifically configured to, if the input data splitting parameter indicates that input data splitting is not supported, the number of layers whose number of included operators exceeds the operator threshold is not greater than the second specified threshold, and the number of layers of the model is greater than the third specified threshold, determine the interlayer pipeline algorithm from the multiple algorithms as the target algorithm; or, if the input data splitting parameter indicates that input data splitting is supported, the size of the input data input to the to-be-run model is not greater than the first specified threshold, the number of layers whose number of included operators exceeds the operator threshold is not greater than the second specified threshold, and the number of layers of the model is greater than the third specified threshold, determine the interlayer pipeline algorithm from the multiple algorithms as the target algorithm.
  • the algorithm determination unit 520 is also specifically configured to, if the input data splitting parameter indicates that input data splitting is not supported, the number of layers whose number of included operators exceeds the operator threshold is not greater than the second specified threshold, and the number of layers of the model is not greater than the third specified threshold, determine a non-parallelized algorithm from the multiple algorithms as the target algorithm.
  • the model running unit 530 is specifically configured to split the to-be-run model based on the target algorithm to obtain multiple subsections, wherein the splitting rules corresponding to different target algorithms are different, and to load each subsection into the corresponding processing unit for execution.
  • the device further includes:
  • a performance evaluation unit 540 configured to obtain the operational performance parameters corresponding to the model to be run; if the operational performance parameters do not meet the first target condition, reselect the target algorithm; if the operational performance parameters do not meet the second target condition, re-split the to-be-running model based on the current target algorithm to obtain multiple new sub-sections, and the proportions of each of the new multiple sub-sections are different from the proportions of each of the multiple sub-sections.
  • the first target condition includes: the standard deviation of the respective running times corresponding to the plurality of processing units is not greater than a standard deviation threshold.
  • the second target condition includes: the average data communication duration between the plurality of processing units is not greater than a duration threshold.
  • The data processing device acquires model parameters of a model to be run, then determines a target algorithm from a plurality of algorithms according to the model parameters, and then loads the model to be run into the corresponding processing unit based on the target algorithm to run the model to be run. In this way, after the model to be run is determined, which algorithm the model is run with can be chosen by determining the model parameters, so that the operation of the model better matches the parameters of the model to be run, improving the performance of the model while it runs.
  • an embodiment of the present application further provides another electronic device 200 that can execute the above-mentioned data processing method.
  • the electronic device 200 includes one or more processors 102 (only one is shown in the figure), a memory 104, and a network module 106 that are coupled to each other.
  • the memory 104 stores a program that can execute the content in the foregoing embodiments, and the processor 102 can execute the program stored in the memory 104 .
  • the processor 102 may include one or more cores for processing data.
  • the processor 102 uses various interfaces and lines to connect the various parts of the entire electronic device 200, and executes various functions and processes data by running or executing the instructions, programs, code sets or instruction sets stored in the memory 104 and calling the data stored in the memory 104.
  • the processor 102 may be implemented in at least one hardware form of digital signal processing (Digital Signal Processing, DSP), field-programmable gate array (Field-Programmable Gate Array, FPGA), and programmable logic array (Programmable Logic Array, PLA).
  • the processor 102 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem, and the like.
  • the CPU mainly handles the operating system, user interface, application programs, and the like; the GPU is used for rendering and drawing display content; and the modem is used to handle wireless communication. It can be understood that the above-mentioned modem may also not be integrated into the processor 102 and may instead be implemented by a communication chip alone.
  • the memory 104 may include random access memory (Random Access Memory, RAM), or may include read-only memory (Read-Only Memory, ROM). The memory 104 may be used to store instructions, programs, code, code sets, or instruction sets.
  • the memory 104 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing the operating system, instructions for implementing at least one function (such as a touch function, a sound playback function, an image playback function, etc.) , instructions for implementing the following method embodiments, and the like.
  • the storage data area may also store data created by the terminal 100 during use (such as phone book, audio and video data, chat record data) and the like.
  • the memory 104 stores an apparatus, for example, the apparatus may be the aforementioned apparatus 500 .
  • the network module 106 is used for receiving and sending electromagnetic waves, realizing mutual conversion between electromagnetic waves and electrical signals, so as to communicate with a communication network or other devices, for example, communicate with an audio playback device.
  • the network module 106 may include various existing circuit elements for performing these functions, e.g., antennas, radio frequency transceivers, digital signal processors, encryption/decryption chips, subscriber identity module (SIM) cards, memory, etc.
  • the network module 106 can communicate with various networks such as the Internet, an intranet, a wireless network, or communicate with other devices through a wireless network.
  • the aforementioned wireless network may include a cellular telephone network, a wireless local area network, or a metropolitan area network.
  • the network module 106 may exchange information with the base station.
  • FIG. 8 shows a structural block diagram of a computer-readable storage medium provided by an embodiment of the present application.
  • the computer-readable medium 1100 stores program codes, and the program codes can be invoked by the processor to execute the methods described in the above method embodiments.
  • the computer-readable storage medium 1100 may be an electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), EPROM, hard disk, or ROM.
  • the computer-readable storage medium 1100 includes a non-transitory computer-readable storage medium.
  • the computer-readable storage medium 1100 has storage space for program code 1110 that performs any of the method steps in the above-described methods. These program codes can be read from or written to one or more computer program products. The program code 1110 may, for example, be compressed in a suitable form.
  • The data processing method, device, electronic device, and storage medium provided by the present application obtain model parameters of a model to be run, then determine a target algorithm from a plurality of algorithms according to the model parameters, and then load the to-be-run model into the corresponding processing unit based on the target algorithm to run the to-be-run model.

Abstract

Disclosed are a data processing method and apparatus, an electronic device, and a storage medium. The method comprises: acquiring a model parameter of a model to be run; determining a target algorithm from among a plurality of algorithms according to the model parameter; and loading said model into a corresponding processing unit on the basis of the target algorithm, so as to run said model. In this way, after a model to be run is determined, which algorithm the model is run with can be determined by means of the model parameter, such that the running of the model better matches the parameter of said model, thereby improving performance during model running.

Description

Data processing method, device, electronic device and storage medium
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to Chinese Application No. 202010693821.3, filed on Jul. 17, 2020, which is hereby incorporated by reference in its entirety for all purposes.
TECHNICAL FIELD
The present application relates to the field of computer technology, and more particularly, to a data processing method, apparatus, electronic device, and storage medium.
BACKGROUND
Algorithmic models, such as neural network models, are complex network systems formed by the extensive interconnection of a large number of simple processing units (called neurons). Some algorithmic models have massive parallelism, distributed storage and processing, self-organization, self-adaptation, and self-learning capabilities. However, related electronic equipment, in the process of running a neural network model, still has the problem that running performance needs to be improved.
SUMMARY OF THE INVENTION
In view of the above problems, the present application proposes a data processing method, apparatus, electronic device, and storage medium to improve on the above problems.
In a first aspect, the present application provides a data processing method, the method including: acquiring model parameters of a model to be run; determining a target algorithm from a plurality of algorithms according to the model parameters; and loading the to-be-run model into a corresponding processing unit based on the target algorithm to run the to-be-run model.
In a second aspect, the present application provides a data processing device, the device including: a parameter acquisition unit for acquiring model parameters of a model to be run; an algorithm determination unit for determining a target algorithm from a plurality of algorithms according to the model parameters; and a model running unit configured to load the to-be-run model into a corresponding processing unit based on the target algorithm, so as to run the to-be-run model.
In a third aspect, the present application provides an electronic device including a processor and a memory; one or more programs are stored in the memory and configured to be executed by the processor to implement the above method.
In a fourth aspect, the present application provides a computer-readable storage medium having program code stored therein, wherein the above-mentioned method is executed when the program code is run by a startup controller.
In the data processing method, device, electronic device, and storage medium provided by the present application, model parameters of a model to be run are obtained, a target algorithm is then determined from a plurality of algorithms according to the model parameters, and the to-be-run model is then loaded into the corresponding processing unit based on the target algorithm to run the to-be-run model. In this way, after the model to be run is determined, which algorithm the model is run with can be chosen by determining the model parameters, so that the running of the model better matches the parameters of the model to be run, thereby improving the performance of the model while it runs.
DESCRIPTION OF DRAWINGS
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the drawings used in the description of the embodiments. Obviously, the drawings in the following description are only some embodiments of the present application; for those skilled in the art, other drawings can also be obtained from these drawings without creative effort.
FIG. 1 shows a flowchart of a data processing method proposed by an embodiment of the present application;
FIG. 2 shows a flowchart of a data processing method proposed by another embodiment of the present application;
FIG. 3 shows a flowchart of a data processing method proposed by still another embodiment of the present application;
FIG. 4 shows a flowchart of a data processing method proposed by yet another embodiment of the present application;
FIG. 5 shows a structural block diagram of a data processing apparatus proposed by an embodiment of the present application;
FIG. 6 shows a structural block diagram of a data processing apparatus proposed by another embodiment of the present application;
FIG. 7 shows a structural block diagram of an electronic device of the present application for executing the data processing method according to an embodiment of the present application;
FIG. 8 shows a storage unit, according to an embodiment of the present application, for storing or carrying program code that implements the data processing method of the embodiments of the present application.
DETAILED DESCRIPTION
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, rather than all of them. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present application.
Algorithmic models such as neural networks (Neural Networks, NN) are complex network systems formed by the extensive interconnection of a large number of simple processing units (called neurons). Neural networks have massive parallelism, distributed storage and processing, self-organization, self-adaptation, and self-learning capabilities. A large number of operators are usually included in a neural network model. Among them, it can be understood that an operator can be regarded as part of the algorithmic process in a neural network model; an operator can map a function to a function, or map a function to a number.
However, the inventor found in research that related electronic equipment, in the process of running a neural network model, still has the problem that running performance needs to be improved. For example, electronic devices run based on certain algorithms in the process of running a neural network model. However, related electronic devices all run the neural network model based on a fixed algorithm, so that no matter what the model parameters of the neural network model currently running on the electronic device are, it runs in a fixed manner, which in turn causes the electronic device to perform poorly when running the neural network model and also limits the performance of the neural network model itself.
Therefore, the inventor proposes, in the present application, a data processing method, device, electronic device, and storage medium that can improve on the above problems: by acquiring the model parameters of the model to be run, then determining the target algorithm from multiple algorithms according to the model parameters, and then loading the to-be-run model into a corresponding processing unit based on the target algorithm so as to run the to-be-run model. In this way, after the model to be run is determined, which algorithm the model is run with can be chosen by determining the model parameters, so that the running of the model better matches the parameters of the model to be run, thereby improving the performance of the model while it runs.
The embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Referring to FIG. 1, a data processing method provided by an embodiment of the present application includes:
S110: Obtain model parameters of the model to be run.
The model to be run in this embodiment is a model that will subsequently be loaded into a processing unit for running. In this embodiment, there may be various ways of determining the model to be run.
As one approach, the model to be run may be a neural network model called by an application. It should be noted that an application may need to process some data while running, and in this process the application can process the data by calling a neural network. For example, an image processing application may need to perform image recognition, and that application can then process images by invoking the neural network model used for image recognition.
Alternatively, the electronic device may periodically perform a designated task. In this way, the neural network model invoked by the electronic device during the execution of the designated task can be determined as the model to be run. Optionally, the designated task may be a task of predicting an application program that the electronic device will run in the future, a task of performing video processing, a task of predicting user preferences of the electronic device, or a task of predicting the remaining power of the electronic device.
After the model to be run is determined in the foregoing manner, the model parameters of the model to be run can be obtained. The model parameters in this embodiment may include one or more of the following: input data splitting parameters, input data size, the number of layers whose number of included operators exceeds the operator threshold, and the number of layers of the model.
Among them, the input data splitting parameter represents whether the model supports splitting the input data. For example, for an image classification model, splitting the input image into two parts is likely to produce two different classification results, so an image classification model cannot support splitting the input data. For another example, for an image enhancement model, the output of the model is also a picture; even if two output pictures are obtained after splitting the input picture used as input data, the two output pictures can still be spliced into one, so an image enhancement model can support splitting the input data.
The input data size characterizes the size of the storage space occupied by the input data to be input to the model. For example, if the size of the image to be input to the model to be run is 1000*1000*3Byte, the input data size is determined to be 1000*1000*3Byte, where 1000*1000 is the image resolution.
The number of layers whose number of included operators exceeds the operator threshold represents how many layers in the model contain more operators than the operator threshold. It should be noted that a neural network model usually includes multiple layers, and each layer in turn includes operators. For example, a neural network model can include an input layer, a convolutional layer, and an output layer. Similarly, the number of layers of the model represents how many layers the model to be run has; for example, for the aforementioned neural network model including an input layer, a convolution layer, and an output layer, the number of layers of the model is 3.
S120:根据所述模型参数从多个算法中确定目标算法。S120: Determine a target algorithm from a plurality of algorithms according to the model parameters.
在本实施例中,不同的模型的模型参数可能是不同的,进而不同的模型则可能需要不同的运行方式来进行运行,以便可以使得所运行的模型能够有较高的性能体现。那么电子设备在获取到待运行模型的模型参数后,就可以根据模型参数来确定合适的进行运行的算法来作为目标算法。In this embodiment, model parameters of different models may be different, and different models may require different running modes to be run, so that the run models can have higher performance. Then, after acquiring the model parameters of the model to be run, the electronic device can determine an appropriate running algorithm as the target algorithm according to the model parameters.
作为一种方式,在本实施例中,可以预先建立模型参数与算法之间的对应关系,进而电子设备可以通过查询该对应关系来确定当前待运行模型的模型参数所对应的目标算法。示例性的,模型参数可以包括有输入数据拆分参数、输入数据大小以及模型的层数,则在电子设备中可以配置有输入数据拆分参数A、输入数据大小A以及模型的层数A对应与算法a,输入数据拆分参数B、输入数据大小B以及模型的层数B对应与算法b,输入数据拆分参数A、输入数据大小C以及模型的层数C对应与算法c的情况下,若获取到待运行模型的模型参数包括输入数据拆分参数A、输入数据大小A以及模型的层数A,会则会从算法a、算法b以及算法c中确定算法a为目标算法。若获取到待运行模型的模型参数包括输入数据拆分参数A、输入数据大小C以及模型的层数C,会则会从算法a、算法b以及算法c中确定算法c为目标算法。As a way, in this embodiment, the correspondence between the model parameters and the algorithm may be established in advance, and then the electronic device may determine the target algorithm corresponding to the model parameter of the current model to be run by querying the correspondence. Exemplarily, the model parameters may include input data splitting parameters, input data size and the number of layers of the model, then the electronic device may be configured with input data splitting parameters A, input data size A and model layers A corresponding to Algorithm a, input data splitting parameter B, input data size B and model layer number B correspond to algorithm b, input data splitting parameter A, input data size C and model layer number C correspond to algorithm c , if the model parameters of the model to be run are obtained including the input data splitting parameter A, the input data size A, and the number of layers of the model A, the meeting will determine algorithm a as the target algorithm from algorithm a, algorithm b, and algorithm c. If the model parameters of the model to be run are obtained, including the input data split parameter A, the input data size C, and the number of layers C of the model, the meeting will determine algorithm c as the target algorithm from algorithm a, algorithm b, and algorithm c.
S130: Load the to-be-run model into a corresponding processing unit based on the target algorithm, so as to run the to-be-run model.
It should be noted that, in this embodiment, the processing units included in the electronic device may be one or more of a CPU, a GPU (Graphics Processing Unit), a DSP (Digital Signal Processor), and an NPU (Neural-network Processing Unit), and the loading manners corresponding to different algorithms may differ. Exemplarily, the loading manner corresponding to some target algorithms may be to load the entire to-be-run model into a single processing unit for running, while the loading manner corresponding to other target algorithms may be to split the to-be-run model into multiple parts and load different parts into different processing units for running. In this way, an adapted running manner can be selected for each model, thereby improving the performance with which the electronic device runs the model.
It should be noted that, in the embodiments of the present application, the performance with which the electronic device runs a model can be understood as the time consumed in running the model; correspondingly, if that performance is improved, the time consumed in running the model is correspondingly shortened.
In the data processing method provided by the present application, model parameters of a to-be-run model are acquired, a target algorithm is then determined from a plurality of algorithms according to the model parameters, and the to-be-run model is loaded into a corresponding processing unit based on the target algorithm so as to run the to-be-run model. In this way, after the to-be-run model is determined, the algorithm on which the running of the model is based can be selected by determining the model parameters, so that the running of the model better matches the parameters of the to-be-run model, thereby improving performance while the model is running.
Referring to FIG. 2, an embodiment of the present application provides a data processing method, the method including:
S210: Acquire model parameters of a to-be-run model.
In the embodiments of the present application, as one approach, each model may be provided with a corresponding configuration file, and the configuration file may store the static-class model parameters among the model parameters of the model. Static-class model parameters can be understood as parameters inherent to the model itself, or as parameters that do not change dynamically when the input data changes.
For example, among the model parameters listed in the foregoing embodiments, the input data split parameter, the number of layers in which the number of included operators exceeds the operator threshold, and the number of model layers are parameters inherent to the model itself: these three parameters remain unchanged even when the input data changes, and can therefore be stored in the configuration file. The input data size, by contrast, changes dynamically with the input data and is therefore identified as a dynamic-class parameter. After the to-be-run model is determined, the static-class model parameters can be acquired from the configuration file corresponding to the to-be-run model, the dynamic-class model parameter (the input data size) can be acquired from the actual input data, and the static-class and dynamic-class model parameters together form the complete set of model parameters.
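As a minimal sketch of this arrangement, the static-class parameters could live in a per-model configuration file and be combined with the dynamically measured input data size at load time. The field names and the JSON format below are illustrative assumptions, not a format defined by this application:

```python
import json

# Hypothetical per-model configuration file holding the static-class
# model parameters (the field names are illustrative assumptions).
CONFIG_TEXT = """
{
  "supports_input_split": 1,
  "layers_over_operator_threshold": 5,
  "num_layers": 28
}
"""

def load_model_parameters(input_data_size_bytes):
    # Static parameters come from the configuration file (and can be
    # preloaded into memory); the dynamic parameter, the input data
    # size, is taken from the actual input data.
    static_params = json.loads(CONFIG_TEXT)
    return {**static_params, "input_data_size": input_data_size_bytes}

params = load_model_parameters(1000 * 1000 * 3)  # e.g. a 1000x1000 RGB image
```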
It should be noted that the storage space in the electronic device may include two kinds of storage: disk and memory. The disk can be used to store data for a longer time, but the electronic device fetches data from memory faster than from disk. In this case, after acquiring the configuration file of the to-be-run model, the electronic device may preload all the static-class model parameters in the configuration file into memory, so that the required model parameters can be obtained more quickly in the subsequent decision process, further improving model running performance.
S211: Detect whether the input data split parameter indicates that input data splitting is supported.
It should be noted that, in this embodiment, each model parameter may have a corresponding parameter value, and the electronic device can determine what a model parameter specifically characterizes from that value. Exemplarily, the parameter value corresponding to the input data split parameter may be 1 or 0: if the value is 1, it indicates that input data splitting is supported; if the value is 0, it indicates that input data splitting is not supported.
S212: If the input data split parameter indicates that input data splitting is supported, detect whether the size of the input data to be input to the to-be-run model is greater than a first specified threshold.
It should be noted that the first specified threshold may be, for example, 1024*1024*3 Byte = 3 MByte.
S213: If the size of the input data to be input to the to-be-run model is greater than the first specified threshold, determine a data parallelization algorithm from the plurality of algorithms as the target algorithm.
A data parallelization algorithm (data parallelism) can be understood as running the same function in parallel on different data inputs. Under a data parallelization algorithm, a task can be decomposed into discrete units that can be processed in parallel on separate threads, ensuring that the task can be distributed among the available processing units.
S221: If the size of the input data to be input to the to-be-run model is not greater than the first specified threshold, detect whether the number of layers in which the number of included operators exceeds the operator threshold is greater than a second specified threshold; or, if the input data split parameter indicates that input data splitting is not supported, detect whether the number of layers in which the number of included operators exceeds the operator threshold is greater than the second specified threshold.
As one approach, the second specified threshold may be 20% to 30% of the total number of layers of the model. Exemplarily, if the total number of layers is M, the second specified threshold may be between M×20% and M×30%.
S222: If the number of layers in which the number of included operators exceeds the operator threshold is greater than the second specified threshold, determine an operator parallelization algorithm from the plurality of algorithms as the target algorithm.
It should be noted that an operator parallelization algorithm (operator parallelism) can be understood as loading multiple fully parallelizable operators in the same layer of the model into one or more of the plurality of processing units for parallel running.
S231: If the number of layers in which the number of included operators exceeds the operator threshold is not greater than the second specified threshold, detect whether the number of layers of the model is greater than a third specified threshold.
Optionally, in this embodiment, the third specified threshold may be 2, or may be an integer greater than 2.
S232: If the number of layers of the model is greater than the third specified threshold, determine an inter-layer pipeline algorithm from the plurality of algorithms as the target algorithm.
It should be noted that an inter-layer pipeline algorithm (layer pipelining) can be understood as loading multiple layers of the model respectively into one or more of the plurality of processing units for parallel running.
S241: If the number of layers of the model is not greater than the third specified threshold, determine a non-parallelized algorithm from the plurality of algorithms as the target algorithm.
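Taken together, steps S211 to S241 describe a simple decision chain. The sketch below is one hedged reading of that chain; the concrete threshold values and parameter names are assumptions chosen to match the examples above:

```python
FIRST_THRESHOLD = 1024 * 1024 * 3      # e.g. 3 MByte, see S212
SECOND_THRESHOLD_RATIO = 0.25          # e.g. 20%-30% of the total layers
THIRD_THRESHOLD = 2                    # see S231

def select_target_algorithm(params):
    total_layers = params["num_layers"]
    second_threshold = total_layers * SECOND_THRESHOLD_RATIO

    # S211/S212/S213: data parallelism when splitting is supported
    # and the input data is large enough.
    if params["supports_input_split"] and params["input_data_size"] > FIRST_THRESHOLD:
        return "data_parallelism"
    # S221/S222: operator parallelism when enough layers contain
    # more operators than the operator threshold.
    if params["layers_over_operator_threshold"] > second_threshold:
        return "operator_parallelism"
    # S231/S232: layer pipelining when the model is deep enough.
    if total_layers > THIRD_THRESHOLD:
        return "layer_pipelining"
    # S241: otherwise run without parallelization.
    return "non_parallelized"
```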
S250: Load the to-be-run model into a corresponding processing unit based on the target algorithm, so as to run the to-be-run model.
As one approach, in this embodiment, loading the to-be-run model into a corresponding processing unit based on the target algorithm so as to run the to-be-run model includes: splitting the to-be-run model based on the target algorithm to obtain a plurality of subsections, where different target algorithms correspond to different splitting rules; and loading the plurality of subsections into corresponding processing units respectively for running.
It should be noted that a neural network model includes a plurality of operators, and its data processing flow is completed by performing data processing through these operators in sequence. Different target algorithms can then correspond to different splitting rules. For example, for the data parallelization algorithm, the model can be split into multiple subsections with the same structure, and the input data is likewise split and fed into these subsections for data-parallel processing. Here, "the same structure" can be understood as meaning that the subsections include the same kinds of layer structures as the model. Exemplarily, suppose the to-be-run model includes an input layer, a convolution layer, and an output layer, where the input layer includes 4 operators, the convolution layer includes 8 operators, and the output layer includes 4 operators. When the model is split based on the splitting rule corresponding to the data parallelization algorithm, each resulting subsection also includes an input layer, a convolution layer, and an output layer, so that it has the same kinds of layer structures as the original to-be-run model; only the number of operators included in each layer of a subsection is smaller than the number of operators in the corresponding layer of the original model. Taking a split into two subsections as an example, the input layer of each subsection may include only 2 operators, the convolution layer only 4 operators, and the output layer only 2 operators.
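A minimal sketch of this structure-preserving split, assuming the model is represented as a mapping from layer names to operator lists, might look as follows (the representation is an assumption made for illustration):

```python
def split_data_parallel(model, num_parts=2):
    # model: {layer_name: [operators]}. Each subsection keeps every layer
    # type of the original model but only a 1/num_parts share of its
    # operators, so the layer structure is preserved.
    parts = [{} for _ in range(num_parts)]
    for layer, ops in model.items():
        for i in range(num_parts):
            parts[i][layer] = ops[i::num_parts]
    return parts

parts = split_data_parallel(
    {"input": list(range(4)), "conv": list(range(8)), "output": list(range(4))})
# Each part has 2 input, 4 conv, and 2 output operators, as in the example.
```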
When the operator parallelization algorithm is the target algorithm, the operators within the same layer can be split. In this case, the operators of the same layer are distributed across different subsections, and each resulting subsection may include some of the operators from different layers.
When the inter-layer pipeline algorithm is the target algorithm, the multi-layer structure of the to-be-run model can be split in units of layers. In this case, each of the resulting subsections includes some of the layers of the to-be-run model. Exemplarily, if the to-be-run model includes an input layer, a convolution layer, and an output layer, the input layer can be split into one subsection, the convolution layer into another subsection, and the output layer into a third subsection.
After the to-be-run model has been split into a plurality of subsections in the foregoing manner, each subsection can be loaded into its corresponding processing unit for running. Exemplarily, take the inter-layer pipeline algorithm as the target algorithm. Where the processing units include a CPU and a GPU, if the to-be-run model is split into subsection A and subsection B, then, as one approach, subsection A can be loaded into the CPU for running while subsection B is loaded into the GPU for running.
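For illustration, a layer-wise split and a naive placement onto processing units could be sketched as below. The round-robin assignment is an assumption of this example only, since the affinity-based selection described next would normally drive the placement:

```python
def split_by_layers(model_layers):
    # Layer pipelining: each layer becomes one subsection.
    return [[layer] for layer in model_layers]

def load_subsections(subsections, processing_units):
    # Assign subsections to processing units round-robin; a real scheduler
    # would use the operator-affinity rules described below instead.
    placement = {}
    for idx, subsection in enumerate(subsections):
        unit = processing_units[idx % len(processing_units)]
        placement.setdefault(unit, []).append(subsection)
    return placement

placement = load_subsections(split_by_layers(["input", "conv", "output"]),
                             ["CPU", "GPU"])
# e.g. {"CPU": [["input"], ["output"]], "GPU": [["conv"]]}
```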
It should be noted that the inventors found in research that different operators may be adapted to different processing units. For example, the Conv2D operator performs neural-network matrix operations, so the processing unit adapted to the Conv2D operator may be a GPU or a dedicated AI acceleration chip. As another example, the ResizeBilinear operator performs image operations, so the processing unit adapted to the ResizeBilinear operator may be a CPU. In this approach, the operators included in a subsection can be identified, and the processing unit adapted to those operators can then be taken as the processing unit corresponding to the subsection.
Optionally, when a subsection includes multiple operators and these operators are adapted to different processing units, the processing unit with the shortest total time for running these operators is taken as the processing unit corresponding to the subsection, so that the overall model running speed can be improved. Exemplarily, suppose a subsection includes operator a, operator b, and operator c, where the processing unit adapted to operator a is the CPU, the processing unit adapted to operator b is the GPU, and the processing unit adapted to operator c is a dedicated AI acceleration chip. The total time t1 for the CPU to run operators a, b, and c, the total time t2 for the GPU to run operators a, b, and c, and the total time t3 for the dedicated AI acceleration chip to run operators a, b, and c can then be obtained. If t1 is the smallest, the CPU can be taken as the processing unit corresponding to the subsection that includes operators a, b, and c.
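A hedged sketch of this minimum-total-time selection follows; the per-operator latency table is assumed to be available from profiling or vendor data, which the application does not specify:

```python
def choose_unit_for_subsection(operators, latency):
    # latency[(op, unit)] -> estimated run time of `op` on `unit`
    # (a complete latency table is an assumption of this sketch).
    units = {unit for _, unit in latency}
    totals = {
        unit: sum(latency[(op, unit)] for op in operators)
        for unit in units
    }
    # Pick the unit with the smallest total time over all operators,
    # e.g. t1 for the CPU in the example above.
    return min(totals, key=totals.get)
```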
In the data processing method provided by the present application, model parameters of a to-be-run model are acquired, a target algorithm is then determined from a plurality of algorithms according to the model parameters, and the to-be-run model is loaded into a corresponding processing unit based on the target algorithm so as to run the to-be-run model. In this way, after the to-be-run model is determined, the algorithm on which the running of the model is based can be selected by determining the model parameters, so that the running of the model better matches the parameters of the to-be-run model, thereby improving performance while the model is running. Moreover, the model parameters in this embodiment may include the input data split parameter, the input data size, the number of layers in which the number of included operators exceeds the operator threshold, and the number of model layers, and through these specific parameters a running algorithm better adapted to the current to-be-run model can be determined more accurately, further improving the running performance of the electronic device in running the neural network model.
Referring to FIG. 3, an embodiment of the present application provides a data processing method, the method including:
S310: Acquire model parameters of a to-be-run model.
S320: Determine a target algorithm from a plurality of algorithms according to the model parameters.
S330: Split the to-be-run model based on the target algorithm to obtain a plurality of subsections, where different target algorithms correspond to different splitting rules.
S340: Load the plurality of subsections into corresponding processing units respectively for running.
S350: Acquire running performance parameters corresponding to the to-be-run model.
S360: If the running performance parameters do not satisfy a first target condition, reselect the target algorithm.
Optionally, the first target condition includes: the average data communication duration between the plurality of processing units is not greater than a duration threshold. Optionally, the average data communication duration T_2 may be calculated based on the following formula:

T_2 = (1/n) · Σ_{i,j} T_{2ij}

where T_{2ij} is the data communication time between processing unit i and processing unit j, and n is the number of communications. Optionally, the duration threshold may be the product of the average time consumption of the plurality of processing units and 0.05, where the time consumption may be the inference time.
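For illustration, the first target condition could be checked as in the following sketch, which assumes the per-communication durations and per-unit inference times have already been measured:

```python
def first_target_condition(comm_times, unit_times):
    # comm_times: durations T2ij of the n communications between units;
    # unit_times: per-unit time consumption (e.g. inference time).
    avg_comm = sum(comm_times) / len(comm_times)             # T2
    duration_threshold = 0.05 * (sum(unit_times) / len(unit_times))
    return avg_comm <= duration_threshold
```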
In this embodiment, there may be multiple ways of reselecting the target algorithm. As one approach, an algorithm other than the current target algorithm may be selected at random as the new target algorithm, and S330 and S340 are then performed based on the new target algorithm. Exemplarily, where the plurality of algorithms includes a data parallelization algorithm, an operator parallelization algorithm, an inter-layer pipeline algorithm, and a non-parallelized algorithm, and the currently determined target algorithm is the inter-layer pipeline algorithm, one of the data parallelization algorithm, the operator parallelization algorithm, and the non-parallelized algorithm can be selected as the new target algorithm.
As another approach, a selection order over the plurality of algorithms may be configured in advance, and when the target algorithm is reselected, the new target algorithm is determined based on this selection order. Exemplarily, the configured selection order may be: data parallelization algorithm, operator parallelization algorithm, inter-layer pipeline algorithm, and non-parallelized algorithm. Then, if the current target algorithm is the operator parallelization algorithm and the target algorithm needs to be reselected, the inter-layer pipeline algorithm, which is next in the selection order after the operator parallelization algorithm, can be taken as the new target algorithm.
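A minimal sketch of order-based reselection follows; wrapping around at the end of the list is an assumption of this example, since the application does not state what happens after the last algorithm in the order:

```python
SELECTION_ORDER = ["data_parallelism", "operator_parallelism",
                   "layer_pipelining", "non_parallelized"]

def reselect_target_algorithm(current):
    # Take the next algorithm in the pre-configured selection order;
    # the wrap-around past the end of the list is an assumption here.
    idx = SELECTION_ORDER.index(current)
    return SELECTION_ORDER[(idx + 1) % len(SELECTION_ORDER)]
```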
In the data processing method provided by the present application, after the to-be-run model is determined, the algorithm on which the running of the model is based can be selected by determining the model parameters, so that the running of the model better matches the parameters of the to-be-run model, thereby improving performance while the model is running. Moreover, in this embodiment, the target algorithm can be redetermined according to the real-time running situation while the model is running, so that the running of the model can be adapted more closely to the current actual situation.
Referring to FIG. 4, an embodiment of the present application provides a data processing method, the method including:
S410: Acquire model parameters of a to-be-run model.
S420: Determine a target algorithm from a plurality of algorithms according to the model parameters.
S430: Split the to-be-run model based on the target algorithm to obtain a plurality of subsections, where different target algorithms correspond to different splitting rules.
S440: Load the plurality of subsections into corresponding processing units respectively for running.
S450: Acquire running performance parameters corresponding to the to-be-run model.
S460: If the running performance parameters do not satisfy a second target condition, split the to-be-run model again based on the current target algorithm to obtain a plurality of new subsections, where the proportions of the parts in the new plurality of subsections differ from the proportions of the parts in the original plurality of subsections.
Optionally, the second target condition includes: the standard deviation of the running times respectively corresponding to the plurality of processing units is not greater than a standard deviation threshold. Optionally, the standard deviation σ may be calculated based on the following formula:

σ = √( (1/N) · Σ_{i=1..N} (T_{1i} − T_1)² )

where T_1 is the average time consumption of the N processing units, and T_{1i} is the time consumption of processing unit i.
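For illustration, the second target condition could be checked as in the following sketch, which applies the formula above to measured per-unit run times:

```python
import math

def second_target_condition(unit_times, std_threshold):
    # unit_times: run time T1i of each of the N processing units.
    t1 = sum(unit_times) / len(unit_times)                   # T1
    std = math.sqrt(sum((t - t1) ** 2 for t in unit_times) / len(unit_times))
    return std <= std_threshold
```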
As can be seen from the foregoing, among the plurality of subsections obtained by splitting the to-be-run model, each subsection may include some of the operators of the model. The proportions of the parts of the plurality of subsections can be understood as the proportions of the operators included in each subsection. Splitting the to-be-run model again based on the current target algorithm can then be understood as adjusting the number of operators included in at least some of the subsections, so as to adjust the running duration of the processing unit corresponding to each subsection. Exemplarily, if subsection A includes 3 operators, subsection B includes 6 operators, and subsection C includes 3 operators, then after re-splitting, subsection A may include 4 operators, subsection B 5 operators, and subsection C still 3 operators.
Where the target algorithms differ, the unit of adjustment may differ. For example, when the operator parallelization algorithm is the target algorithm, the to-be-run model is split into subsections directly in units of operators, so the proportions of the subsections are adjusted in units of operators. As another example, when the inter-layer pipeline algorithm is the target algorithm, the to-be-run model is split into subsections directly in units of layers, so the proportions of the subsections are adjusted in units of layers.
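One hedged sketch of such a proportion adjustment follows; moving a single unit of work per adjustment step is an assumption of this example, not a rule stated by the application:

```python
def rebalance(subsections, unit_times):
    # Move one unit of work (an operator, or a layer under layer
    # pipelining) from the slowest subsection to the fastest, changing
    # the proportions between subsections as described above.
    slow = max(range(len(unit_times)), key=unit_times.__getitem__)
    fast = min(range(len(unit_times)), key=unit_times.__getitem__)
    if subsections[slow]:
        subsections[fast].append(subsections[slow].pop())
    return subsections
```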
In the data processing method provided by the present application, after the to-be-run model is determined, the algorithm on which the running of the model is based can be selected by determining the model parameters, so that the running of the model better matches the parameters of the to-be-run model, thereby improving performance while the model is running. Moreover, in this embodiment, while the model is running, the to-be-run model can be split again based on the currently determined target algorithm to obtain a plurality of new subsections, so that the running of the model can be adapted more closely to the current actual situation.
Referring to FIG. 5, an embodiment of the present application provides a data processing apparatus 500, the apparatus 500 including:
a parameter acquisition unit 510, configured to acquire model parameters of a to-be-run model;
an algorithm determination unit 520, configured to determine a target algorithm from a plurality of algorithms according to the model parameters;
a model running unit 530, configured to load the to-be-run model into a corresponding processing unit based on the target algorithm, so as to run the to-be-run model.
As one approach, the model parameters include an input data split parameter and an input data size. In this approach, the algorithm determination unit 520 is specifically configured to: if the input data split parameter indicates that input data splitting is supported and the size of the input data to be input to the to-be-run model is greater than a first specified threshold, determine a data parallelization algorithm from the plurality of algorithms as the target algorithm.
As one approach, the model parameters include an input data split parameter, an input data size, and a number of layers in which the number of included operators exceeds an operator threshold. In this approach, the algorithm determination unit 520 is specifically configured to: if the input data split parameter indicates that input data splitting is not supported and the number of layers in which the number of included operators exceeds the operator threshold is greater than a second specified threshold, determine an operator parallelization algorithm from the plurality of algorithms as the target algorithm; or, if the input data split parameter indicates that input data splitting is supported, the size of the input data to be input to the to-be-run model is not greater than the first specified threshold, and the number of layers in which the number of included operators exceeds the operator threshold is greater than the second specified threshold, determine an operator parallelization algorithm from the plurality of algorithms as the target algorithm.
As one approach, the model parameters include an input data split parameter, an input data size, a number of layers in which the number of included operators exceeds an operator threshold, and a number of model layers. In this approach, the algorithm determination unit 520 is specifically configured to: if the input data split parameter indicates that input data splitting is not supported, the number of layers in which the number of included operators exceeds the operator threshold is not greater than the second specified threshold, and the number of layers of the model is greater than a third specified threshold, determine an inter-layer pipeline algorithm from the plurality of algorithms as the target algorithm; or, if the input data split parameter indicates that input data splitting is supported, the size of the input data to be input to the to-be-run model is not greater than the first specified threshold, the number of layers in which the number of included operators exceeds the operator threshold is not greater than the second specified threshold, and the number of layers of the model is greater than the third specified threshold, determine an inter-layer pipeline algorithm from the plurality of algorithms as the target algorithm.
The algorithm determination unit 520 is further specifically configured to: if the input data split parameter indicates that input data splitting is not supported, the number of layers in which the number of included operators exceeds the operator threshold is not greater than the second specified threshold, and the number of layers of the model is not greater than the third specified threshold, determine a non-parallelized algorithm from the plurality of algorithms as the target algorithm.
As one approach, the model running unit 530 is specifically configured to split the to-be-run model based on the target algorithm to obtain a plurality of subsections, where different target algorithms correspond to different splitting rules, and to load the plurality of subsections into corresponding processing units respectively for running.
As one approach, as shown in FIG. 6, the apparatus further includes:
a performance evaluation unit 540, configured to acquire running performance parameters corresponding to the to-be-run model; if the running performance parameters do not satisfy a first target condition, reselect the target algorithm; and if the running performance parameters do not satisfy a second target condition, split the to-be-run model again based on the current target algorithm to obtain a plurality of new subsections, where the proportions of the parts in the new plurality of subsections differ from the proportions of the parts in the original plurality of subsections.
Optionally, consistent with the method embodiments above, the first target condition includes: the average data communication duration between the plurality of processing units is not greater than a duration threshold. The second target condition includes: the standard deviation of the running times respectively corresponding to the plurality of processing units is not greater than a standard deviation threshold.
In the data processing apparatus provided by the present application, model parameters of a to-be-run model are acquired, a target algorithm is then determined from a plurality of algorithms according to the model parameters, and the to-be-run model is loaded into a corresponding processing unit based on the target algorithm so as to run the to-be-run model. In this way, after the to-be-run model is determined, the algorithm on which the running of the model is based can be selected by determining the model parameters, so that the running of the model better matches the parameters of the to-be-run model, thereby improving performance while the model is running.
It should be noted that the apparatus embodiments in the present application correspond to the foregoing method embodiments; for the specific principles of the apparatus embodiments, reference may be made to the content of the foregoing method embodiments, which will not be repeated here.
An electronic device provided by the present application is described below with reference to FIG. 7.
Referring to FIG. 7, based on the foregoing data processing method and apparatus, an embodiment of the present application further provides an electronic device 200 capable of performing the foregoing data processing method. The electronic device 200 includes one or more (only one is shown in the figure) processors 102, a memory 104, and a network module 106, which are coupled to one another. The memory 104 stores a program capable of executing the content of the foregoing embodiments, and the processor 102 can execute the program stored in the memory 104.
The processor 102 may include one or more cores for processing data. The processor 102 connects the various parts of the electronic device 200 using various interfaces and lines, and performs the various functions of the electronic device 200 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 104 and by invoking data stored in the memory 104. Optionally, the processor 102 may be implemented in at least one hardware form among Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 102 may integrate one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is responsible for rendering and drawing display content; and the modem handles wireless communication. It can be understood that the modem may alternatively not be integrated into the processor 102 and may instead be implemented by a separate communication chip.
The memory 104 may include Random Access Memory (RAM) or Read-Only Memory (ROM). The memory 104 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 104 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing the operating system, instructions for implementing at least one function (such as a touch function, a sound playback function, or an image playback function), instructions for implementing the following method embodiments, and the like. The data storage area may also store data created by the terminal 100 during use (such as a phone book, audio and video data, or chat record data). The memory 104 stores an apparatus; for example, the apparatus may be the aforementioned apparatus 500.
The network module 106 is configured to receive and transmit electromagnetic waves and to perform mutual conversion between electromagnetic waves and electrical signals, so as to communicate with a communication network or other devices, for example with an audio playback device. The network module 106 may include various existing circuit elements for performing these functions, for example an antenna, a radio frequency transceiver, a digital signal processor, an encryption/decryption chip, a subscriber identity module (SIM) card, memory, and so on. The network module 106 may communicate with various networks such as the Internet, an intranet, or a wireless network, or communicate with other devices through a wireless network. The aforementioned wireless network may include a cellular telephone network, a wireless local area network, or a metropolitan area network. For example, the network module 106 may exchange information with a base station.
Please refer to FIG. 8, which shows a structural block diagram of a computer-readable storage medium provided by an embodiment of the present application. The computer-readable medium 1100 stores program code, and the program code can be invoked by a processor to execute the methods described in the foregoing method embodiments.
The computer-readable storage medium 1100 may be an electronic memory such as a flash memory, an EEPROM (Electrically Erasable Programmable Read-Only Memory), an EPROM, a hard disk, or a ROM. Optionally, the computer-readable storage medium 1100 includes a non-transitory computer-readable storage medium. The computer-readable storage medium 1100 has storage space for program code 1110 that performs any of the method steps in the foregoing methods. The program code can be read from or written to one or more computer program products. The program code 1110 may, for example, be compressed in a suitable form.
To sum up, in the data processing method, apparatus, electronic device, and storage medium provided by the present application, model parameters of a to-be-run model are acquired, a target algorithm is then determined from a plurality of algorithms according to the model parameters, and the to-be-run model is loaded into a corresponding processing unit based on the target algorithm so as to run the to-be-run model. In this way, after the to-be-run model is determined, the algorithm on which the running of the model is based can be selected by determining the model parameters, so that the running of the model better matches the parameters of the to-be-run model, thereby improving performance while the model is running.
Finally, it should be noted that the foregoing embodiments are merely used to illustrate the technical solutions of the present application and do not limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments or make equivalent replacements for some of the technical features therein, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (20)

  1. A data processing method, characterized in that the method comprises:
    acquiring model parameters of a to-be-run model;
    determining a target algorithm from a plurality of algorithms according to the model parameters;
    loading the to-be-run model into a corresponding processing unit based on the target algorithm, so as to run the to-be-run model.
  2. The method according to claim 1, characterized in that the model parameters comprise an input data split parameter and an input data size, and the determining a target algorithm from a plurality of algorithms according to the model parameters comprises:
    if the input data split parameter indicates that input data splitting is supported and the size of the input data to be input to the to-be-run model is greater than a first specified threshold, determining a data parallelization algorithm from the plurality of algorithms as the target algorithm.
  3. The method according to claim 2, characterized in that the model parameters further comprise a number of layers in which the number of included operators exceeds an operator threshold, and the determining a target algorithm from a plurality of algorithms according to the model parameters further comprises:
    if the input data split parameter indicates that input data splitting is not supported, and the number of layers in which the number of included operators exceeds the operator threshold is greater than a second specified threshold, determining an operator parallelization algorithm from the plurality of algorithms as the target algorithm;
    or, if the input data split parameter indicates that input data splitting is supported, the size of the input data to be input to the to-be-run model is not greater than the first specified threshold, and the number of layers in which the number of included operators exceeds the operator threshold is greater than the second specified threshold, determining an operator parallelization algorithm from the plurality of algorithms as the target algorithm.
  4. The method according to claim 3, characterized in that the model parameters further comprise a number of model layers, and the determining a target algorithm from a plurality of algorithms according to the model parameters further comprises:
    if the input data split parameter indicates that input data splitting is not supported, the number of layers in which the number of included operators exceeds the operator threshold is not greater than the second specified threshold, and the number of layers of the model is greater than a third specified threshold, determining an inter-layer pipeline algorithm from the plurality of algorithms as the target algorithm;
    or, if the input data split parameter indicates that input data splitting is supported, the size of the input data to be input to the to-be-run model is not greater than the first specified threshold, the number of layers in which the number of included operators exceeds the operator threshold is not greater than the second specified threshold, and the number of layers of the model is greater than the third specified threshold, determining an inter-layer pipeline algorithm from the plurality of algorithms as the target algorithm.
  5. The method according to claim 4, characterized in that the determining a target algorithm from a plurality of algorithms according to the model parameters further comprises:
    if the input data split parameter indicates that input data splitting is not supported, the number of layers in which the number of included operators exceeds the operator threshold is not greater than the second specified threshold, and the number of layers of the model is not greater than the third specified threshold, determining a non-parallelized algorithm from the plurality of algorithms as the target algorithm.
  6. The method according to any one of claims 3-5, characterized in that the second specified threshold is 20% to 30% of the total number of layers of the model.
  7. The method according to claim 1, characterized in that the determining a target algorithm from a plurality of algorithms according to the model parameters comprises:
    acquiring a pre-established correspondence between model parameters and algorithms;
    determining, according to the model parameters and the correspondence, the target algorithm corresponding to the model parameters of the to-be-run model.
  8. The method according to any one of claims 1-7, characterized in that the loading the to-be-run model into a corresponding processing unit based on the target algorithm, so as to run the to-be-run model, comprises:
    splitting the to-be-run model based on the target algorithm to obtain a plurality of subsections, wherein different target algorithms correspond to different splitting rules;
    loading the plurality of subsections into corresponding processing units respectively for running.
  9. The method according to claim 8, characterized in that, after the loading the plurality of subsections into corresponding processing units respectively for running, the method further comprises:
    acquiring running performance parameters corresponding to the to-be-run model;
    if the running performance parameters do not satisfy a first target condition, reselecting the target algorithm;
    if the running performance parameters do not satisfy a second target condition, splitting the to-be-run model again based on the current target algorithm to obtain a plurality of new subsections, wherein the proportions of the parts in the new plurality of subsections differ from the proportions of the parts in the plurality of subsections.
  10. The method according to claim 9, characterized in that the reselecting the target algorithm comprises:
    randomly selecting an algorithm other than the current target algorithm as the new target algorithm.
  11. The method according to claim 9, characterized in that the first target condition comprises: an average data communication duration between the plurality of processing units is not greater than a duration threshold.
  12. The method according to claim 11, characterized in that the average data communication duration is calculated as:
    T_2 = (1/n) · Σ_{i,j} T_{2ij}
    wherein T_{2ij} is the data communication time between processing unit i and processing unit j, and n is the number of communications.
  13. The method according to claim 9, characterized in that the second target condition comprises: a standard deviation of the running times respectively corresponding to the plurality of processing units is not greater than a standard deviation threshold.
  14. The method according to claim 13, characterized in that the method further comprises calculating the standard deviation by the following formula:
    σ = √( (1/N) · Σ_{i=1..N} (T_{1i} − T_1)² )
    wherein T_1 is the average time consumption of the N processing units, and T_{1i} is the time consumption of processing unit i.
  15. The method according to any one of claims 1-14, characterized in that the processing unit may be a CPU, a GPU, a DSP, an NPU, or a dedicated AI acceleration chip.
  16. A data processing apparatus, characterized in that the apparatus comprises:
    a parameter acquisition unit, configured to acquire model parameters of a to-be-run model;
    an algorithm determination unit, configured to determine a target algorithm from a plurality of algorithms according to the model parameters;
    a model running unit, configured to load the to-be-run model into a corresponding processing unit based on the target algorithm, so as to run the to-be-run model.
  17. The apparatus according to claim 16, characterized in that the model parameters comprise an input data split parameter and an input data size, and the algorithm determination unit is specifically configured to: if the input data split parameter indicates that input data splitting is supported and the size of the input data to be input to the to-be-run model is greater than a first specified threshold, determine a data parallelization algorithm from the plurality of algorithms as the target algorithm.
  18. The apparatus according to claim 16, characterized in that the model parameters further comprise a number of layers in which the number of included operators exceeds an operator threshold, and the algorithm determination unit is specifically configured to: if the input data split parameter indicates that input data splitting is not supported, and the number of layers in which the number of included operators exceeds the operator threshold is greater than a second specified threshold, determine an operator parallelization algorithm from the plurality of algorithms as the target algorithm;
    or, if the input data split parameter indicates that input data splitting is supported, the size of the input data to be input to the to-be-run model is not greater than the first specified threshold, and the number of layers in which the number of included operators exceeds the operator threshold is greater than the second specified threshold, determine an operator parallelization algorithm from the plurality of algorithms as the target algorithm.
  19. An electronic device, characterized by comprising a processor and a memory;
    one or more programs being stored in the memory and configured to be executed by the processor to implement the method according to any one of claims 1-15.
  20. A computer-readable storage medium, characterized in that program code is stored in the computer-readable storage medium, wherein the method according to any one of claims 1-15 is performed when the program code is run by a processor.
PCT/CN2021/092448 2020-07-17 2021-05-08 Data processing method and apparatus, electronic device, and storage medium WO2022012123A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010693821.3 2020-07-17
CN202010693821.3A CN111782402A (en) 2020-07-17 2020-07-17 Data processing method and device and electronic equipment

Publications (1)

Publication Number Publication Date
WO2022012123A1 (en) 2022-01-20

Family

ID=72763525

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/092448 WO2022012123A1 (en) 2020-07-17 2021-05-08 Data processing method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN111782402A (en)
WO (1) WO2022012123A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111782402A (en) * 2020-07-17 2020-10-16 Oppo广东移动通信有限公司 Data processing method and device and electronic equipment
CN111782403B (en) * 2020-07-17 2022-04-19 Oppo广东移动通信有限公司 Data processing method and device and electronic equipment
CN113157538B (en) * 2021-02-02 2023-04-18 西安天和防务技术股份有限公司 Spark operation parameter determination method, device, equipment and storage medium
CN114492737B (en) * 2021-12-31 2022-12-09 北京百度网讯科技有限公司 Data processing method, data processing device, electronic equipment, storage medium and program product
CN117349034B (en) * 2023-12-05 2024-02-23 创意信息技术股份有限公司 Hierarchical loading method and device for large language model

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030167148A1 (en) * 2002-03-01 2003-09-04 Anastasio Thomas J. Method for determination of spatial target probability using a model of multisensory processing by the brain
US20060224533A1 (en) * 2005-03-14 2006-10-05 Thaler Stephen L Neural network development and data analysis tool
CN102253919A (en) * 2011-05-25 2011-11-23 中国石油集团川庆钻探工程有限公司 Concurrent numerical simulation method and system based on GPU and CPU cooperative computing
CN107798382A (en) * 2017-11-21 2018-03-13 北京地平线信息技术有限公司 For the method and apparatus for the characteristic being adapted in convolutional neural networks
CN110163367A (en) * 2018-09-29 2019-08-23 腾讯科技(深圳)有限公司 A kind of model compression method and apparatus
CN110807044A (en) * 2019-10-30 2020-02-18 东莞市盟大塑化科技有限公司 Model dimension management method based on artificial intelligence technology
CN111782402A (en) * 2020-07-17 2020-10-16 Oppo广东移动通信有限公司 Data processing method and device and electronic equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115001623A (en) * 2022-05-07 2022-09-02 通号城市轨道交通技术有限公司 Vehicle-mounted electronic map data verification method and device
CN115001623B (en) * 2022-05-07 2024-04-19 通号城市轨道交通技术有限公司 Method and device for checking vehicle-mounted electronic map data

Also Published As

Publication number Publication date
CN111782402A (en) 2020-10-16

Similar Documents

Publication Publication Date Title
WO2022012123A1 (en) Data processing method and apparatus, electronic device, and storage medium
CN111368893B (en) Image recognition method, device, electronic equipment and storage medium
WO2022012118A1 (en) Data processing method and apparatus, electronic device, and storage medium
WO2022012119A1 (en) Data processing method and apparatus, electronic device, and storage medium
CN110674936A (en) Neural network processing method and device, computer equipment and storage medium
CN110458294B (en) Model operation method, device, terminal and storage medium
US11740941B2 (en) Method of accelerating execution of machine learning based application tasks in a computing device
CN109598250B (en) Feature extraction method, device, electronic equipment and computer readable medium
US10031947B2 (en) Method and apparatus for performing a search operation on heterogeneous computing systems
WO2019001323A1 (en) Signal processing system and method
CN111124173A (en) Working state switching method and device of touch screen, mobile terminal and storage medium
US20210073566A1 (en) Rotation and scaling for optical character recognition using end-to-end deep learning
WO2015152876A1 (en) Hash table construction for utilization in recognition of target object in image
US10212291B2 (en) System, method, and non-transitory computer readable storage medium for image recognition based on convolutional neural networks
WO2021000411A1 (en) Neural network-based document classification method and apparatus, and device and storage medium
WO2022121701A1 (en) Image processing method and apparatus, electronic device, and storage medium
US20210241068A1 (en) Convolutional neural network
CN111813529B (en) Data processing method, device, electronic equipment and storage medium
CN114692745A (en) Data processing method and device, integrated chip, electronic equipment and storage medium
CN110837419B (en) Reasoning engine system and method based on elastic batch processing and electronic equipment
CN112329889A (en) Image processing method and device and electronic equipment
CN112070144A (en) Image clustering method and device, electronic equipment and storage medium
US20240005075A1 (en) Graphic neural network acceleration solution with customized board for solid-state drives
CN113536840A (en) Video classification method, device, equipment and storage medium
TWI775084B (en) Image recognition method, device, computer device and storage media

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21843066

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 21843066

Country of ref document: EP

Kind code of ref document: A1