WO2022012119A1 - Data processing method, apparatus, electronic device and storage medium - Google Patents

Data processing method, apparatus, electronic device and storage medium

Info

Publication number
WO2022012119A1
Authority
WO
WIPO (PCT)
Prior art keywords: sub, processing unit, model, data, processing units
Application number
PCT/CN2021/092183
Other languages
English (en)
French (fr)
Inventor
钟卫东
谭维
张晓帆
Original Assignee
Oppo广东移动通信有限公司
Application filed by Oppo广东移动通信有限公司 filed Critical Oppo广东移动通信有限公司
Publication of WO2022012119A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005: Allocation of resources to service a request
    • G06F 9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/10: Interfaces, programming languages or software development kits, e.g. for simulating neural networks
    • G06F 2209/00: Indexing scheme relating to G06F 9/00
    • G06F 2209/50: Indexing scheme relating to G06F 9/50
    • G06F 2209/501: Performance criteria
    • G06F 2209/5017: Task decomposition

Definitions

  • the present application relates to the field of computer technology, and more particularly, to a data processing method, apparatus, electronic device, and storage medium.
  • Algorithmic models, such as neural network models, are complex network systems formed by the extensive interconnection of a large number of simple processing units (called neurons). Some algorithmic models have massive parallelism, distributed storage and processing, and self-organization, self-adaptation, and self-learning capabilities. However, when related electronic devices run such neural network models, the running performance still needs to be improved.
  • the present application proposes a data processing method, apparatus, electronic device and storage medium to improve the above problems.
  • the present application provides a data processing method applied to an electronic device, the method comprising: acquiring a model to be run and multiple processing units included in the electronic device;
  • splitting the model to be run based on the multiple processing units to obtain a plurality of sub-sections and the running parameters corresponding to each of the sub-sections, where the running parameters include the running order of each sub-section and its corresponding processing unit;
  • loading the plurality of sub-sections into their respective corresponding processing units; and cooperatively controlling the plurality of processing units to run their corresponding sub-sections based on the running order, so as to process the data input to each of the sub-sections.
  • the present application provides a data processing apparatus, which runs on an electronic device. The apparatus includes: a data acquisition unit for acquiring a model to be run and a plurality of processing units included in the electronic device; a model processing unit for splitting the model to be run based on the multiple processing units to obtain multiple sub-sections and the running parameters corresponding to each sub-section, where the running parameters include the running order of each sub-section and its corresponding processing unit; a data loading unit for loading the plurality of sub-sections into their respective processing units; and a cooperative computing unit for cooperatively controlling the plurality of processing units to run their corresponding sub-sections based on the running order, so as to process the data input to each of the sub-sections.
  • the present application provides an electronic device including a processor and a memory; one or more programs are stored in the memory and configured to be executed by the processor to implement the above method.
  • the present application provides a computer-readable storage medium in which a program code is stored, wherein the above-mentioned method is executed when the program code is run by a processor.
  • FIG. 1 shows a flowchart of a data processing method proposed by an embodiment of the present application
  • FIG. 2 shows a flowchart of a data processing method proposed by another embodiment of the present application
  • FIG. 3 shows a flowchart of a data processing method proposed by still another embodiment of the present application.
  • FIG. 4 shows a sequence diagram of executing a data processing method through multiple threads in an embodiment of the present application
  • FIG. 5 shows a schematic diagram of data output by the data processing method in the embodiment of the present application
  • FIG. 6 shows a flowchart of a data processing method proposed by another embodiment of the present application.
  • FIG. 7 shows a structural block diagram of a data processing apparatus proposed by another embodiment of the present application.
  • FIG. 8 shows a structural block diagram of a data processing apparatus proposed by still another embodiment of the present application.
  • FIG. 9 shows a structural block diagram of an electronic device of the present application for executing the data processing method according to an embodiment of the present application
  • FIG. 10 is a storage unit for storing or carrying a program code for implementing the data processing method according to an embodiment of the present application.
  • Neural networks are complex network systems formed by the extensive interconnection of a large number of simple processing units (called neurons). Neural networks have massive parallelism, distributed storage and processing, and self-organization, self-adaptation, and self-learning capabilities. A neural network model usually includes a large number of operators. It can be understood that an operator is a part of the algorithmic process in a neural network model; an operator can map a function into a function, or map a function into a number.
  • When running a neural network model based on the related art, an electronic device calls a single processing unit to run the model, and the data processing capability of the called processing unit directly determines the model-running performance of the electronic device.
  • Moreover, the processing unit must finish processing the current data before it can start the next round of processing. This relatively simple serial approach greatly limits the performance with which electronic devices run neural network models.
  • In contrast, in the method proposed by this application, the model to be run is split to obtain a plurality of sub-sections, their respective running orders, and their corresponding processing units; the plurality of sub-sections are loaded into their corresponding processing units; and, based on the running order, the plurality of processing units are cooperatively controlled to run their respective sub-sections, so as to process the data input to each of the sub-sections.
  • a data processing method provided by an embodiment of the present application includes:
  • S110 Acquire a model to be run and multiple processing units included in the electronic device.
  • the model to be run in this embodiment is a model that will be loaded into the processing unit for running later.
  • the model to be run may be a neural network model called by the application.
  • the application may need to process some data during its running process, and it can do so by calling a neural network model.
  • an image processing application may need to perform image recognition, and then the image processing application can process the image by invoking the neural network model used for image recognition.
  • the electronic device may periodically perform a designated task.
  • the neural network model invoked by the electronic device during the execution of the specified task can be determined as the model to be run.
  • the specified task may be a task of predicting an application program that the electronic device will run in the future, a video processing task, a task of predicting user preferences of the electronic device, or a task of predicting the remaining battery power of the electronic device.
  • the processing unit is hardware capable of data processing in the electronic device.
  • the processing unit can be a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a DSP (Digital Signal Processor), an NPU (Neural-network Processing Unit), or a dedicated AI acceleration chip.
  • The processing units included in different electronic devices may differ, so to facilitate the subsequent splitting of the model to be run, the electronic device can first determine which processing units it specifically includes.
  • The operating system of the electronic device can interact with the underlying hardware to obtain how many processing units the electronic device includes and their types, and store the obtained number and types of processing units in a specified system file. Then, in the process of executing the data processing method provided by this embodiment, the multiple processing units included in the electronic device can be acquired by reading the specified system file.
  • S120 Split the to-be-run model based on the multiple processing units to obtain multiple sub-sections and respective operating parameters corresponding to the multiple sub-sections, where the running parameters include the running order corresponding to the sub-sections and the corresponding running parameters of the processing unit.
  • the model to be run will include multiple layers, and each layer will include at least one operator, so that the model to be run can be regarded as composed of multiple operators.
  • the model to be run may be split to obtain multiple sub-parts. In this way, each subsection may include at least some of the operators in the model to be run.
  • the running parameters of each subsection can also be generated separately, so that the electronic device can obtain the running order of each subsection and which processing unit each subsection needs to be run by.
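As a hedged illustration of the split and the running parameters described above (this is not code from the patent), the sub-sections and their running parameters can be sketched as a simple data structure; the names `SubSection`, `ops`, `order`, and `unit`, and the even split, are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class SubSection:
    ops: list    # operators assigned to this sub-section
    order: int   # position in the running order
    unit: str    # processing unit that will run it, e.g. "CPU"

def split_model(ops, units):
    """Split a flat operator list evenly across the given processing units."""
    chunk = -(-len(ops) // len(units))  # ceiling division
    return [SubSection(ops[i * chunk:(i + 1) * chunk], i, units[i])
            for i in range(len(units)) if ops[i * chunk:(i + 1) * chunk]]

parts = split_model(list(range(12)), ["CPU", "GPU", "NPU"])
```

With 12 operators and three units, each sub-section receives four operators, carries its position in the running order, and names the unit that will run it.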
  • loading the subsection into the processing unit in the embodiment of the present application may be understood as configuring the processing unit corresponding to the subsection to run the operators included in the subsection.
  • The model itself may be stored in a corresponding model file. In this way, if the model needs to be run, the corresponding model file can be read directly to obtain the operators included in the model. Then, as one way, dividing the model to be run into multiple sub-sections can be understood as dividing the model file corresponding to the model into multiple subfiles, where the multiple subfiles correspond one-to-one with the aforementioned multiple sub-sections.
  • each subsection corresponds to a running parameter.
  • the running parameters corresponding to each subsection can be stored in the subfile corresponding to that subsection, so that when the electronic device reads the subfile it obtains both the operators included in the subsection and the subsection's running parameters, which improves data acquisition efficiency.
  • S130 Load the multiple sub-sections into the respective corresponding processing units.
  • S140 Cooperatively control the plurality of processing units to execute respective sub-sections based on the running order, so as to process the data input to each of the sub-sections.
  • the model usually processes the input data, and then outputs the processed data.
  • the input and output of each subsection can be interdependent.
  • the processing unit corresponding to each subsection can be called based on the running order corresponding to each subsection, so that each processing unit processes the data input to its corresponding subsection.
  • the to-be-run model A may be split to obtain a sub-section a, a sub-section b, and a sub-section c.
  • the processing unit corresponding to the sub-section a is the CPU
  • the processing unit corresponding to the sub-section b is the GPU
  • the processing unit corresponding to the sub-section c is the NPU.
  • the running order of the subsection a is at the top
  • the running order of the subsection b is after the running order of the subsection a
  • the running order of the subsection c is after the running order of the subsection b.
  • the electronic device can preferentially call the CPU to run the subsection a, so as to process the data input to the subsection a, and obtain the output data of the subsection a.
  • the output data of the subsection a is the input data to subsection b. Then the electronic device will call the GPU to run the subsection b to process the output data of the subsection a, and obtain the output data of the subsection b. It can be understood that the output data of the subsection b is the data input to the subsection c. Then, the electronic device will call the NPU to process the output data of the subsection b to obtain the final output data.
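The a → b → c example above can be sketched as a chain of stages, where each stage stands in for one sub-section running on its processing unit. The unit names come from the example; the arithmetic stage functions are purely illustrative assumptions:

```python
def run_pipeline(stages, x):
    """Run sub-sections in their running order, chaining each output to the next input."""
    for unit, fn in stages:  # stages are already sorted by running order
        x = fn(x)            # the named unit runs its sub-section on the current data
    return x

stages = [("CPU", lambda v: v + 1),   # sub-section a
          ("GPU", lambda v: v * 2),   # sub-section b
          ("NPU", lambda v: v - 3)]   # sub-section c
result = run_pipeline(stages, 10)     # ((10 + 1) * 2) - 3 = 19
```

The key property mirrored here is that the output of each sub-section is exactly the input of the next one in the running order.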
  • In this embodiment, a model to be run and the multiple processing units included in the electronic device are acquired; the model is then split based on the multiple processing units to obtain multiple subsections together with their respective running orders and corresponding processing units; the multiple subsections are loaded into their respective processing units; and the multiple processing units are cooperatively controlled, based on the running order, to run their respective subsections so as to process the data input to each of them. It is therefore possible to determine how to split the model according to the processing units actually present, and then load the resulting sub-sections into the corresponding processing units so that the multiple processing units cooperate to run the model, which improves the running performance of the electronic device while the model runs.
  • a data processing method provided by an embodiment of the present application includes:
  • S210 Acquire the model to be run and multiple processing units included in the electronic device.
  • S220 Acquire the number of the multiple processing units.
  • S230 Split the to-be-run model based on the quantity to obtain a plurality of subsections whose number matches the quantity and respective operating parameters corresponding to the plurality of subsections.
  • the model to be run can be split based on a data parallelization algorithm.
  • the model can be split into multiple sub-sections with the same structure, and then the input data is also split and input to the multiple sub-sections for parallel data processing.
  • the same structure can be understood as the same type of layer structure included in the model.
  • the model to be run includes an input layer, a convolution layer, and an output layer.
  • the input layer includes 4 operators
  • the convolution layer includes 8 operators
  • the output layer also includes 4 operators.
  • the model is split based on the splitting rules corresponding to the data parallelization algorithm.
  • the sub-parts obtained by splitting will also include the input layer, the convolutional layer and the output layer, so as to achieve the same type of layer structure as the original model to be run. Only the number of operators included in each layer in the subsection will be less than the number of operators in each layer in the original model to be run.
  • the input layer of each sub-part may only include 2 operators
  • the convolution layer only includes 4 operators
  • the output layer also includes only 2 operators.
  • the model to be run can be split based on an operator parallelization algorithm.
  • the operators in the same layer can be split; in this case, operators from the same layer are distributed into different subsections, and each subsection obtained by the split can include some operators from different layers.
  • the to-be-running model can be split based on an inter-layer pipeline algorithm.
  • the multi-layer structure included in the model to be run can be split in units of layers.
  • the multiple subsections obtained by splitting will respectively include some layers in the model to be run.
  • the model to be run includes an input layer, a convolution layer, and an output layer
  • the input layer can be split into a subsection
  • the convolutional layer can be split into a subsection
  • the output layer can be split into a subsection.
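A minimal sketch of the inter-layer split just described, assuming one layer per sub-section as in the input/convolution/output example (the function name and layer labels are illustrative, not from the patent):

```python
def split_by_layers(layers, num_parts):
    """Group consecutive layers into num_parts sub-sections (ceiling-sized groups)."""
    per = -(-len(layers) // num_parts)  # ceiling division
    return [layers[i:i + per] for i in range(0, len(layers), per)]

model_layers = ["input", "conv", "output"]
subsections = split_by_layers(model_layers, 3)  # one layer per sub-section
```

The same helper also covers the case where a device has fewer processing units than the model has layers, since consecutive layers are then grouped together.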
  • S240 Load the multiple sub-sections into the respective processing units.
  • S250 Cooperatively control the plurality of processing units to execute respective sub-sections based on the running order, so as to process the data input to each of the sub-sections.
  • the method provided by this embodiment further includes: acquiring the correspondence between the operator and the adaptation processing unit.
  • splitting the model to be run based on the quantity to obtain a plurality of subsections whose number matches the quantity, together with their corresponding running parameters, includes: splitting the model to be run based on the quantity and the correspondence, to obtain a plurality of subsections whose number matches the quantity and the running parameters corresponding to each of the plurality of subsections.
  • the corresponding relationship between operators and adapted processing units may be as shown in the following table, which stores the calculation type corresponding to each operator, the suitable processing units, and the running time in each suitable processing unit:

      Calculation type                   Suitable processing unit(s)            Running time
      Neural network matrix operation    GPU, dedicated AI acceleration chip    3 ms
      Mathematical operation             GPU, CPU                               4 ms (GPU), 4 ms (CPU)
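A hedged sketch of how such a correspondence table might be consulted when assigning an operator to a unit; the operator names `matmul` and `add`, the dictionary layout, and the fastest-unit selection rule are illustrative assumptions, not details from the patent:

```python
# Per-operator correspondence: suitable units and run time (ms) in each.
ADAPTATION = {
    "matmul": {"GPU": 3, "AI_CHIP": 3},  # neural network matrix operation
    "add":    {"GPU": 4, "CPU": 4},      # mathematical operation
}

def pick_unit(op, available_units):
    """Choose the fastest suitable processing unit the device actually has."""
    candidates = {u: t for u, t in ADAPTATION[op].items() if u in available_units}
    return min(candidates, key=candidates.get) if candidates else None

unit = pick_unit("add", ["CPU", "NPU"])  # no GPU available, so CPU is chosen
```

Intersecting the table with the units actually present reflects the point above: the split depends both on the correspondence and on which processing units the device includes.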
  • The data processing method provided by this application makes it possible to determine how to split a model to be run according to the number of currently available processing units, and then load the multiple sub-sections obtained by splitting into the corresponding processing units, so that the multiple processing units can cooperate to run the model, which improves the running performance of the electronic device while the model runs. Moreover, because the number of sub-sections is determined according to the number of processing units, the divided sub-sections can better match the processing units actually present in the electronic device, further improving the running performance.
  • a data processing method provided by an embodiment of the present application includes:
  • S310 Acquire the model to be run and multiple processing units included in the electronic device.
  • S320 Split the to-be-run model based on the multiple processing units to obtain multiple sub-sections and respective operating parameters corresponding to the multiple sub-sections, where the running parameters include the running order corresponding to the sub-sections and the corresponding running parameters of the processing unit.
  • returning the data output by the second processing unit can be understood as returning the output data to the application program that triggers the execution of the data processing method.
  • S310 can be executed in response to a collaborative computing request, and when the data output by the second processing unit is obtained, that data is returned to the application that triggered the collaborative computing request.
  • controlling the first processing unit to process the input data includes: when the input data of the model to be run is received, transmitting the input data to the management main thread, and causing the management main thread to call the first thread, so that the first thread controls the first processing unit to process the input data.
  • returning the data output by the second processing unit, where the second processing unit is the processing unit corresponding to the last sub-section in the running order, includes: when the management main thread receives the data output by the second processing unit, returning the data output by the second processing unit.
  • the first thread is a thread that calls the first processing unit
  • the second thread is a thread that calls a processing unit whose running order is between the first processing unit and the second processing unit .
  • a second thread may also be configured for each processing unit whose running order is between the first processing unit and the second processing unit.
  • the data processing method provided in this embodiment may be executed in the server, and the main thread of the application therein may be the main thread of the client corresponding to the server.
  • the management main thread, the calculation thread 1, the calculation thread 2 and the calculation thread 3 all run in the server.
  • the calculation thread 1 may be understood as the aforementioned first thread
  • the calculation thread 2 and the calculation thread 3 may be understood as the aforementioned second thread.
  • processing unit 1 in this embodiment can be called by computing thread 1, so when processing unit 1 needs to be initialized, an instruction to initialize processing unit 1 can be sent to computing thread 1, causing computing thread 1 to call the program that initializes processing unit 1.
  • the data processing method provided in this embodiment can be used to process streaming data.
  • it can be used to process video data.
  • For video data, it can be understood that a video is composed of multiple frames of images.
  • The video data can be processed frame by frame.
  • The current input data can then be the image frame being processed in the current round.
  • S375 The management main thread transmits the current input data to the computing thread 1.
  • S377 The management main thread inputs the output data of the computing thread 1 to the computing thread 2.
  • S379 The management main thread inputs the output data of the computing thread 2 to the computing thread 3.
  • The multiple processing units process data in a streaming manner, so after a certain processing unit finishes processing the data to be processed this time, it does not need to wait for the processing units later in the running order to complete the subsequent processing before it starts processing the next data. Thus, in the process shown in FIG. 4, while the current input data is being processed, S390 may be included: return the previous output data, so that processing efficiency can be improved.
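The management-main-thread and compute-thread arrangement above can be sketched with standard threads and queues, one compute thread per processing unit; the stage functions and the queue hand-off are illustrative assumptions rather than the patent's exact mechanism:

```python
import queue
import threading

def compute_thread(stage_fn, inbox, outbox):
    """One compute thread per processing unit: take data, process it, pass it on."""
    while True:
        item = inbox.get()
        if item is None:        # sentinel: propagate shutdown down the pipeline
            outbox.put(None)
            break
        outbox.put(stage_fn(item))

# Three stages stand in for sub-sections on processing units 1-3.
stages = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
queues = [queue.Queue() for _ in range(len(stages) + 1)]
for fn, q_in, q_out in zip(stages, queues, queues[1:]):
    threading.Thread(target=compute_thread, args=(fn, q_in, q_out),
                     daemon=True).start()

# The management main thread feeds frames in and collects results.
for frame in [10, 20]:          # e.g. video frames processed frame by frame
    queues[0].put(frame)
queues[0].put(None)

results = []
while (out := queues[-1].get()) is not None:
    results.append(out)
```

Because each thread takes its next item as soon as it finishes the current one, a stage can start on frame 2 while later stages are still processing frame 1, which is the streaming behaviour described above.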
  • the video B includes a video frame b1, a video frame b2, a video frame b3, a video frame b4, and a video frame b5, correspondingly, the video frame b1, the video frame Frame b2, video frame b3, video frame b4, and video frame b5 are sequentially input to the model to be run as input data for processing.
  • When the video frame b3 is the current input data, the video frame b2 can be understood as the previous input data, and the output data obtained by the last processing unit in the running order (for example, the aforementioned second processing unit) from processing the video frame b2 can be understood as the previous output data.
  • Similarly, when the video frame b2 is the current input data, the video frame b1 can be understood as the previous input data, and the output data obtained by the last processing unit (for example, the aforementioned second processing unit) from processing the video frame b1 can be understood as the previous output data.
  • the processing unit 1 is a CPU
  • the processing unit 2 is a GPU
  • the processing unit 3 is an NPU.
  • If the CPU, GPU, and NPU each take 30 ms to process data, then after an initial 90 ms a result can be output every 30 ms. Compared with having the CPU, GPU, or NPU alone execute the entire data processing process, which would output a result only every 90 ms, this can greatly improve the data output efficiency.
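The timing claim above can be checked with a small arithmetic sketch (a simplified model that ignores hand-off overhead between processing units):

```python
def output_times(num_frames, stage_ms, num_stages, pipelined):
    """Times (ms) at which each frame's result becomes available."""
    first = stage_ms * num_stages                 # first result after all stages
    step = stage_ms if pipelined else first       # later results every step ms
    return [first + i * step for i in range(num_frames)]

pipelined = output_times(3, 30, 3, pipelined=True)   # [90, 120, 150]
serial = output_times(3, 30, 3, pipelined=False)     # [90, 180, 270]
```

With three 30 ms stages, the pipeline pays 90 ms of latency once and then emits a result every 30 ms, whereas a single unit running all stages emits one result every 90 ms.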
  • a data processing method provided by this application makes it possible to determine how to split a to-be-running model according to the number of currently existing processing units, and then load multiple sub-parts obtained by splitting into corresponding processing units , so that the multiple processing units can cooperate to run the model to be run, which improves the running performance of the electronic device in the process of running the model.
  • In addition, when there is input data for the model to be run, the input data is input directly into the processing unit corresponding to the sub-section first in the running order, and the output data of that processing unit is then input to the processing unit next in the running order, so that the multiple processing units can continuously process data in a pipeline manner, which improves the performance of the electronic device in running the neural network model.
  • a data processing method provided by an embodiment of the present application includes:
  • S410 Acquire the model to be run and multiple processing units included in the electronic device.
  • S420 Split the to-be-run model based on the multiple processing units to obtain multiple sub-sections and respective operating parameters corresponding to the multiple sub-sections, where the running parameters include the running order corresponding to the sub-sections and the corresponding running parameters of the processing unit.
  • S430 Load the multiple sub-sections into the respective corresponding processing units.
  • S440 Cooperatively control the plurality of processing units to execute respective sub-sections based on the running order, so as to process the data input to each of the sub-sections.
  • S450 Acquire the time consumed by the multiple processing units to run their respective sub-sections.
  • the target condition includes: the standard deviation of the respective running times corresponding to the plurality of processing units is not greater than a standard deviation threshold.
  • the standard deviation can be calculated based on the following formula:

      σ = sqrt( (1/n) × Σᵢ (T₁ᵢ − T₁)² )

  • where T₁ is the average time consumption of the multiple processing units, T₁ᵢ is the time consumption of processing unit i, and n is the number of processing units.
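A sketch of the balance check implied by the standard-deviation criterion above; the 5 ms threshold is an illustrative assumption, not a value from the patent:

```python
import math

def is_balanced(times_ms, threshold_ms=5.0):
    """True when the standard deviation of per-unit run times is within the threshold."""
    mean = sum(times_ms) / len(times_ms)  # T1: average time consumption
    std = math.sqrt(sum((t - mean) ** 2 for t in times_ms) / len(times_ms))
    return std <= threshold_ms

balanced = is_balanced([30, 30, 30])  # identical times: std = 0, no re-split needed
skewed = is_balanced([10, 60, 20])    # very uneven times: re-split the model
```

When the check fails, the model would be split again to redistribute operators until the per-unit times satisfy the target condition.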
  • each subsection may include some operators in the model to be run.
  • splitting the to-be-run model again can be understood as adjusting the number of operators included in at least some of the sub-sections, so as to adjust the running duration of the processing units corresponding to each sub-section.
  • For example, suppose subsection A includes 3 operators, subsection B includes 6 operators, and subsection C includes 3 operators. If the processing unit corresponding to subsection B takes significantly longer than the others, the model can be split again so that, for example, subsection B contains 3 operators and the remaining operators are redistributed to the other subsections.
  • Before the splitting, the method further includes: acquiring the operators included in the model to be run; and if it is detected that none of these operators is unsupported by the multiple processing units, executing the step of splitting the model to be run based on the multiple processing units to obtain a plurality of subsections and the running parameters corresponding to each of the plurality of subsections.
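A sketch of the operator-support pre-check just described; the `SUPPORTED` table and the operator names are illustrative assumptions:

```python
# Which operators each processing unit supports (illustrative).
SUPPORTED = {"CPU": {"add", "matmul"}, "GPU": {"matmul", "conv"}}

def all_ops_supported(model_ops, supported=SUPPORTED):
    """True when every operator is supported by at least one processing unit."""
    usable = set().union(*supported.values())
    return all(op in usable for op in model_ops)

ok = all_ops_supported(["add", "conv"])        # every operator covered somewhere
bad = all_ops_supported(["add", "custom_op"])  # custom_op unsupported: skip the split
```

Performing the split only when the check passes avoids loading a sub-section onto a processing unit that cannot actually run one of its operators.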
  • a data processing method provided by this application makes it possible to determine how to split a to-be-running model according to the number of currently existing processing units, and then load multiple sub-parts obtained by splitting into corresponding processing units , so that the multiple processing units can cooperate to run the model to be run, which improves the running performance of the electronic device in the process of running the model.
  • Moreover, the model to be run is split again based on the time consumed by the multiple processing units in running their corresponding sub-sections, so that the time consumption of the multiple processing units can be balanced, thereby improving the performance of the electronic device in running the model.
  • a data processing apparatus 500 provided by an embodiment of the present application operates on an electronic device, and the apparatus 500 includes:
  • the data acquisition unit 510 is configured to acquire the model to be run and multiple processing units included in the electronic device.
  • the model processing unit 520 is configured to split the model to be run based on the multiple processing units to obtain multiple sub-sections and the running parameters corresponding to each sub-section, where the running parameters include the running order of each sub-section and its corresponding processing unit.
  • the data loading unit 530 is configured to respectively load the plurality of sub-parts to the respective processing units.
  • the cooperative computing unit 540 is configured to cooperatively control the plurality of processing units to execute their corresponding sub-sections based on the running order, so as to process the data input to each of the sub-sections.
  • the model processing unit 520 is specifically configured to obtain the number of the multiple processing units, and split the model to be run based on the number to obtain multiple sub-parts whose count matches the number and running parameters corresponding to each of the multiple sub-parts.
  • the model processing unit 520 is further specifically configured to acquire the correspondence between operators and adapted processing units.
  • the model processing unit 520 is specifically configured to split the model to be run based on the number and the correspondence to obtain multiple sub-parts whose count matches the number and running parameters corresponding to each of the multiple sub-parts.
  • the cooperative computing unit 540 is specifically configured to: when receiving input data of the model to be run, control the first processing unit to process the input data, the first processing unit being the processing unit corresponding to the sub-part that is first in the running order; when receiving data output by the processing unit corresponding to a sub-part earlier in the running order, input the output data to the processing unit corresponding to the sub-part later in the running order, so that the latter processing unit processes the output data; and when receiving data output by the second processing unit, return the data output by the second processing unit, the second processing unit being the processing unit corresponding to the sub-part that is last in the running order.
  • the cooperative computing unit 540 is specifically configured to: when receiving input data of the model to be run, transmit the input data to a management main thread and cause the management main thread to call a first thread, so that the first thread controls the first processing unit to process the input data; when the management main thread receives data output by the processing unit corresponding to a sub-part earlier in the running order, input the output data to a second thread, triggering the second thread to control the processing unit corresponding to the sub-part later in the running order to process the output data; and when the management main thread receives data output by the second processing unit, return the data output by the second processing unit, the second processing unit being the processing unit corresponding to the sub-part that is last in the running order.
  • the apparatus 500 further includes: a performance evaluation unit 550, configured to acquire the time-consuming of the multiple processing units running respective corresponding sub-parts.
  • the model processing unit 520 is further configured to split the to-be-run model based on the multiple processing units again if the time-consuming does not meet the target condition, to obtain new multiple sub-sections and all the Describe the operating parameters corresponding to each of the new multiple subsections.
  • the operator detection unit 560 is configured to acquire the operators included in the model to be run and detect whether the processing units support the operators. In this manner, the model processing unit 520 is configured to, when the operator detection unit 560 detects that the operators do not include any operator that none of the multiple processing units supports, split the model to be run based on the multiple processing units to obtain multiple sub-parts and running parameters corresponding to each of the multiple sub-parts.
  • a data processing apparatus obtains a model to be run and multiple processing units included in the electronic device; then splits the model to be run based on the multiple processing units to obtain multiple sub-parts, the running order corresponding to each sub-part, and the corresponding processing units; loads the multiple sub-parts into their respective corresponding processing units; and cooperatively controls, based on the running order, the multiple processing units to run their corresponding sub-parts, so as to process the data input to each sub-part. In this way, how to split the model to be run can be determined according to the currently available processing units, and the resulting sub-parts can then be loaded into their corresponding processing units so that the multiple processing units can run the model cooperatively, improving the running performance of the electronic device while running the model.
  • an embodiment of the present application further provides another electronic device 200 that can execute the foregoing data processing method.
  • the electronic device 200 includes one or more (only one is shown in the figure) processors 102, a memory 104, and a network module 106 that are coupled to one another.
  • the memory 104 stores a program that can execute the contents of the foregoing embodiments.
  • the processor 102 can execute the program stored in the memory 104.
  • the processor 102 may include one or more cores for processing data.
  • the processor 102 uses various interfaces and lines to connect the various parts of the entire electronic device 200, and executes various functions of the electronic device 200 and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 104 and calling the data stored in the memory 104.
  • the processor 102 may adopt at least one of digital signal processing (Digital Signal Processing, DSP), field-programmable gate array (Field-Programmable Gate Array, FPGA), and programmable logic array (Programmable Logic Array, PLA).
  • the processor 102 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem, and the like.
  • the CPU mainly handles the operating system, user interface and application programs, etc.
  • the GPU is used for rendering and drawing of the display content
  • the modem is used to handle wireless communication. It can be understood that the above-mentioned modem may also not be integrated into the processor 102 and may instead be implemented separately by a communication chip.
  • the memory 104 may include random access memory (Random Access Memory, RAM) and may also include read-only memory (Read-Only Memory, ROM). The memory 104 may be used to store instructions, programs, code, code sets, or instruction sets.
  • the memory 104 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing the operating system, instructions for implementing at least one function (such as a touch function, a sound playback function, an image playback function, etc.) , instructions for implementing the following method embodiments, and the like.
  • the storage data area may also store data created by the terminal 100 during use (such as phone book, audio and video data, chat record data) and the like.
  • the memory 104 stores an apparatus, for example, the apparatus may be the aforementioned apparatus 500 .
  • the network module 106 is used for receiving and sending electromagnetic waves, realizing mutual conversion between electromagnetic waves and electrical signals, so as to communicate with a communication network or other devices, for example, communicate with an audio playback device.
  • the network module 106 may include various existing circuit elements for performing these functions, e.g., antennas, radio frequency transceivers, digital signal processors, encryption/decryption chips, subscriber identity module (SIM) cards, memory, etc.
  • the network module 106 can communicate with various networks such as the Internet, an intranet, a wireless network, or communicate with other devices through a wireless network.
  • the aforementioned wireless network may include a cellular telephone network, a wireless local area network, or a metropolitan area network.
  • the network module 106 may interact with the base station for information.
  • the electronic device may further include at least one of an NPU and a dedicated AI acceleration chip.
  • FIG. 10 shows a structural block diagram of a computer-readable storage medium provided by an embodiment of the present application.
  • the computer-readable medium 1100 stores program codes, and the program codes can be invoked by the processor to execute the methods described in the above method embodiments.
  • the computer-readable storage medium 1100 may be an electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), EPROM, hard disk, or ROM.
  • the computer-readable storage medium 1100 includes a non-transitory computer-readable storage medium.
  • the computer-readable storage medium 1100 has storage space for program code 1110 that performs any of the method steps in the above-described methods. The program code can be read from, or written to, one or more computer program products. Program code 1110 may, for example, be compressed in a suitable form.
  • the data processing method, apparatus, electronic device, and storage medium provided by this application obtain a model to be run and multiple processing units included in the electronic device; then split the model to be run based on the multiple processing units to obtain multiple sub-parts, their corresponding running orders, and the corresponding processing units; load the multiple sub-parts into their respective corresponding processing units; and, based on the running order, cooperatively control the multiple processing units to run their corresponding sub-parts, so as to process the data input to each sub-part.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Stored Programmes (AREA)
  • Debugging And Monitoring (AREA)

Abstract

Embodiments of this application disclose a data processing method, apparatus, electronic device, and storage medium. The method includes: obtaining a model to be run and multiple processing units included in the electronic device; splitting the model to be run based on the multiple processing units to obtain multiple sub-parts and running parameters corresponding to each sub-part, where the running parameters include the running order of the sub-part and its corresponding processing unit; loading the multiple sub-parts into their respective corresponding processing units; and cooperatively controlling, based on the running order, the multiple processing units to run their corresponding sub-parts, so as to process the data input to each sub-part. In this way, how to split the model to be run can be determined according to the currently available processing units, and the resulting sub-parts can then be loaded into their corresponding processing units so that the model can be run cooperatively, improving the running performance of the electronic device while running the model.

Description

Data processing method, apparatus, electronic device, and storage medium
Cross-reference to related applications
This application claims priority to Chinese Application No. 202010694627.7, filed on July 17, 2020, the entire contents of which are hereby incorporated by reference for all purposes.
Technical field
This application relates to the field of computer technology, and more particularly, to a data processing method, apparatus, electronic device, and storage medium.
Background
Algorithm models, such as neural network models, are complex network systems formed by a large number of simple processing units (called neurons) that are extensively interconnected. Some algorithm models have massively parallel, distributed storage and processing, self-organizing, adaptive, and self-learning capabilities. However, when related electronic devices run neural network models, their running performance still needs to be improved.
Summary
In view of the above problems, this application proposes a data processing method, apparatus, electronic device, and storage medium to improve upon the above problems.
In a first aspect, this application provides a data processing method applied to an electronic device, the method including: obtaining a model to be run and multiple processing units included in the electronic device; splitting the model to be run based on the multiple processing units to obtain multiple sub-parts and running parameters corresponding to each of the multiple sub-parts, the running parameters including the running order of a sub-part and its corresponding processing unit; loading the multiple sub-parts into their respective corresponding processing units; and cooperatively controlling, based on the running order, the multiple processing units to run their corresponding sub-parts, so as to process the data input to each sub-part.
In a second aspect, this application provides a data processing apparatus running on an electronic device, the apparatus including: a data acquisition unit, configured to obtain a model to be run and multiple processing units included in the electronic device; a model processing unit, configured to split the model to be run based on the multiple processing units to obtain multiple sub-parts and running parameters corresponding to each of the multiple sub-parts, the running parameters including the running order of a sub-part and its corresponding processing unit; a data loading unit, configured to load the multiple sub-parts into their respective corresponding processing units; and a cooperative computing unit, configured to cooperatively control, based on the running order, the multiple processing units to run their corresponding sub-parts, so as to process the data input to each sub-part.
In a third aspect, this application provides an electronic device including a processor and a memory; one or more programs are stored in the memory and configured to be executed by the processor to implement the above method.
In a fourth aspect, this application provides a computer-readable storage medium in which program code is stored, the above method being executed when the program code is run by a processor.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of this application more clearly, the accompanying drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 shows a flowchart of a data processing method proposed by an embodiment of this application;
FIG. 2 shows a flowchart of a data processing method proposed by another embodiment of this application;
FIG. 3 shows a flowchart of a data processing method proposed by yet another embodiment of this application;
FIG. 4 shows a sequence diagram of executing the data processing method by multiple threads in an embodiment of this application;
FIG. 5 shows a schematic diagram of data output by the data processing method in an embodiment of this application;
FIG. 6 shows a flowchart of a data processing method proposed by a further embodiment of this application;
FIG. 7 shows a structural block diagram of a data processing apparatus proposed by another embodiment of this application;
FIG. 8 shows a structural block diagram of a data processing apparatus proposed by yet another embodiment of this application;
FIG. 9 shows a structural block diagram of an electronic device of this application for executing the data processing method according to an embodiment of this application;
FIG. 10 is a storage unit of an embodiment of this application for storing or carrying program code implementing the data processing method according to an embodiment of this application.
Detailed description
The technical solutions in the embodiments of this application will be described clearly and completely below with reference to the accompanying drawings in the embodiments of this application. Obviously, the described embodiments are only some, rather than all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
An algorithm model, such as a neural network (Neural Networks, NN), is a complex network system formed by a large number of simple processing units (called neurons) that are extensively interconnected. Neural networks have massively parallel, distributed storage and processing, self-organizing, adaptive, and self-learning capabilities. A neural algorithm model usually includes a large number of operators. It can be understood that an operator can be regarded as part of the algorithmic process in a neural algorithm model; an operator can map a function to a function, or map a function to a number.
However, the inventors found in research that when related electronic devices run neural network models, running performance still needs to be improved. For example, when an electronic device runs a neural network model in a related manner, it calls a single processing unit to run the model, so the data processing capability of that one called processing unit directly determines the performance of the device running the model. Moreover, in this related manner, when the data input to the neural network model is streaming data, the processing unit must finish processing the data of the current round before it can start the next round of processing, which also greatly limits the performance of the electronic device in running the neural network model.
Therefore, the inventors propose the data processing method, apparatus, electronic device, and storage medium of this application, which can improve the above problems: a model to be run and multiple processing units included in the electronic device are obtained; the model to be run is then split based on the multiple processing units to obtain multiple sub-parts, the running order corresponding to each sub-part, and the corresponding processing units; the multiple sub-parts are loaded into their respective corresponding processing units; and, based on the running order, the multiple processing units are cooperatively controlled to run their corresponding sub-parts, so as to process the data input to each sub-part. In this way, how to split the model to be run can be determined according to the currently available processing units, and the resulting sub-parts can then be loaded into their corresponding processing units so that the multiple processing units can run the model cooperatively, improving the running performance of the electronic device while running the model.
The embodiments of this application will be described in detail below with reference to the accompanying drawings.
Referring to FIG. 1, an embodiment of this application provides a data processing method, which includes:
S110: Obtain a model to be run and multiple processing units included in the electronic device.
The model to be run in this embodiment is a model that will subsequently be loaded into processing units for running. In this embodiment, there may be multiple ways of determining the model to be run.
In one manner, the model to be run may be a neural network model called by an application program. It should be noted that an application may need to process some data while running, and in this process the application can process the data by calling a neural network. For example, an image-processing application may need to perform image recognition, and that application can then process images by calling a neural network model for image recognition.
In another manner, the electronic device may periodically execute a specified task. In this manner, the neural network model called by the electronic device while executing the specified task can be determined as the model to be run. Optionally, the specified task may be a task of predicting which application the electronic device will run next, a task of video processing, a task of predicting the preferences of the user of the electronic device, or a task of predicting the remaining battery power of the electronic device.
A processing unit is hardware in the electronic device that can process data. Optionally, the processing unit may be a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a DSP (Digital Signal Processor), an NPU (Neural-network Processing Unit), or a dedicated AI accelerator chip. It should be noted that different electronic devices may include different processing units; therefore, to facilitate the subsequent splitting of the model to be run, which processing units the electronic device specifically includes can be obtained.
In one manner, the operating system of the electronic device can interact with the underlying hardware to obtain how many processing units the electronic device includes and their types, and store the obtained number and types of processing units in a specified system file. Then, during execution of the data processing method provided in this embodiment, the multiple processing units included in the electronic device can be obtained by reading the specified system file.
S120: Split the model to be run based on the multiple processing units to obtain multiple sub-parts and running parameters corresponding to each of the multiple sub-parts, the running parameters including the running order of a sub-part and its corresponding processing unit.
As described above, the model to be run includes multiple layers, and each layer includes at least one operator, so the model to be run can be regarded as being composed of multiple operators. When the electronic device includes multiple processing units, the model to be run can be split into multiple sub-parts so that the multiple processing units can run it cooperatively. In this manner, each sub-part can include at least some of the operators of the model to be run.
Correspondingly, during the splitting of the model to be run, running parameters for each sub-part can also be generated, so that the electronic device can obtain the running order of each sub-part and which processing unit each sub-part needs to be run by.
S130: Load the multiple sub-parts into their respective corresponding processing units.
In the embodiments of this application, loading a sub-part into a processing unit can be understood as configuring the processing unit corresponding to the sub-part to run the operators included in the sub-part.
It should be noted that the model itself can be stored in a corresponding model file. In this manner, if the model needs to be run, the corresponding model file can be read directly to obtain the operators included in the model. Then, in one manner, splitting the model to be run into multiple sub-parts can be understood as splitting the model file corresponding to the model into multiple sub-files, the sub-files corresponding one-to-one to the aforementioned sub-parts.
As described above, each sub-part has corresponding running parameters. Optionally, the running parameters corresponding to each sub-part can be stored in the sub-file corresponding to the sub-part, so that the electronic device can obtain the running parameters of a sub-part at the same time as it reads the sub-file to obtain the operators the sub-part includes, improving data acquisition efficiency.
S140: Cooperatively control, based on the running order, the multiple processing units to run their corresponding sub-parts, so as to process the data input to each sub-part.
It should be noted that a model usually processes the input data and then outputs the resulting data. When the model to be run is split into multiple sub-parts, the input and output of each sub-part can depend on one another. With each sub-part having a running order, the processing unit corresponding to each sub-part can be called based on that order, so that each processing unit can process the input data corresponding to its sub-part.
Illustratively, a model A to be run can be split into sub-part a, sub-part b, and sub-part c, where sub-part a corresponds to the CPU, sub-part b to the GPU, and sub-part c to the NPU. Sub-part a is first in the running order, sub-part b follows sub-part a, and sub-part c follows sub-part b. In this manner, the electronic device can first call the CPU to run sub-part a to process the data input to sub-part a and obtain the output data of sub-part a; it can be understood that the output data of sub-part a is the data input to sub-part b. The electronic device then calls the GPU to run sub-part b to process the output data of sub-part a and obtain the output data of sub-part b; it can be understood that the output data of sub-part b is the data input to sub-part c. Finally, the electronic device calls the NPU to process the output data of sub-part b and obtain the final output data.
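The sequential cooperative execution in the example above can be sketched as follows. This is a minimal illustration only: the sub-part computations, the unit names, and the `run_on` helper are hypothetical stand-ins, not part of this application.

```python
# Minimal sketch of running split sub-parts in order, each on its assigned unit.
# Sub-part computations, unit names, and the run_on helper are illustrative only.

def run_on(unit, sub_part, data):
    # Stand-in for dispatching a sub-part's operators to the given processing unit.
    return sub_part(data)

# Sub-parts a, b, c of model A, in running order, with their assigned units.
pipeline = [
    ("CPU", lambda x: x + 1),   # sub-part a
    ("GPU", lambda x: x * 2),   # sub-part b
    ("NPU", lambda x: x - 3),   # sub-part c
]

def run_model(input_data):
    data = input_data
    for unit, sub_part in pipeline:
        # The output of the earlier sub-part is the input of the later one.
        data = run_on(unit, sub_part, data)
    return data
```

The loop makes explicit that each processing unit only ever sees the output of the unit immediately before it in the running order.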
With the data processing method provided by this application, a model to be run and multiple processing units included in the electronic device are obtained; the model to be run is then split based on the multiple processing units to obtain multiple sub-parts, the running order corresponding to each sub-part, and the corresponding processing units; the multiple sub-parts are loaded into their respective corresponding processing units; and, based on the running order, the multiple processing units are cooperatively controlled to run their corresponding sub-parts, so as to process the data input to each sub-part. In this way, how to split the model to be run can be determined according to the currently available processing units, and the resulting sub-parts can then be loaded into their corresponding processing units so that the multiple processing units can run the model cooperatively, improving the running performance of the electronic device while running the model.
Referring to FIG. 2, an embodiment of this application provides a data processing method, which includes:
S210: Obtain a model to be run and multiple processing units included in the electronic device.
S220: Obtain the number of the multiple processing units.
S230: Split the model to be run based on the number to obtain multiple sub-parts whose count matches the number and the running parameters corresponding to each of the multiple sub-parts.
In this embodiment, there may be multiple ways of splitting the model to be run.
In one manner, the model to be run can be split based on a data parallelization algorithm. In this manner, the model can be split into multiple sub-parts with the same structure, and the input data can also be split and input separately into the multiple sub-parts for data-parallel processing. Having the same structure can be understood as including the same kinds of layer structures as the model. Illustratively, the model to be run includes an input layer with 4 operators, a convolutional layer with 8 operators, and an output layer with 4 operators. When the model is split according to the splitting rule corresponding to the data parallelization algorithm, each resulting sub-part also includes an input layer, a convolutional layer, and an output layer, the same kinds of layer structures as the original model to be run; only the number of operators in each layer of a sub-part is smaller than in the corresponding layer of the original model. Taking a split into two sub-parts as an example, each sub-part's input layer may include only 2 operators, its convolutional layer only 4 operators, and its output layer only 2 operators.
In another manner, the model to be run can be split based on an operator parallelization algorithm. In this manner, the operators within the same layer can be split up; in this case, operators of the same layer are distributed to different sub-parts, and each resulting sub-part can include some operators from different layers.
In yet another manner, the model to be run can be split based on an inter-layer pipeline algorithm. In this manner, the multi-layer structure of the model to be run can be split with layers as the unit; in this case, each of the resulting sub-parts includes some of the layers of the model to be run. Illustratively, if the model to be run includes an input layer, a convolutional layer, and an output layer, the input layer can be split into one sub-part, the convolutional layer into another sub-part, and the output layer into a third sub-part.
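An inter-layer pipeline split of this kind can be sketched as follows. The layer list, the even-chunking rule, and the dictionary form of the running parameters are illustrative assumptions rather than the method's prescribed representation.

```python
# Sketch: split a model's ordered layer list into one sub-part per processing unit,
# assigning consecutive layers to consecutive units (inter-layer pipeline split).

def split_by_layers(layers, units):
    n = len(units)
    # Distribute len(layers) layers over n sub-parts as evenly as possible.
    base, extra = divmod(len(layers), n)
    sub_parts, start = [], 0
    for i, unit in enumerate(units):
        size = base + (1 if i < extra else 0)
        # Running parameters: running order i and the assigned processing unit.
        sub_parts.append({"order": i, "unit": unit, "layers": layers[start:start + size]})
        start += size
    return sub_parts

parts = split_by_layers(["input", "conv", "output"], ["CPU", "GPU", "NPU"])
```

With three layers and three units, each sub-part receives one layer, matching the input/convolution/output example above.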
S240: Load the multiple sub-parts into their respective corresponding processing units.
S250: Cooperatively control, based on the running order, the multiple processing units to run their corresponding sub-parts, so as to process the data input to each sub-part.
In one manner, the method provided in this embodiment further includes: obtaining the correspondence between operators and adapted processing units. In this manner, splitting the model to be run based on the number to obtain multiple sub-parts whose count matches the number and the running parameters corresponding to each of the multiple sub-parts includes: splitting the model to be run based on the number and the correspondence to obtain multiple sub-parts whose count matches the number and the running parameters corresponding to each of the multiple sub-parts.
Illustratively, the correspondence between operators and adapted processing units may be as shown in the table below:
Operator name | Computation type | Adapted processing units | Run time
Conv2D | Neural-network matrix operation | GPU, dedicated AI accelerator chip | GPU: 5 ms; AI accelerator chip: 3 ms
Sin | Mathematical operation | GPU, CPU | GPU: 4 ms; CPU: 4 ms
It should be noted that the table above stores, for each operator, its computation type, suitable processing units, and the run time on each suitable processing unit. For example, for the operator named Conv2D, the computation type is a neural-network matrix operation and the suitable processing units are the GPU and a dedicated AI accelerator chip, with a run time of 5 ms on the GPU and 3 ms on the dedicated AI accelerator chip. As another example, for the operator named Sin, the computation type is a mathematical operation and the suitable processing units are the GPU and the CPU, with a run time of 4 ms on each.
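The correspondence table can be modeled as a simple lookup. The table entries below are copied from the example; the rule of preferring the suitable unit with the lowest run time, and the `AI_CHIP` name, are illustrative assumptions.

```python
# Sketch: choose a processing unit for each operator from an operator-to-unit
# correspondence table, preferring the suitable unit with the lowest run time (ms).
ADAPTATION_TABLE = {
    # operator name: {suitable unit: run time in ms}
    "Conv2D": {"GPU": 5, "AI_CHIP": 3},
    "Sin": {"GPU": 4, "CPU": 4},
}

def pick_unit(op_name, available_units):
    # Keep only suitable units that the device actually has.
    candidates = {
        unit: cost
        for unit, cost in ADAPTATION_TABLE.get(op_name, {}).items()
        if unit in available_units
    }
    if not candidates:
        return None  # no suitable unit among those available
    return min(candidates, key=candidates.get)
```

Intersecting the table with the device's actual units mirrors the idea of splitting based on both the number of units and the correspondence.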
With the data processing method provided by this application, how to split the model to be run can be determined according to the number of currently available processing units, and the resulting sub-parts can then be loaded into their corresponding processing units so that the multiple processing units can run the model cooperatively, improving the running performance of the electronic device while running the model. Moreover, in this embodiment, because the number of pieces into which the model to be run is split is determined according to the number of processing units, the count of the resulting sub-parts can better match the processing units the electronic device actually has, further improving running performance.
Referring to FIG. 3, an embodiment of this application provides a data processing method, which includes:
S310: Obtain a model to be run and multiple processing units included in the electronic device.
S320: Split the model to be run based on the multiple processing units to obtain multiple sub-parts and running parameters corresponding to each of the multiple sub-parts, the running parameters including the running order of a sub-part and its corresponding processing unit.
S330: Load the multiple sub-parts into their respective corresponding processing units.
S340: When input data of the model to be run is received, control the first processing unit to process the input data, the first processing unit being the processing unit corresponding to the sub-part that is first in the running order.
S350: When data output by the processing unit corresponding to a sub-part earlier in the running order is received, input the output data to the processing unit corresponding to the sub-part later in the running order, so that the latter processing unit processes the output data.
S360: When data output by the second processing unit is received, return the data output by the second processing unit, the second processing unit being the processing unit corresponding to the sub-part that is last in the running order.
Returning the data output by the second processing unit can be understood as returning the output data to the application that triggered execution of the data processing method. Illustratively, when the application triggers a cooperative computing request, S310 can be executed in response to the request, and when the data output by the second processing unit is obtained, that data is correspondingly returned to the application that triggered the cooperative computing request.
In one manner, the electronic device can establish multiple threads to execute S330, S340, and S350. In this manner, controlling the first processing unit to process the input data when input data of the model to be run is received includes: when input data of the model to be run is received, transmitting the input data to a management main thread, and causing the management main thread to call a first thread so that the first thread controls the first processing unit to process the input data.
Inputting the output data to the processing unit corresponding to the sub-part later in the running order when data output by the processing unit corresponding to a sub-part earlier in the running order is received, so that the latter processing unit processes the output data, includes: when the management main thread receives data output by the processing unit corresponding to a sub-part earlier in the running order, inputting the output data to a second thread and triggering the second thread to control the processing unit corresponding to the sub-part later in the running order to process the output data.
Returning the data output by the second processing unit when it is received, the second processing unit being the processing unit corresponding to the sub-part that is last in the running order, includes: when the management main thread receives the data output by the second processing unit, returning the data output by the second processing unit, the second processing unit being the processing unit corresponding to the sub-part that is last in the running order.
It should be noted that, in this embodiment, the first thread is the thread that calls the first processing unit, and a second thread is a thread that calls a processing unit whose running order lies between the first processing unit and the second processing unit. Optionally, when there are multiple processing units whose running order lies between the first and second processing units, there can also be multiple second threads, and one second thread can be configured for each such processing unit.
Illustratively, the above flow is described again below through a sequence diagram. As shown in FIG. 4, it includes:
S370: The application main thread sends a cooperative initialization instruction.
It should be noted that the data processing method provided in this embodiment can run in a server, and the application main thread can be the main thread of the client corresponding to the server. The management main thread, compute thread 1, compute thread 2, and compute thread 3 all run in the server. Compute thread 1 can be understood as the aforementioned first thread, and compute threads 2 and 3 can be understood as the aforementioned second threads.
S371: The management main thread triggers initialization of processing unit 1.
It should be noted that processing unit 1 in this embodiment can be called by compute thread 1; thus, when processing unit 1 needs to be initialized, an instruction to initialize processing unit 1 can be sent to compute thread 1, causing compute thread 1 to call the program that initializes processing unit 1.
S372: The management main thread triggers initialization of processing unit 2.
S373: The management main thread triggers initialization of processing unit 3.
S374: The application main thread sends the current input data.
In one manner, the data processing method provided in this embodiment can be used to process streaming data, for example, video data. For video data, it can be understood that a video is composed of multiple frames of images; in this manner, the video data can be processed frame by frame, and the current input data can be the frame of image to be processed this time.
S375: The management main thread transmits the current input data to compute thread 1.
S376: Compute thread 1 returns its output data to the management main thread.
S377: The management main thread inputs the output data of compute thread 1 to compute thread 2.
S378: Compute thread 2 returns its output data to the management main thread.
S379: The management main thread inputs the output data of compute thread 2 to compute thread 3.
S380: Compute thread 3 returns its output data to the management main thread.
S381: The management main thread returns the current output data to the application main thread.
It should be noted that in this embodiment the data is processed by multiple processing units in a streaming manner, so after a processing unit finishes processing the data it needs to handle this round, it need not wait for the processing units later in the running order to complete the subsequent processing; instead, it can directly start processing the data of the next round. Thus, in the flow shown in FIG. 4, the processing of the current input data can also include S390: returning the output data from two rounds earlier, so as to improve processing efficiency. Illustratively, if video B, which includes video frames b1, b2, b3, b4, and b5, is processed based on the method provided in this embodiment, frames b1 through b5 are input in turn as input data to the model to be run. In this case, when frame b3 is being processed as the current input data, frame b2 can be understood as the previous input data, and the output data obtained by the processing unit last in the running order (for example, the aforementioned second processing unit) processing frame b2 can be understood as the previous output data; correspondingly, frame b1 can be understood as the input data from two rounds earlier, and the output data obtained by the processing unit last in the running order processing frame b1 can be understood as the output data from two rounds earlier.
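The thread arrangement described above, in which a management main thread feeds per-unit compute threads so that frames stream through the sub-parts in pipeline fashion, can be sketched with queues. The queue-based hand-off, stage functions, and shutdown signal are illustrative assumptions, not the method's prescribed implementation.

```python
# Sketch: a management main thread feeds a chain of compute threads via queues,
# so each stage can start on the next frame while later stages finish earlier ones.
import queue
import threading

def compute_thread(stage_fn, in_q, out_q):
    while True:
        item = in_q.get()
        if item is None:          # shutdown signal, propagated down the chain
            out_q.put(None)
            break
        out_q.put(stage_fn(item))

def run_pipeline(frames, stage_fns):
    qs = [queue.Queue() for _ in range(len(stage_fns) + 1)]
    threads = [
        threading.Thread(target=compute_thread, args=(fn, qs[i], qs[i + 1]))
        for i, fn in enumerate(stage_fns)
    ]
    for t in threads:
        t.start()
    for frame in frames:          # management main thread sends each input frame
        qs[0].put(frame)
    qs[0].put(None)
    results = []
    while True:                   # management main thread collects the outputs
        out = qs[-1].get()
        if out is None:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results

outputs = run_pipeline([1, 2, 3], [lambda x: x + 1, lambda x: x * 2])
```

Because each compute thread pulls its next item as soon as it finishes the current one, a stage never waits for downstream stages, which is exactly the streaming behavior the paragraph describes.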
The processing effect of the flow shown in FIG. 4 is described below with reference to FIG. 5.
Illustratively, suppose processing unit 1 is a CPU, processing unit 2 is a GPU, and processing unit 3 is an NPU, and the CPU, GPU, and NPU each take 30 ms to process data. Then, after the first 90 ms, a result can be output every 30 ms; compared with having the CPU, GPU, or NPU alone execute the entire data processing flow, which would output a result only every 90 ms, this can greatly improve data output efficiency.
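The throughput gain in this example follows from simple pipeline arithmetic; the stage count and per-stage times are taken directly from the example above.

```python
# Sketch: steady-state output interval of a 3-stage pipeline vs. a single unit.
# With three stages of 30 ms each, the first result appears after 90 ms and
# subsequent results every 30 ms, versus every 90 ms on a single unit.
stage_times_ms = [30, 30, 30]

first_result_ms = sum(stage_times_ms)        # pipeline fill latency
pipeline_interval_ms = max(stage_times_ms)   # steady-state output interval
serial_interval_ms = sum(stage_times_ms)     # single-unit output interval
```

In the steady state the slowest stage sets the output interval, which is why balancing the per-unit times (discussed in the next embodiment) matters.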
With the data processing method provided by this application, how to split the model to be run can be determined according to the number of currently available processing units, and the resulting sub-parts can then be loaded into their corresponding processing units so that the multiple processing units can run the model cooperatively, improving the running performance of the electronic device while running the model. Moreover, in this embodiment, when there is input data for the model to be run, the input data is directly input to the processing unit corresponding to the sub-part first in the running order, and the output data of an earlier processing unit is input to the processing unit later in the running order, so that the multiple processing units can continuously process data in pipeline fashion, improving the performance of the electronic device in running the neural network model.
Referring to FIG. 6, an embodiment of this application provides a data processing method, which includes:
S410: Obtain a model to be run and multiple processing units included in the electronic device.
S420: Split the model to be run based on the multiple processing units to obtain multiple sub-parts and running parameters corresponding to each of the multiple sub-parts, the running parameters including the running order of a sub-part and its corresponding processing unit.
S430: Load the multiple sub-parts into their respective corresponding processing units.
S440: Cooperatively control, based on the running order, the multiple processing units to run their corresponding sub-parts, so as to process the data input to each sub-part.
S450: Obtain the time taken by the multiple processing units to run their corresponding sub-parts.
S460: If the time taken does not satisfy a target condition, re-split the model to be run based on the multiple processing units to obtain new multiple sub-parts and running parameters corresponding to each of the new multiple sub-parts.
It should be noted that the purpose of re-splitting the model to be run is to adjust the proportion of operators included in the original multiple sub-parts, so as to adjust the time taken by the processing unit corresponding to each sub-part. Optionally, the target condition includes: the standard deviation of the run times corresponding to the multiple processing units is not greater than a standard deviation threshold. Optionally, the standard deviation can be calculated based on the following formula:
$$S=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(T_{1i}-T_{1}\right)^{2}}$$
where T_1 is the average of the times taken by the multiple processing units, T_1i is the time taken by processing unit i, and n is the number of processing units.
As can be seen from the foregoing, in the multiple sub-parts obtained by splitting the model to be run, each sub-part can include some of the operators of the model to be run. Re-splitting the model to be run can therefore be understood as adjusting the number of operators included in at least some of the sub-parts, so as to adjust the run time of the processing unit corresponding to each sub-part. Illustratively, sub-part A includes 3 operators, sub-part B includes 6 operators, and sub-part C includes 3 operators; after re-splitting, sub-part A may include 4 operators, sub-part B may include 5 operators, and sub-part C may still include 3 operators.
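The balance check above can be sketched as follows, using the standard-deviation condition; the threshold value in the usage lines is an illustrative assumption.

```python
# Sketch: decide whether the sub-parts need re-splitting by comparing the
# population standard deviation of per-unit run times against a threshold.
import math

def needs_resplit(times_ms, threshold_ms):
    n = len(times_ms)
    mean = sum(times_ms) / n
    std = math.sqrt(sum((t - mean) ** 2 for t in times_ms) / n)
    # Re-split when run times are too unbalanced across the processing units.
    return std > threshold_ms

balanced = needs_resplit([30, 30, 30], threshold_ms=5.0)
unbalanced = needs_resplit([10, 60, 20], threshold_ms=5.0)
```

Three equal 30 ms times give a standard deviation of zero and pass the condition, while the unbalanced split fails it and would trigger a re-split.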
In one manner, before splitting the model to be run based on the multiple processing units to obtain multiple sub-parts and running parameters corresponding to each of the multiple sub-parts, the method further includes: obtaining the operators included in the model to be run; and, if it is detected that the operators do not include any operator that none of the multiple processing units supports, executing the splitting of the model to be run based on the multiple processing units to obtain multiple sub-parts and running parameters corresponding to each of the multiple sub-parts.
With the data processing method provided by this application, how to split the model to be run can be determined according to the number of currently available processing units, and the resulting sub-parts can then be loaded into their corresponding processing units so that the multiple processing units can run the model cooperatively, improving the running performance of the electronic device while running the model. Moreover, in this embodiment the model to be run is re-split based on the time taken by the multiple processing units to run their corresponding sub-parts, so that the times taken by the multiple processing units can be balanced, improving the performance of the electronic device in running the model.
Referring to FIG. 7, an embodiment of this application provides a data processing apparatus 500 running on an electronic device, the apparatus 500 including:
a data acquisition unit 510, configured to obtain a model to be run and multiple processing units included in the electronic device;
a model processing unit 520, configured to split the model to be run based on the multiple processing units to obtain multiple sub-parts and running parameters corresponding to each of the multiple sub-parts, the running parameters including the running order of a sub-part and its corresponding processing unit;
a data loading unit 530, configured to load the multiple sub-parts into their respective corresponding processing units; and
a cooperative computing unit 540, configured to cooperatively control, based on the running order, the multiple processing units to run their corresponding sub-parts, so as to process the data input to each sub-part.
In one manner, the model processing unit 520 is specifically configured to obtain the number of the multiple processing units, and split the model to be run based on the number to obtain multiple sub-parts whose count matches the number and running parameters corresponding to each of the multiple sub-parts. Optionally, the model processing unit 520 is further specifically configured to obtain the correspondence between operators and adapted processing units. In this manner, the model processing unit 520 is specifically configured to split the model to be run based on the number and the correspondence to obtain multiple sub-parts whose count matches the number and running parameters corresponding to each of the multiple sub-parts.
In one manner, the cooperative computing unit 540 is specifically configured to: when receiving input data of the model to be run, control the first processing unit to process the input data, the first processing unit being the processing unit corresponding to the sub-part that is first in the running order; when receiving data output by the processing unit corresponding to a sub-part earlier in the running order, input the output data to the processing unit corresponding to the sub-part later in the running order, so that the latter processing unit processes the output data; and when receiving data output by the second processing unit, return the data output by the second processing unit, the second processing unit being the processing unit corresponding to the sub-part that is last in the running order.
In one manner, the cooperative computing unit 540 is specifically configured to: when receiving input data of the model to be run, transmit the input data to a management main thread and cause the management main thread to call a first thread, so that the first thread controls the first processing unit to process the input data; when the management main thread receives data output by the processing unit corresponding to a sub-part earlier in the running order, input the output data to a second thread, triggering the second thread to control the processing unit corresponding to the sub-part later in the running order to process the output data; and when the management main thread receives data output by the second processing unit, return the data output by the second processing unit, the second processing unit being the processing unit corresponding to the sub-part that is last in the running order.
In one manner, as shown in FIG. 8, the apparatus 500 further includes: a performance evaluation unit 550, configured to obtain the time taken by the multiple processing units to run their corresponding sub-parts. In this manner, the model processing unit 520 is further configured to, if the time taken does not satisfy a target condition, re-split the model to be run based on the multiple processing units to obtain new multiple sub-parts and running parameters corresponding to each of the new multiple sub-parts.
An operator detection unit 560 is configured to obtain the operators included in the model to be run and detect whether the processing units support the operators. In this manner, the model processing unit 520 is configured to, when the operator detection unit 560 detects that the operators do not include any operator that none of the multiple processing units supports, execute the splitting of the model to be run based on the multiple processing units to obtain multiple sub-parts and running parameters corresponding to each of the multiple sub-parts.
With the data processing apparatus provided by this application, a model to be run and multiple processing units included in the electronic device are obtained; the model to be run is then split based on the multiple processing units to obtain multiple sub-parts, the running order corresponding to each sub-part, and the corresponding processing units; the multiple sub-parts are loaded into their respective corresponding processing units; and, based on the running order, the multiple processing units are cooperatively controlled to run their corresponding sub-parts, so as to process the data input to each sub-part. In this way, how to split the model to be run can be determined according to the currently available processing units, and the resulting sub-parts can then be loaded into their corresponding processing units so that the multiple processing units can run the model cooperatively, improving the running performance of the electronic device while running the model.
It should be noted that the apparatus embodiments in this application correspond to the foregoing method embodiments; for the specific principles in the apparatus embodiments, reference can be made to the contents of the foregoing method embodiments, which will not be repeated here.
An electronic device provided by this application is described below with reference to FIG. 9.
Referring to FIG. 9, based on the foregoing data processing method and apparatus, an embodiment of this application further provides another electronic device 200 that can execute the foregoing data processing method. The electronic device 200 includes one or more (only one is shown in the figure) processors 102, a memory 104, and a network module 106 that are coupled to one another. The memory 104 stores a program that can execute the contents of the foregoing embodiments, and the processor 102 can execute the program stored in the memory 104.
The processor 102 may include one or more cores for processing data. The processor 102 uses various interfaces and lines to connect the various parts of the entire electronic device 200, and executes various functions of the electronic device 200 and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 104 and calling the data stored in the memory 104. Optionally, the processor 102 may be implemented in at least one of the hardware forms of digital signal processing (Digital Signal Processing, DSP), field-programmable gate array (Field-Programmable Gate Array, FPGA), and programmable logic array (Programmable Logic Array, PLA). The processor 102 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem, and the like. The CPU mainly handles the operating system, user interface, application programs, and the like; the GPU is responsible for rendering and drawing display content; and the modem is used to handle wireless communication. It can be understood that the above-mentioned modem may also not be integrated into the processor 102 and may instead be implemented separately by a communication chip.
The memory 104 may include random access memory (Random Access Memory, RAM) and may also include read-only memory (Read-Only Memory, ROM). The memory 104 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 104 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing the operating system, instructions for implementing at least one function (such as a touch function, a sound playback function, an image playback function, etc.), instructions for implementing the following method embodiments, and the like. The data storage area may also store data created by the terminal 100 during use (such as a phone book, audio and video data, and chat record data) and the like. The memory 104 stores an apparatus; for example, the apparatus may be the aforementioned apparatus 500.
The network module 106 is used for receiving and sending electromagnetic waves and realizing mutual conversion between electromagnetic waves and electrical signals, so as to communicate with a communication network or other devices, for example, with an audio playback device. The network module 106 may include various existing circuit elements for performing these functions, e.g., antennas, radio frequency transceivers, digital signal processors, encryption/decryption chips, subscriber identity module (SIM) cards, memory, etc. The network module 106 can communicate with various networks such as the Internet, an intranet, or a wireless network, or communicate with other devices through a wireless network. The aforementioned wireless network may include a cellular telephone network, a wireless local area network, or a metropolitan area network. For example, the network module 106 may exchange information with a base station.
In addition, the electronic device may further include at least one of an NPU and a dedicated AI accelerator chip.
Referring to FIG. 10, which shows a structural block diagram of a computer-readable storage medium provided by an embodiment of this application. The computer-readable medium 1100 stores program code that can be called by a processor to execute the methods described in the foregoing method embodiments.
The computer-readable storage medium 1100 may be an electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read-Only Memory), EPROM, a hard disk, or ROM. Optionally, the computer-readable storage medium 1100 includes a non-transitory computer-readable storage medium. The computer-readable storage medium 1100 has storage space for program code 1110 that performs any of the method steps in the above-described methods. The program code can be read from, or written to, one or more computer program products. Program code 1110 may, for example, be compressed in a suitable form.
In summary, the data processing method, apparatus, electronic device, and storage medium provided by this application obtain a model to be run and multiple processing units included in the electronic device; then split the model to be run based on the multiple processing units to obtain multiple sub-parts, the running order corresponding to each sub-part, and the corresponding processing units; load the multiple sub-parts into their respective corresponding processing units; and, based on the running order, cooperatively control the multiple processing units to run their corresponding sub-parts, so as to process the data input to each sub-part. In this way, how to split the model to be run can be determined according to the currently available processing units, and the resulting sub-parts can then be loaded into their corresponding processing units so that the multiple processing units can run the model cooperatively, improving the running performance of the electronic device while running the model.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments or make equivalent replacements for some of the technical features; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.

Claims (20)

  1. A data processing method, applied to an electronic device, the method comprising:
    obtaining a model to be run and multiple processing units included in the electronic device;
    splitting the model to be run based on the multiple processing units to obtain multiple sub-parts and running parameters corresponding to each of the multiple sub-parts, the running parameters including the running order of a sub-part and its corresponding processing unit;
    loading the multiple sub-parts into their respective corresponding processing units; and
    cooperatively controlling, based on the running order, the multiple processing units to run their corresponding sub-parts, so as to process the data input to each sub-part.
  2. The method according to claim 1, wherein splitting the model to be run based on the multiple processing units to obtain multiple sub-parts and running parameters corresponding to each of the multiple sub-parts comprises:
    obtaining the number of the multiple processing units; and
    splitting the model to be run based on the number to obtain multiple sub-parts whose count matches the number and running parameters corresponding to each of the multiple sub-parts.
  3. The method according to claim 2, further comprising:
    obtaining the correspondence between operators and adapted processing units;
    wherein splitting the model to be run based on the number to obtain multiple sub-parts whose count matches the number and running parameters corresponding to each of the multiple sub-parts comprises:
    splitting the model to be run based on the number and the correspondence to obtain multiple sub-parts whose count matches the number and running parameters corresponding to each of the multiple sub-parts.
  4. The method according to any one of claims 1 to 3, wherein cooperatively controlling, based on the running order, the multiple processing units to run their corresponding sub-parts, so as to process the data input to each sub-part, comprises:
    when input data of the model to be run is received, controlling the first processing unit to process the input data, the first processing unit being the processing unit corresponding to the sub-part that is first in the running order;
    when data output by the processing unit corresponding to a sub-part earlier in the running order is received, inputting the output data to the processing unit corresponding to the sub-part later in the running order, so that the latter processing unit processes the output data; and
    when data output by the second processing unit is received, returning the data output by the second processing unit, the second processing unit being the processing unit corresponding to the sub-part that is last in the running order.
  5. The method according to claim 4, wherein controlling the first processing unit to process the input data when input data of the model to be run is received comprises: when input data of the model to be run is received, transmitting the input data to a management main thread, and causing the management main thread to call a first thread so that the first thread controls the first processing unit to process the input data;
    inputting the output data to the processing unit corresponding to the sub-part later in the running order when data output by the processing unit corresponding to a sub-part earlier in the running order is received, so that the latter processing unit processes the output data, comprises:
    when the management main thread receives data output by the processing unit corresponding to a sub-part earlier in the running order, inputting the output data to a second thread, and triggering the second thread to control the processing unit corresponding to the sub-part later in the running order to process the output data; and
    returning the data output by the second processing unit when it is received, the second processing unit being the processing unit corresponding to the sub-part that is last in the running order, comprises: when the management main thread receives the data output by the second processing unit, returning the data output by the second processing unit, the second processing unit being the processing unit corresponding to the sub-part that is last in the running order.
  6. The method according to any one of claims 1 to 5, wherein after cooperatively controlling, based on the running order, the multiple processing units to run their corresponding sub-parts so as to process the data input to each sub-part, the method further comprises:
    obtaining the time taken by the multiple processing units to run their corresponding sub-parts; and
    if the time taken does not satisfy a target condition, re-splitting the model to be run based on the multiple processing units to obtain new multiple sub-parts and running parameters corresponding to each of the new multiple sub-parts.
  7. The method according to claim 6, wherein the target condition comprises: the standard deviation of the run times corresponding to the multiple processing units is not greater than a standard deviation threshold.
  8. The method according to claim 7, wherein the standard deviation is calculated based on the following formula:
    $$S=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(T_{1i}-T_{1}\right)^{2}}$$
    where T_1 is the average of the times taken by the multiple processing units, T_1i is the time taken by processing unit i, and n is the number of the multiple processing units.
  9. The method according to any one of claims 1 to 8, wherein before splitting the model to be run based on the multiple processing units to obtain multiple sub-parts and running parameters corresponding to each of the multiple sub-parts, the method further comprises:
    obtaining the operators included in the model to be run; and
    if it is detected that the operators do not include any operator that none of the multiple processing units supports, executing the splitting of the model to be run based on the multiple processing units to obtain multiple sub-parts and running parameters corresponding to each of the multiple sub-parts.
  10. The method according to any one of claims 1 to 9, wherein the processing unit may be a CPU, a GPU, a DSP, an NPU, or a dedicated AI accelerator chip.
  11. The method according to any one of claims 1 to 10, wherein obtaining a model to be run and multiple processing units included in the electronic device comprises:
    obtaining the model to be run and a specified system file, the specified system file storing the number and types of the processing units included in the electronic device; and
    obtaining the multiple processing units included in the electronic device by reading the specified system file.
  12. The method according to claim 1, wherein splitting the model to be run based on the multiple processing units to obtain multiple sub-parts and running parameters corresponding to each of the multiple sub-parts comprises:
    splitting the model to be run based on the multiple processing units and a data parallelization algorithm to obtain multiple sub-parts and running parameters corresponding to each of the multiple sub-parts, wherein the kinds of layer structures included in each sub-part are the same.
  13. The method according to claim 1, wherein splitting the model to be run based on the multiple processing units to obtain multiple sub-parts and running parameters corresponding to each of the multiple sub-parts comprises:
    splitting the model to be run based on the multiple processing units and an inter-layer pipeline algorithm to obtain multiple sub-parts and running parameters corresponding to each of the multiple sub-parts, wherein each sub-part includes some of the layers of the model to be run.
  14. A data processing apparatus, running on an electronic device, the apparatus comprising:
    a data acquisition unit, configured to obtain a model to be run and multiple processing units included in the electronic device;
    a model processing unit, configured to split the model to be run based on the multiple processing units to obtain multiple sub-parts and running parameters corresponding to each of the multiple sub-parts, the running parameters including the running order of a sub-part and its corresponding processing unit;
    a data loading unit, configured to load the multiple sub-parts into their respective corresponding processing units; and
    a cooperative computing unit, configured to cooperatively control, based on the running order, the multiple processing units to run their corresponding sub-parts, so as to process the data input to each sub-part.
  15. The apparatus according to claim 14, wherein the model processing unit is specifically configured to obtain the number of the multiple processing units, and split the model to be run based on the number to obtain multiple sub-parts whose count matches the number and running parameters corresponding to each of the multiple sub-parts.
  16. The apparatus according to claim 15, wherein the model processing unit is further specifically configured to obtain the correspondence between operators and adapted processing units; and the model processing unit is specifically configured to split the model to be run based on the number and the correspondence to obtain multiple sub-parts whose count matches the number and running parameters corresponding to each of the multiple sub-parts.
  17. The apparatus according to any one of claims 14 to 16, wherein the cooperative computing unit is specifically configured to: when receiving input data of the model to be run, control the first processing unit to process the input data, the first processing unit being the processing unit corresponding to the sub-part that is first in the running order;
    when receiving data output by the processing unit corresponding to a sub-part earlier in the running order, input the output data to the processing unit corresponding to the sub-part later in the running order, so that the latter processing unit processes the output data; and
    when receiving data output by the second processing unit, return the data output by the second processing unit, the second processing unit being the processing unit corresponding to the sub-part that is last in the running order.
  18. The apparatus according to any one of claims 14 to 17, wherein the processing unit may be a CPU, a GPU, a DSP, an NPU, or a dedicated AI accelerator chip.
  19. An electronic device, comprising a processor and a memory;
    one or more programs being stored in the memory and configured to be executed by the processor to implement the method according to any one of claims 1 to 13.
  20. A computer-readable storage medium, wherein program code is stored in the computer-readable storage medium, and when the program code is run by a processor, the method according to any one of claims 1 to 13 is executed.
PCT/CN2021/092183 2020-07-17 2021-05-07 Data processing method, apparatus, electronic device and storage medium WO2022012119A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010694627.7 2020-07-17
CN202010694627.7A CN111782403B (zh) 2020-07-17 2020-07-17 Data processing method, apparatus, and electronic device

Publications (1)

Publication Number Publication Date
WO2022012119A1 true WO2022012119A1 (zh) 2022-01-20

Family

ID=72763121

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/092183 WO2022012119A1 (zh) 2020-07-17 2021-05-07 Data processing method, apparatus, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN111782403B (zh)
WO (1) WO2022012119A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111782401A (zh) * 2020-07-17 2020-10-16 Oppo广东移动通信有限公司 Data processing method, apparatus, and electronic device
CN111782403B (zh) 2020-07-17 2022-04-19 Oppo广东移动通信有限公司 Data processing method, apparatus, and electronic device
CN116362305A (zh) * 2021-12-24 2023-06-30 Oppo广东移动通信有限公司 Data processing method, apparatus, computer device, and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160267380A1 (en) * 2015-03-13 2016-09-15 Nuance Communications, Inc. Method and System for Training a Neural Network
CN109754073A (zh) * 2018-12-29 2019-05-14 北京中科寒武纪科技有限公司 Data processing method, apparatus, electronic device, and readable storage medium
CN110298437A (zh) * 2019-06-28 2019-10-01 Oppo广东移动通信有限公司 Neural network segmentation computing method, apparatus, storage medium, and mobile terminal
CN110458294A (zh) * 2019-08-19 2019-11-15 Oppo广东移动通信有限公司 Model running method, apparatus, terminal, and storage medium
CN111340237A (zh) * 2020-03-05 2020-06-26 腾讯科技(深圳)有限公司 Data processing and model running method, apparatus, and computer device
CN111782401A (zh) * 2020-07-17 2020-10-16 Oppo广东移动通信有限公司 Data processing method, apparatus, and electronic device
CN111782403A (zh) * 2020-07-17 2020-10-16 Oppo广东移动通信有限公司 Data processing method, apparatus, and electronic device
CN111782402A (zh) * 2020-07-17 2020-10-16 Oppo广东移动通信有限公司 Data processing method, apparatus, and electronic device
CN111984414A (zh) * 2020-08-21 2020-11-24 苏州浪潮智能科技有限公司 Data processing method, system, device, and readable storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104035751B (zh) * 2014-06-20 2016-10-12 深圳市腾讯计算机系统有限公司 Data parallel processing method and apparatus based on multiple graphics processors
CN109523022B (zh) * 2018-11-13 2022-04-05 Oppo广东移动通信有限公司 Terminal data processing method, apparatus, and terminal
CN110503180B (zh) * 2019-08-14 2021-09-14 Oppo广东移动通信有限公司 Model processing method, apparatus, and electronic device
CN110633153A (zh) * 2019-09-24 2019-12-31 上海寒武纪信息科技有限公司 Method for splitting a neural network model using a multi-core processor, and related product
CN111400012A (zh) * 2020-03-20 2020-07-10 中国建设银行股份有限公司 Data parallel processing method, apparatus, device, and storage medium


Also Published As

Publication number Publication date
CN111782403B (zh) 2022-04-19
CN111782403A (zh) 2020-10-16

Similar Documents

Publication Publication Date Title
WO2022012119A1 (zh) Data processing method, apparatus, electronic device and storage medium
WO2022012123A1 (zh) Data processing method, apparatus, electronic device and storage medium
WO2022012118A1 (zh) Data processing method, apparatus, electronic device and storage medium
WO2022042113A1 (zh) Data processing method, apparatus, electronic device and storage medium
US10754976B2 (en) Configuring image as private within storage container
TW202119255A (zh) Inference system, inference method, electronic device, and computer storage medium
CN113656176B (zh) Cloud device allocation method, apparatus, system, electronic device, medium, and product
CN111273953B (zh) Model processing method, apparatus, terminal, and storage medium
US11954396B2 (en) Screen projection status determining method and apparatus
WO2021232958A1 (zh) Method for performing operation, electronic device, apparatus, and storage medium
CN112102364A (zh) Target tracking method, apparatus, electronic device, and storage medium
WO2022121701A1 (zh) Image processing method, apparatus, electronic device, and storage medium
CN111292262A (zh) Image processing method, apparatus, electronic device, and storage medium
CN111182332B (zh) Video processing method, apparatus, server, and storage medium
CN111813529B (zh) Data processing method, apparatus, electronic device, and storage medium
US10212291B2 (en) System, method, and non-transitory computer readable storage medium for image recognition based on convolutional neural networks
US20230083565A1 (en) Image data processing method and apparatus, storage medium, and electronic device
US11720414B2 (en) Parallel execution controller for partitioned segments of a data model
CN110942345A (zh) Seed user selection method, apparatus, device, and storage medium
CN115129469B (zh) Cross-process communication method, apparatus, device, and storage medium
US11762622B1 (en) Interactive remote digital image editing utilizing a scalable containerized architecture
US20220279241A1 (en) Method and device for recognizing images
WO2019144701A1 (zh) Neural network operation method, apparatus, and related device
US20230252264A1 (en) Neural network processing
CN117035022A (zh) Model generation method, apparatus, and electronic device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21842191

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21842191

Country of ref document: EP

Kind code of ref document: A1