WO2020098414A1 - Terminal data processing method and apparatus, and terminal - Google Patents

Terminal data processing method and apparatus, and terminal

Info

Publication number
WO2020098414A1
Authority
WO
WIPO (PCT)
Prior art keywords
calculation processing
operator
processing units
expected value
neural network
Prior art date
Application number
PCT/CN2019/109609
Other languages
English (en)
French (fr)
Inventor
陈岩
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司
Publication of WO2020098414A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Definitions

  • This application relates to the field of computer technology, and in particular to a terminal data processing method and apparatus, and a terminal.
  • Deep learning plays an increasingly important role in fields such as search, data mining, natural language processing, speech, and recommendation.
  • When a general neural network model performs deep learning computation on a terminal, the operators included in the neural network model must be allocated to the terminal's calculation processing units to run.
  • When the allocation is unreasonable, calculation efficiency suffers.
  • Embodiments of the present application provide a terminal data processing method and apparatus, and a terminal, which can improve calculation efficiency.
  • In a first aspect, the present application provides a terminal data processing method.
  • The terminal includes at least two calculation processing units.
  • The method includes: converting a trained neural network model according to a preset model structure; parsing the converted neural network model; obtaining the expected values of an operator in the converted model running on the at least two calculation processing units; obtaining the state information of the at least two calculation processing units;
  • and determining, among the at least two calculation processing units and according to the state information and the expected values, the calculation processing unit on which the operator runs.
  • In a second aspect, an embodiment of the present application provides a terminal data processing apparatus, including:
  • a conversion unit, configured to convert a trained neural network model according to a preset model structure;
  • a parsing unit, configured to parse the converted neural network model;
  • a determining unit, configured to obtain the expected values of an operator in the converted neural network model running on the at least two calculation processing units, and the state information of the at least two calculation processing units;
  • the determining unit being further configured to determine, among the at least two calculation processing units and according to the state information and the expected values, the calculation processing unit on which the operator runs.
  • In a third aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of any of the above methods are implemented.
  • In a fourth aspect, an embodiment of the present application provides a terminal, including a memory, a processor, and a computer program stored on the memory and executable on the processor.
  • When the processor executes the program, the steps of any of the above methods are implemented.
  • The terminal data processing method provided in the embodiments of the present application converts a trained neural network model according to a preset model structure; parses the converted neural network model; obtains the expected values of an operator in the converted model running on at least two calculation processing units; obtains the state information of the at least two calculation processing units; and determines, among the at least two calculation processing units and according to the state information and the expected values, the calculation processing unit on which the operator runs.
  • The data processing method of the embodiments of the present application determines the operator's calculation processing unit from both the operator's expected values on the calculation processing units and the units' state information, taking into account the units' actual operating state as well as the expectation of running the operator on each unit.
  • Each operator in the neural network can thus be allocated flexibly and reasonably to a calculation processing unit for computation, making reasonable and effective use of the calculation processing units and improving calculation efficiency.
  • FIG. 1 shows a flowchart of an embodiment of a terminal data processing method of the present application
  • FIG. 2 shows a schematic structural diagram of an embodiment of a terminal data processing device of the present application
  • FIG. 3 shows a schematic structural diagram of a terminal according to an embodiment of the present application.
  • In the prior art, for the deep learning computation of a neural network, the calculation processing unit on which it runs is configured in advance, and after configuration all operations run on that single calculation processing unit, so calculation efficiency is low. Alternatively, the configuration intelligently chooses whether power consumption or speed has priority, and one of these priority strategies then determines to which calculation processing unit each neural network operator is allocated; this processing approach is single-tracked and inflexible.
  • FIG. 1 shows a flowchart of an embodiment of the terminal data processing method of the present application. Referring to FIG. 1, the method specifically includes: converting a trained neural network model according to a preset model structure; parsing the converted neural network model; obtaining the expected values of an operator in the converted model running on the at least two calculation processing units; obtaining the state information of the at least two calculation processing units;
  • and determining, among the at least two calculation processing units and according to the state information and the expected values, the calculation processing unit on which the operator runs.
  • The terminal data processing method provided in the embodiments of the present application converts a trained neural network model according to a preset model structure; parses the converted neural network model; obtains the expected values of an operator in the converted model running on at least two calculation processing units; obtains the state information of the at least two calculation processing units; and determines, among the at least two calculation processing units and according to the state information and the expected values, the calculation processing unit on which the operator runs.
  • The data processing method of the embodiments of the present application determines the operator's calculation processing unit from both the operator's expected values on the calculation processing units and the units' state information, taking into account the units' actual operating state as well as the expectation of running the operator on each unit.
  • Each operator in the neural network can thus be allocated flexibly and reasonably to a calculation processing unit for computation, making reasonable and effective use of the calculation processing units and improving calculation efficiency.
  • The method of the embodiments of the present application can make full use of the computing resources on the terminal.
  • The terminal in the embodiments of the present application includes at least two calculation processing units.
  • Terminals in the embodiments of the present application include, but are not limited to, mobile phones, tablet computers, laptop computers, and the like.
  • The calculation processing units on the terminal in the embodiments of the present application may include, for example, a CPU, a GPU (Graphics Processing Unit), a DSP (Digital Signal Processor), and an NPU (Neural-network Processing Unit).
  • The at least two calculation processing units included in the terminal may be selected from the above or from other specific calculation processing units not mentioned here.
  • The embodiments of the present application do not specifically limit the trained neural network model.
  • The trained neural network model may be, for example, a convolutional neural network (CNN) model or a recurrent neural network (RNN) model.
  • The trained neural network model is converted into a preset model structure.
  • Different types of neural network models are converted into the preset model structure to facilitate reading the model and allocating the calculation processing units on which its operators run. Here, the different types of neural network models are represented by open-source code, while the converted preset model structure is represented by custom code.
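  • As an illustrative sketch of such a conversion (a sketch under assumed formats: the source fields `name`, `type`, `inputs` and the preset fields shown are placeholders for illustration, not the application's actual representation), a framework-exported operator list can be mapped into a preset structure that carries each operator's per-unit expected values:

```python
# Sketch: convert a framework-specific operator list into a hypothetical
# preset structure that carries, for each operator, its expected value on
# every calculation processing unit. All field names are illustrative.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class PresetOperator:
    name: str                                   # used to key the expected values
    op_type: str                                # e.g. "Conv", "ReLU"
    inputs: List[str] = field(default_factory=list)
    expected: Dict[str, int] = field(default_factory=dict)  # unit -> 0-100

def convert_model(source_ops: List[dict],
                  expected_table: Dict[str, Dict[str, int]]) -> List[PresetOperator]:
    """Map each source operator into the preset structure, attaching the
    expected values looked up by operator type."""
    return [PresetOperator(name=op["name"],
                           op_type=op["type"],
                           inputs=op.get("inputs", []),
                           expected=expected_table.get(op["type"], {}))
            for op in source_ops]

# A two-operator model where A's output feeds B, as in the examples below.
ops = [{"name": "A", "type": "Conv"}, {"name": "B", "type": "ReLU", "inputs": ["A"]}]
table = {"Conv": {"CPU": 80, "GPU": 60}, "ReLU": {"CPU": 70, "GPU": 80}}
print(convert_model(ops, table))
```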
  • The converted model is parsed. After parsing, the model can be read quickly, which improves the model's running speed.
  • For example, the converted model is parsed into memory so that model data can be read quickly, for instance to quickly read the expected values of operators in the model running on the calculation processing units, and to place each operator on its determined calculation processing unit to run.
  • The expected values of an operator running on the at least two calculation processing units and the state information of those units are obtained, and the calculation processing unit on which the operator runs is determined according to the expected values and the state information.
  • Combining the operator's expected values on each calculation processing unit with each unit's state information to determine the operator's unit allows the unit on which the operator runs to be allocated flexibly, which improves the model's running speed and calculation efficiency and can also improve the overall operating efficiency of the terminal.
  • The embodiments of the present application do not specifically limit the state information of the calculation processing units.
  • The state information may include, for example, an operation performance parameter such as the idle value of a calculation processing unit's computation space.
  • The higher the idle value of a calculation processing unit's computation space, the stronger its remaining processing capacity.
  • For example, a terminal in an embodiment of the present application includes two calculation processing units, a CPU (Central Processing Unit) and a GPU (Graphics Processing Unit); the trained neural network model is converted according to the preset model structure, the model includes operator A and operator B, the output of operator A is the input of operator B, and the converted model includes the expected value of each operator (operator A and operator B) running on each calculation processing unit (CPU and GPU).
  • The expected values of operator A running on the CPU and the GPU are read from the converted model, and the calculation processing unit on which operator A runs is determined in combination with the idle values of the CPU's and GPU's computation space. For example, when operator A's expected values on the CPU and the GPU are the same, it can be determined that operator A runs on the unit whose computation space has the larger idle value. After operator A finishes computing, its output serves as operator B's input, and the unit on which operator B runs is determined on the same principle.
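  • The tie case described above reduces to a one-line rule. A minimal sketch, assuming the idle values are already available as a unit-to-number mapping:

```python
def choose_unit_on_tie(expected: dict, idle: dict) -> str:
    """When an operator's expected values are the same on every unit,
    run it on the unit whose computation space has the larger idle value."""
    assert len(set(expected.values())) == 1, "rule applies only to equal expected values"
    return max(idle, key=idle.get)

# The operator expects 70 on both units; the GPU is freer, so it wins.
print(choose_unit_on_tie({"CPU": 70, "GPU": 70}, {"CPU": 40, "GPU": 80}))  # GPU
```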
  • Of course, the state information of a calculation processing unit may also include other parameters, such as other parameters expressing remaining processing capacity, parameters expressing the unit's current running performance, or parameters expressing the unit's current running power consumption.
  • The operation performance parameter of each calculation processing unit (taking the computation-space idle value as an example) and the expected value are weighted and summed to obtain the operator's score on each of the at least two calculation processing units,
  • and the calculation processing unit with the highest score is taken as the unit on which the operator runs.
  • The weighted sum jointly considers the operator's expected value on a calculation processing unit and the unit's computation-space idle value: a higher expected value means the operator tends more toward running on that unit, while a higher idle value means stronger remaining processing capacity and a correspondingly faster run for the operator on that unit.
  • The expected value and the computation-space idle value in this embodiment can be expressed as numbers from 0 to 100.
  • The higher the expected value, the higher the expectation that the operator runs on that calculation processing unit.
  • The higher the computation-space idle value, the stronger the current processing capability of the calculation processing unit.
  • Of course, the expected value and the computation-space idle value can also be expressed as percentages.
  • By weighting and summing with a reasonable weight ratio, the operators in the model can be flexibly assigned to different calculation processing units for computation according to different situations, which uses the calculation processing units effectively and improves calculation efficiency.
  • The embodiments of the present application do not specifically limit how the expected values are obtained. For example, they may be obtained from experience: experience can indicate which calculation processing unit a certain type of operator is better suited to run on, and the higher an operator's running efficiency on a unit, the higher the corresponding expected value, from which the operator's expected values on the different units can be determined. As an optional embodiment of the present application, the expected values are obtained through statistics: they may be derived from the frequency with which different operators in multiple sample models run on different calculation processing units. The number of samples may be 3000, 10000, and so on; this embodiment does not specifically limit the number of samples.
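  • As a sketch of the statistical approach (the sample record format, one (operator type, unit) pair per observed run, is an assumption for illustration), the expected values can be taken as run frequencies scaled to the 0-100 range used here:

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

def expected_values_from_samples(samples: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Scale, per operator type, the frequency of runs on each unit across
    the sample models (e.g. 3000 or 10000 of them) to a 0-100 expected value."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for op_type, unit in samples:
        counts[op_type][unit] += 1
    return {op_type: {unit: round(100 * n / sum(c.values())) for unit, n in c.items()}
            for op_type, c in counts.items()}

# Toy data: a convolution observed on the GPU in 8 of 10 sampled runs.
samples = [("Conv", "GPU")] * 8 + [("Conv", "CPU")] * 2
print(expected_values_from_samples(samples))  # {'Conv': {'GPU': 80, 'CPU': 20}}
```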
  • The converted model includes each operator's expected value on each calculation processing unit.
  • The expected values of an operator running on the at least two calculation processing units can be obtained through the name of the operator to which the expected values belong.
  • In the preset model structure, the neural network may consist of N (N ≥ 1) operators and their corresponding expected values. An expected value and an operator are associated by "the name of the affiliated operator", through which an expected value can be linked to the operator of the corresponding name.
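  • A minimal sketch of that name-based association (the record layout, with an "affiliated operator" name stored next to each expected-value entry, is an assumption for illustration):

```python
from typing import Dict, List

def expected_by_operator_name(entries: List[dict], operator_name: str) -> Dict[str, int]:
    """Return the per-unit expected values whose affiliated operator name
    matches the requested operator."""
    for entry in entries:
        if entry["affiliated_operator"] == operator_name:
            return entry["expected"]
    raise KeyError(operator_name)

entries = [
    {"affiliated_operator": "A", "expected": {"CPU": 80, "GPU": 60}},
    {"affiliated_operator": "B", "expected": {"CPU": 70, "GPU": 80}},
]
print(expected_by_operator_name(entries, "A"))  # {'CPU': 80, 'GPU': 60}
```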
  • As an example, the model includes operator A and operator B, and the output of operator A is the input of operator B.
  • The calculation processing units of the terminal running the model include a CPU and a GPU.
  • The converted model includes the expected values of operator A and operator B running on the CPU and the GPU. The expected values of operator A on the CPU and the GPU are read from the converted model, and the unit on which operator A runs is determined in combination with the CPU's and GPU's computation-space idle values. For example, operator A's expected values on the CPU and the GPU are 80 and 60, the idle values of the CPU's and GPU's computation space are 40 and 80, the weight of the expected value is 0.6, and the weight of the idle value is 0.4.
  • The score of operator A running on the CPU is 0.6 × 80 + 0.4 × 40 = 64.
  • The score of operator A running on the GPU is 0.6 × 60 + 0.4 × 80 = 68.
  • According to the scores, the calculation processing unit on which operator A runs is determined to be the GPU.
  • The computation of operator A completes, and its output serves as the input of operator B.
  • The expected value of operator B running on the CPU is 70, and its expected value running on the GPU is 80.
  • The idle values of the CPU's and GPU's computation space are now 70 and 50, respectively; the weight of the expected value is 0.6 and the weight of the idle value is 0.4.
  • The score of operator B running on the CPU is 0.6 × 70 + 0.4 × 70 = 70.
  • The score of operator B running on the GPU is 0.6 × 80 + 0.4 × 50 = 68. According to the scores, the calculation processing unit on which operator B runs is determined to be the CPU.
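  • The scoring used in this example is a plain weighted sum over the two quantities. A sketch that reproduces the numbers above (the 0.6/0.4 weights follow the example and are not fixed by the method):

```python
def score(expected: float, idle: float, w_expected: float = 0.6, w_idle: float = 0.4) -> float:
    """Weighted sum of an operator's expected value on a unit and that
    unit's computation-space idle value (both on a 0-100 scale)."""
    return w_expected * expected + w_idle * idle

def assign(expected: dict, idle: dict) -> str:
    """Run the operator on the unit with the highest score."""
    return max(expected, key=lambda unit: score(expected[unit], idle[unit]))

# Operator A: scores 64 (CPU) vs 68 (GPU) -> GPU.
print(assign({"CPU": 80, "GPU": 60}, {"CPU": 40, "GPU": 80}))  # GPU
# Operator B, after A has run: scores 70 (CPU) vs 68 (GPU) -> CPU.
print(assign({"CPU": 70, "GPU": 80}, {"CPU": 70, "GPU": 50}))  # CPU
```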
  • In a second aspect, an embodiment of the present application provides a terminal data processing apparatus that can implement the terminal data processing method of the foregoing embodiments.
  • The foregoing method embodiments can be used to understand and explain the following embodiments of the terminal data processing apparatus.
  • FIG. 2 shows a schematic structural diagram of an embodiment of the terminal data processing apparatus of the present application.
  • The terminal data processing apparatus of the embodiment of the present application includes:
  • a conversion unit 10, configured to convert a trained neural network model according to a preset model structure;
  • a parsing unit 20, configured to parse the converted neural network model;
  • a determining unit 30, configured to obtain the expected values of an operator in the converted neural network model running on at least two calculation processing units, and the state information of the at least two calculation processing units;
  • the determining unit 30 being further configured to determine, among the at least two calculation processing units and according to the state information and the expected values, the calculation processing unit on which the operator runs.
  • In the apparatus, the conversion unit 10 converts the trained neural network model into the preset model structure; the parsing unit 20 parses the converted model; the determining unit 30 obtains the expected values of the operator in the converted model running on the at least two calculation processing units; the determining unit 30 obtains the state information of the at least two calculation processing units; and the determining unit 30 determines, among the at least two calculation processing units and according to the state information and the expected values, the calculation processing unit on which the operator runs.
  • The data processing apparatus of the embodiment of the present application determines the operator's calculation processing unit from both the operator's expected values on the calculation processing units and the units' state information, taking into account the units' actual operating state as well as the expectation of running the operator on each unit.
  • Each operator in the neural network can thus be allocated flexibly and reasonably to a calculation processing unit for computation, making reasonable and effective use of the calculation processing units and improving calculation efficiency.
  • The apparatus of the embodiment of the present application can make full use of the computing resources on the terminal.
  • The embodiments of the present application do not specifically limit the trained neural network model.
  • The trained neural network model may be, for example, a CNN model or an RNN model.
  • The trained neural network model is converted into a preset model structure.
  • The conversion unit 10 converts different types of neural network models into the preset model structure, so that the determining unit 30 can read the corresponding data in the model and allocate the calculation processing units on which the operators run.
  • The parsing unit 20 parses the converted model. After parsing, the model can be read quickly, which improves the model's running speed.
  • For example, the converted neural network model is parsed into memory.
  • With the converted model parsed into memory, model data can be read quickly, for instance to quickly read the expected values of operators in the model running on the calculation processing units, and to place each operator on its determined calculation processing unit to run.
  • The determining unit 30 obtains the expected values of the operator running on the at least two calculation processing units and the state information of those units, and determines the unit on which the operator runs according to the expected values and the state information. The determining unit 30 combines the operator's expected values on each calculation processing unit with each unit's state information to determine the operator's unit, so the unit on which the operator runs can be allocated flexibly, improving the model's running speed and calculation efficiency as well as the terminal's overall operating efficiency.
  • In a feasible embodiment, the determining unit 30 obtains, through a statistical method, the expected values of the operator in the converted neural network model running on the at least two calculation processing units.
  • In a feasible embodiment, the determining unit 30 obtains the expected values of the operator in the converted neural network model running on the at least two calculation processing units from the frequency with which different operators in multiple sample models run on different calculation processing units.
  • In a feasible embodiment, the determining unit 30 obtains the expected values of the operator in the converted neural network model running on the at least two calculation processing units according to the running efficiency of the operator on different calculation processing units.
  • The state information acquired by the determining unit 30 may include, for example, an operation performance parameter such as the idle value of a calculation processing unit's computation space. The higher the idle value of a unit's computation space, the stronger its remaining processing capacity.
  • The determining unit 30 weights and sums the operation performance parameter of each calculation processing unit (taking the computation-space idle value as an example) and the expected value to obtain the operator's score on each of the at least two calculation processing units, and takes the unit corresponding to the highest score as the unit on which the operator runs.
  • The determining unit 30 thereby jointly considers, through the weighted sum, the operator's expected value on a calculation processing unit and the unit's computation-space idle value: a higher expected value means the operator tends more toward running on that unit, while a higher idle value means stronger remaining processing capacity and a correspondingly faster run for the operator on that unit.
  • The expected value and the computation-space idle value in this embodiment can be expressed as numbers from 0 to 100.
  • The higher the expected value, the higher the expectation that the operator runs on that calculation processing unit.
  • The higher the computation-space idle value, the stronger the current processing capability of the calculation processing unit.
  • Of course, the expected value and the computation-space idle value can also be expressed as percentages.
  • By weighting and summing with a reasonable weight ratio, the operators in the model can be flexibly allocated to different calculation processing units for computation according to different situations, using the calculation processing units effectively and improving calculation efficiency.
  • The converted model includes each operator's expected value on each calculation processing unit.
  • When the determining unit 30 determines the unit on which an operator runs, it can read the operator's expected values on each calculation processing unit from the converted model and combine them with each unit's computation-space idle value to determine on which unit the operator runs.
  • The expected values of the operator running on the at least two calculation processing units can be obtained through the name of the operator to which the expected values belong.
  • In the preset model structure, the neural network may consist of N (N ≥ 1) operators and their corresponding expected values. An expected value and an operator are associated by "the name of the affiliated operator", through which an expected value can be linked to the operator of the corresponding name.
  • In an optional embodiment, the parsing unit 20 parses the converted neural network model into memory.
  • In an optional embodiment, the at least two calculation processing units are selected from a CPU, a GPU, a DSP, or an NPU.
  • The division into "units" or "modules" in the embodiments of the present application is only a division of logical functions; in actual implementation there may be other divisions. For example, multiple "units" or "modules" may be combined or integrated into one "unit" or "module" to achieve the corresponding function, or one "unit" or "module" may be decomposed into several that jointly achieve the corresponding function.
  • A "unit" or "module" in the embodiments of the present application may be software and/or hardware capable of completing a specific function independently or in cooperation with other components, where the hardware may be, for example, an FPGA (Field-Programmable Gate Array) or an IC (Integrated Circuit), which will not be detailed one by one here.
  • In a third aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, which, when executed by a processor, implements the steps of the method of any of the foregoing embodiments.
  • The computer-readable storage medium may include, but is not limited to, any type of disk, including floppy disks, optical disks, DVDs, CD-ROMs, microdrives, and magneto-optical disks, as well as ROM, RAM, EPROM, EEPROM, DRAM, VRAM, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), or any type of medium or device suitable for storing instructions and/or data.
  • In a fourth aspect, referring to FIG. 3, the terminal 1000 may include: at least one processor 1001, at least one network interface 1004, a user interface 1003, a memory 1005, and at least one communication bus 1002.
  • The communication bus 1002 is used to implement connection and communication between these components.
  • The user interface 1003 may include a display and a camera; optionally, the user interface 1003 may also include a standard wired interface and a wireless interface.
  • The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface).
  • The processor 1001 may include one or more processing cores.
  • The processor 1001 connects the various parts of the entire terminal 1000 through various interfaces and lines, and performs the various functions of the terminal 1000 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 1005 and by invoking data stored in the memory 1005.
  • The processor 1001 may be implemented in at least one hardware form among digital signal processing (DSP), a field-programmable gate array (FPGA), and a programmable logic array (PLA).
  • The processor 1001 may integrate one or a combination of several of a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like.
  • The CPU mainly handles the operating system, the user interface, applications, and so on;
  • the GPU is responsible for rendering and drawing the content to be shown on the display;
  • the modem handles wireless communication. It can be understood that the modem may also not be integrated into the processor 1001 and may instead be implemented by a separate chip.
  • The memory 1005 may include random access memory (RAM) or read-only memory (ROM).
  • Optionally, the memory 1005 includes a non-transitory computer-readable storage medium.
  • The memory 1005 may be used to store instructions, programs, code, code sets, or instruction sets.
  • The memory 1005 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing the operating system, instructions for at least one function (such as a touch function, a sound playback function, or an image playback function), and instructions for implementing the above method embodiments, while the data storage area may store the data involved in the above method embodiments.
  • Optionally, the memory 1005 may also be at least one storage device located away from the processor 1001.
  • As a computer storage medium, the memory 1005 may include an operating system, a network communication module, a user interface module, and a terminal data processing application.
  • In the terminal 1000, the user interface 1003 is mainly used to provide an input interface for the user and to obtain data entered by the user, while the processor 1001 can be used to invoke the terminal data processing application stored in the memory 1005 and specifically perform the following operations:
  • converting a trained neural network model according to a preset model structure; parsing the converted neural network model; obtaining the expected values of an operator in the converted model running on the at least two calculation processing units; obtaining the state information of the at least two calculation processing units; and determining, among the at least two calculation processing units and according to the state information and the expected values, the calculation processing unit on which the operator runs.
  • In one embodiment, when obtaining the expected values of the operator in the converted neural network model running on the at least two calculation processing units, the processor 1001 specifically performs the following operation:
  • obtaining, through a statistical method, the expected values of the operator in the converted neural network model running on the at least two calculation processing units.
  • In one embodiment, when obtaining, through a statistical method, the expected values of the operator in the converted neural network model running on the at least two calculation processing units, the processor 1001 specifically performs the following operation:
  • obtaining the expected values of the operator in the converted neural network model running on the at least two calculation processing units from the frequency with which different operators in multiple sample models run on different calculation processing units.
  • In one embodiment, when obtaining, through a statistical method, the expected values of the operator in the converted neural network model running on the at least two calculation processing units, the processor 1001 alternatively performs the following operation:
  • obtaining the expected values of the operator in the converted neural network model running on the at least two calculation processing units according to the running efficiency of the operator on different calculation processing units.
  • In one embodiment, the state information includes: the idle value of the calculation processing unit's computation space.
  • In one embodiment, when determining, among the at least two calculation processing units and according to the state information and the expected values, the calculation processing unit on which the operator runs, the processor 1001 specifically performs the following operations:
  • weighting and summing the computation-space idle value and the expected value for each of the at least two calculation processing units to obtain the operator's score on each unit, and taking the calculation processing unit corresponding to the highest of the scores as the unit on which the operator runs.
  • In one embodiment, when obtaining the expected values of the operator in the converted neural network model running on the at least two calculation processing units, the processor 1001 specifically performs the following operation:
  • obtaining the expected values of the operator running on at least two of the calculation processing units through the name of the operator to which the expected values belong.
  • In one embodiment, the processor 1001 further performs the following operation: parsing the converted neural network model into memory.
  • In one embodiment, the at least two calculation processing units are selected from a CPU, a GPU, a DSP, or an NPU.
  • In the embodiments of the present application, a trained neural network model is converted according to a preset model structure; the converted neural network model is parsed; the expected values of an operator in the converted model running on at least two calculation processing units are obtained; the state information of the at least two calculation processing units is obtained; and, among the at least two calculation processing units and according to the state information and the expected values, the calculation processing unit on which the operator runs is determined.
  • The data processing method of the embodiments of the present application determines the operator's calculation processing unit from both the operator's expected values on the calculation processing units and the units' state information, taking into account the units' actual operating state as well as the expectation of running the operator on each unit.
  • Each operator in the neural network can thus be allocated flexibly and reasonably to a calculation processing unit for computation, making reasonable and effective use of the calculation processing units and improving calculation efficiency.
  • The method of the embodiments of the present application can make full use of the computing resources on the terminal.
  • A connection may be a fixed connection, a detachable connection, or an integral connection; it may be a direct connection or an indirect connection through an intermediary.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Debugging And Monitoring (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A terminal data processing method and apparatus, and a terminal. The terminal data processing method includes: converting a trained neural network model according to a preset model structure; parsing the converted neural network model; obtaining expected values of an operator in the converted neural network model running on at least two calculation processing units; obtaining state information of the at least two calculation processing units; and determining, among the at least two calculation processing units and according to the state information and the expected values, the calculation processing unit on which the operator runs. Based on the operators' expected values and the state information of the calculation processing units, the data processing method can flexibly and reasonably allocate each operator in the neural network to different calculation processing units for computation, using the calculation processing units effectively and improving calculation efficiency.

Description

Terminal data processing method and apparatus, and terminal

Technical Field

This application relates to the field of computer technology, and in particular to a terminal data processing method and apparatus, and a terminal.

Background

The description of the background herein belongs to related art relevant to this application and is provided only to explain and facilitate understanding of the content of this application; it shall not be construed as an admission, express or presumed by the applicant, that it constitutes prior art as of the filing date of the first application.

Deep learning plays an increasingly important role in fields such as search, data mining, natural language processing, speech, and recommendation. When a general neural network model performs deep learning computation on a terminal, the operators contained in the model must be allocated to the terminal's calculation processing units to run; when the allocation is unreasonable, calculation efficiency suffers.
Summary

Embodiments of this application provide a terminal data processing method and apparatus, and a terminal, which can improve calculation efficiency.

In a first aspect, this application provides a terminal data processing method, where the terminal includes at least two calculation processing units, and the method includes:

converting a trained neural network model according to a preset model structure;

parsing the converted neural network model;

obtaining expected values of an operator in the converted neural network model running on the at least two calculation processing units;

obtaining state information of the at least two calculation processing units;

determining, among the at least two calculation processing units and according to the state information and the expected values, the calculation processing unit on which the operator runs.

In a second aspect, an embodiment of this application provides a terminal data processing apparatus, including:

a conversion unit, configured to convert a trained neural network model according to a preset model structure;

a parsing unit, configured to parse the converted neural network model;

a determining unit, configured to obtain expected values of an operator in the converted neural network model running on the at least two calculation processing units, and state information of the at least two calculation processing units;

the determining unit being further configured to determine, among the at least two calculation processing units and according to the state information and the expected values, the calculation processing unit on which the operator runs.

In a third aspect, an embodiment of this application provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of any of the above methods are implemented.

In a fourth aspect, an embodiment of this application provides a terminal, including a memory, a processor, and a computer program stored on the memory and executable on the processor; when the processor executes the program, the steps of any of the above methods are implemented.
The embodiments of this application have the following beneficial effects:

The terminal data processing method provided in the embodiments of this application converts a trained neural network model according to a preset model structure; parses the converted neural network model; obtains expected values of an operator in the converted neural network model running on at least two calculation processing units; obtains state information of the at least two calculation processing units; and determines, among the at least two calculation processing units and according to the state information and the expected values, the calculation processing unit on which the operator runs. The data processing method of the embodiments determines the operator's calculation processing unit from both the operator's expected values on the calculation processing units and the units' state information, taking into account the units' actual operating state as well as the expectation of running the operator on each unit. Each operator in the neural network can thus be allocated flexibly and reasonably to a calculation processing unit for computation, making reasonable and effective use of the calculation processing units and improving calculation efficiency.
Brief Description of the Drawings

FIG. 1 shows a flowchart of an embodiment of the terminal data processing method of this application;

FIG. 2 shows a schematic structural diagram of an embodiment of the terminal data processing apparatus of this application;

FIG. 3 shows a schematic structural diagram of a terminal according to an embodiment of this application.
Detailed Description

This application is described in further detail below with reference to specific embodiments, which are not to be taken as limiting this application. In the following description, different instances of "an embodiment" or "the embodiment" do not necessarily refer to the same embodiment. In addition, particular features, structures, or characteristics in one or more embodiments may be combined in any suitable manner.

In the prior art, for the deep learning computation of a neural network, the calculation processing unit on which it runs is configured in advance, and after configuration all operations run on that single calculation processing unit, so calculation efficiency is low. Alternatively, the configuration intelligently chooses whether power consumption or speed has priority, and one of these priority strategies then determines to which calculation processing unit each operator of the neural network is allocated; this processing approach is single-tracked and inflexible.
In a first aspect, an embodiment of this application provides a terminal data processing method. FIG. 1 shows a flowchart of an embodiment of the method. Referring to FIG. 1, the method specifically includes:

converting a trained neural network model according to a preset model structure;

parsing the converted neural network model;

obtaining expected values of an operator in the converted neural network model running on the at least two calculation processing units;

obtaining state information of the at least two calculation processing units;

determining, among the at least two calculation processing units and according to the state information and the expected values, the calculation processing unit on which the operator runs.

The terminal data processing method provided in the embodiments of this application converts a trained neural network model according to a preset model structure; parses the converted neural network model; obtains expected values of an operator in the converted model running on at least two calculation processing units; obtains state information of the at least two calculation processing units; and determines, among the at least two calculation processing units and according to the state information and the expected values, the calculation processing unit on which the operator runs. The method determines the operator's calculation processing unit from both the operator's expected values on the units and the units' state information, taking into account the units' actual operating state as well as the expectation of running the operator on each unit; each operator in the neural network can thus be allocated flexibly and reasonably to a calculation processing unit, making reasonable and effective use of the units and improving calculation efficiency. The method of the embodiments of this application can make full use of the computing resources on the terminal.
The terminal in the embodiments of this application includes at least two calculation processing units. Terminals in the embodiments of this application include, but are not limited to, mobile phones, tablet computers, laptop computers, and the like.

The calculation processing units on the terminal in the embodiments of this application may include, for example, a CPU, a GPU (Graphics Processing Unit), a DSP (Digital Signal Processor), and an NPU (Neural-network Processing Unit). The at least two calculation processing units included in the terminal may be selected from the above or from other specific calculation processing units not mentioned here.

The embodiments of this application do not specifically limit the trained neural network model. The trained neural network model may be, for example, a convolutional neural network (CNN) model or a recurrent neural network (RNN) model. In the embodiments of this application, the trained neural network model is converted into a preset model structure. Different types of neural network models are converted into the preset model structure to facilitate reading the model and allocating the calculation processing units on which its operators run. Here, the different types of neural network models are represented by open-source code, while the converted preset model structure is represented by custom code.

In the embodiments of this application, the converted model is parsed. After parsing, the model can be read quickly, which improves its running speed. For example, in an optional embodiment of this application, the converted model is parsed into memory so that model data can be read quickly, for instance to quickly read the expected values of operators running on the calculation processing units and to place each operator on its determined calculation processing unit to run.

In the embodiments of this application, the expected values of an operator running on the at least two calculation processing units and the state information of those units are obtained, and the calculation processing unit on which the operator runs is determined according to the expected values and the state information. Combining the operator's expected values on each unit with each unit's state information to determine the operator's unit allows the unit on which the operator runs to be allocated flexibly, which improves the model's running speed and calculation efficiency and can also improve the overall operating efficiency of the terminal.

The embodiments of this application do not specifically limit the state information of the calculation processing units. As an optional embodiment, the state information may include an operation performance parameter such as the idle value of a calculation processing unit's computation space. The higher the idle value of a unit's computation space, the stronger its remaining processing capacity. For example, the terminal in one embodiment of this application includes two calculation processing units, a CPU (central processing unit) and a GPU (graphics processing unit); the trained neural network model is converted according to the preset model structure, the model includes operator A and operator B, the output of operator A is the input of operator B, and the converted model includes the expected value of each operator (operator A and operator B) running on each calculation processing unit (CPU and GPU). The expected values of operator A on the CPU and the GPU are read from the converted model, and the unit on which operator A runs is determined in combination with the idle values of the CPU's and GPU's computation space. For example, when operator A's expected values on the CPU and the GPU are the same, it can be determined that operator A runs on the unit whose computation space has the larger idle value. After operator A finishes computing, its output serves as operator B's input, and the unit on which operator B runs is determined on the same principle. Of course, in the embodiments of this application the state information of a calculation processing unit may also include other parameters, such as other parameters expressing remaining processing capacity, parameters expressing the unit's current running performance, or parameters expressing the unit's current running power consumption.

In an optional embodiment of this application, the operation performance parameter of each calculation processing unit (taking the computation-space idle value as an example) and the expected value are weighted and summed to obtain the operator's score on each of the at least two calculation processing units, and the unit with the highest score is taken as the unit on which the operator runs. The weighted sum jointly considers the operator's expected value on a unit and the unit's computation-space idle value: a higher expected value means the operator tends more toward running on that unit, while a higher idle value means stronger remaining processing capacity and a correspondingly faster run for the operator on that unit. In this embodiment, the expected value and the computation-space idle value can be expressed as numbers from 0 to 100: the higher the expected value, the higher the expectation that the operator runs on the unit, and the higher the idle value, the stronger the unit's current processing capability. Of course, both can also be expressed as percentages. In this embodiment, by weighting and summing with a reasonable weight ratio, the operators in the model can be flexibly allocated to different calculation processing units for computation according to different situations, using the calculation processing units effectively and improving calculation efficiency.

The embodiments of this application do not specifically limit how the expected values are obtained. They may, for example, be obtained from experience: experience can indicate which calculation processing unit a certain type of operator is better suited to run on, and the higher an operator's running efficiency on a unit, the higher the corresponding expected value, from which the operator's expected values on the different units can be determined. As an optional embodiment of this application, the expected values are obtained through statistics. In the embodiments, the expected values may be derived from the frequency with which different operators in multiple sample models run on different calculation processing units. The number of samples may be 3000, 10000, and so on; this embodiment does not specifically limit the number of samples.

In the embodiments of this application, the converted model includes each operator's expected value on each calculation processing unit. When determining the unit on which an operator runs, the operator's expected values on each unit can be read from the converted model and combined with each unit's computation-space idle value to determine the unit on which the operator runs. In an optional embodiment, the expected values of an operator running on the at least two calculation processing units can be obtained through the name of the operator to which the expected values belong. In the preset model structure, the neural network may consist of N (N ≥ 1) operators and their corresponding expected values; an expected value and an operator are associated by "the name of the affiliated operator", through which an expected value can be linked to the operator of the corresponding name.

Taking a model that includes operator A and operator B as an example, where the output of operator A is the input of operator B, the calculation processing units of the terminal running the model include a CPU and a GPU. The converted model includes the expected values of operator A and operator B running on the CPU and the GPU. The expected values of operator A on the CPU and the GPU are read from the converted model, and the unit on which operator A runs is determined in combination with the CPU's and GPU's computation-space idle values. For example, operator A's expected values on the CPU and the GPU are 80 and 60, the idle values of the CPU's and GPU's computation space are 40 and 80, the weight of the expected value is 0.6, and the weight of the idle value is 0.4. The score of operator A running on the CPU is 0.6 × 80 + 0.4 × 40 = 64; the score of operator A running on the GPU is 0.6 × 60 + 0.4 × 80 = 68. According to the scores, the calculation processing unit on which operator A runs is determined to be the GPU. After operator A finishes computing, its output serves as operator B's input. To determine the unit on which operator B runs, operator B's expected values on the CPU and the GPU are read from the converted model: the expected value on the CPU is 70 and the expected value on the GPU is 80, while the idle values of the CPU's and GPU's computation space are now 70 and 50, with the weight of the expected value 0.6 and the weight of the idle value 0.4. The score of operator B running on the CPU is 0.6 × 70 + 0.4 × 70 = 70; the score of operator B running on the GPU is 0.6 × 80 + 0.4 × 50 = 68. According to the scores, the calculation processing unit on which operator B runs is determined to be the CPU.
In a second aspect, an embodiment of this application provides a terminal data processing apparatus that can implement the terminal data processing method of the foregoing embodiments; the foregoing method embodiments can be used to understand and explain the following apparatus embodiments.

FIG. 2 shows a schematic structural diagram of an embodiment of the terminal data processing apparatus of this application. Referring to FIG. 2, the terminal data processing apparatus of the embodiment of this application includes:

a conversion unit 10, configured to convert a trained neural network model according to a preset model structure;

a parsing unit 20, configured to parse the converted neural network model;

a determining unit 30, configured to obtain expected values of an operator in the converted neural network model running on at least two calculation processing units, and state information of the at least two calculation processing units;

the determining unit 30 being further configured to determine, among the at least two calculation processing units and according to the state information and the expected values, the calculation processing unit on which the operator runs.

In the terminal data processing apparatus provided by the embodiments of this application, the conversion unit 10 converts the trained neural network model into the preset model structure; the parsing unit 20 parses the converted model; the determining unit 30 obtains the expected values of the operator in the converted neural network model running on the at least two calculation processing units; the determining unit 30 obtains the state information of the at least two calculation processing units; and the determining unit 30 determines, among the at least two calculation processing units and according to the state information and the expected values, the calculation processing unit on which the operator runs. The data processing apparatus of the embodiments determines the operator's calculation processing unit from both the operator's expected values on the units and the units' state information, taking into account the units' actual operating state as well as the expectation of running the operator on each unit; each operator in the neural network can thus be allocated flexibly and reasonably to a calculation processing unit, making reasonable and effective use of the units and improving calculation efficiency. The method of the embodiments of this application can make full use of the computing resources on the terminal.

The embodiments of this application do not specifically limit the trained neural network model. The trained neural network model may be, for example, a CNN model or an RNN model. In the embodiments, the trained neural network model is converted into a preset model structure: the conversion unit 10 converts different types of neural network models into the preset model structure, so that the determining unit 30 can read the corresponding data in the model and allocate the calculation processing units on which the operators run.

In the embodiments of this application, the parsing unit 20 parses the converted model. After parsing, the model can be read quickly, which improves its running speed. For example, in an optional embodiment, the converted neural network model is parsed into memory, so that model data can be read quickly, for instance to quickly read the expected values of operators running on the calculation processing units and to place each operator on its determined calculation processing unit to run.

In the embodiments of this application, the determining unit 30 obtains the expected values of the operator running on the at least two calculation processing units and the state information of those units, and determines the unit on which the operator runs according to the expected values and the state information. The determining unit 30 combines the operator's expected values on each unit with each unit's state information to determine the operator's unit, so the unit on which the operator runs can be allocated flexibly, improving the model's running speed and calculation efficiency as well as the terminal's overall operating efficiency.

In a feasible embodiment of this application, the determining unit 30 obtains, through a statistical method, the expected values of the operator in the converted neural network model running on the at least two calculation processing units.

In a feasible embodiment of this application, the determining unit 30 obtains the expected values of the operator in the converted neural network model running on the at least two calculation processing units from the frequency with which different operators in multiple sample models run on different calculation processing units.

In a feasible embodiment of this application, the determining unit 30 obtains the expected values of the operator in the converted neural network model running on the at least two calculation processing units according to the running efficiency of the operator on different calculation processing units.

The embodiments of this application do not specifically limit the state information of the calculation processing units. As an optional embodiment, the state information obtained by the determining unit 30 may include an operation performance parameter such as the idle value of a calculation processing unit's computation space; the higher the idle value of a unit's computation space, the stronger its remaining processing capacity.

In an optional embodiment of this application, the determining unit 30 weights and sums the operation performance parameter of each calculation processing unit (taking the computation-space idle value as an example) and the expected value to obtain the operator's score on each of the at least two calculation processing units, and takes the unit corresponding to the highest score as the unit on which the operator runs. In the embodiments, the determining unit 30 thereby jointly considers, through the weighted sum, the operator's expected value on a unit and the unit's computation-space idle value: a higher expected value means the operator tends more toward running on that unit, while a higher idle value means stronger remaining processing capacity and a correspondingly faster run for the operator on that unit. In this embodiment, the expected value and the computation-space idle value can be expressed as numbers from 0 to 100: the higher the expected value, the higher the expectation that the operator runs on the unit, and the higher the idle value, the stronger the unit's current processing capability. Of course, both can also be expressed as percentages. By weighting and summing with a reasonable weight ratio, the operators in the model can be flexibly allocated to different calculation processing units for computation according to different situations, using the calculation processing units effectively and improving calculation efficiency.

In the embodiments of this application, the converted model includes each operator's expected value on each calculation processing unit. When the determining unit 30 determines the unit on which an operator runs, it can read the operator's expected values on each unit from the converted model and combine them with each unit's computation-space idle value to determine the unit on which the operator runs. In an optional embodiment, the expected values of an operator running on the at least two calculation processing units can be obtained through the name of the operator to which the expected values belong. In the preset model structure, the neural network may consist of N (N ≥ 1) operators and their corresponding expected values; an expected value and an operator are associated by "the name of the affiliated operator", through which an expected value can be linked to the operator of the corresponding name.

In an optional embodiment of this application, the parsing unit 20 parses the converted neural network model into memory.

In an optional embodiment of this application, the at least two calculation processing units are selected from a CPU, a GPU, a DSP, or an NPU.

Those skilled in the art will clearly understand that the division into "units" or "modules" in the embodiments of this application is only a division of logical functions; in actual implementation there may be other divisions, for example multiple "units" or "modules" may be combined or integrated into one "unit" or "module" to achieve the corresponding function, or one "unit" or "module" may be decomposed into several that jointly achieve it. A "unit" or "module" in the embodiments of this application may be software and/or hardware capable of completing a specific function independently or in cooperation with other components, where the hardware may be, for example, an FPGA (Field-Programmable Gate Array) or an IC (Integrated Circuit), which will not be detailed one by one here.
In a third aspect, an embodiment of this application provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the steps of the method of any of the foregoing embodiments. The computer-readable storage medium may include, but is not limited to, any type of disk, including floppy disks, optical disks, DVDs, CD-ROMs, microdrives, and magneto-optical disks, as well as ROM, RAM, EPROM, EEPROM, DRAM, VRAM, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), or any type of medium or device suitable for storing instructions and/or data.

In a fourth aspect, referring to FIG. 3, an embodiment of this application provides a schematic structural diagram of a terminal. As shown in FIG. 3, the terminal 1000 may include: at least one processor 1001, at least one network interface 1004, a user interface 1003, a memory 1005, and at least one communication bus 1002.

The communication bus 1002 is used to implement connection and communication between these components.

The user interface 1003 may include a display and a camera; optionally, the user interface 1003 may also include a standard wired interface and a wireless interface.

The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface).

The processor 1001 may include one or more processing cores. The processor 1001 connects the various parts of the entire terminal 1000 through various interfaces and lines, and performs the various functions of the terminal 1000 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 1005 and by invoking data stored in the memory 1005. Optionally, the processor 1001 may be implemented in at least one hardware form among digital signal processing (DSP), a field-programmable gate array (FPGA), and a programmable logic array (PLA). The processor 1001 may integrate one or a combination of several of a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like, where the CPU mainly handles the operating system, user interface, applications, and so on, the GPU is responsible for rendering and drawing the content to be shown on the display, and the modem handles wireless communication. It can be understood that the modem may also not be integrated into the processor 1001 and may instead be implemented by a separate chip.

The memory 1005 may include random access memory (RAM) or read-only memory (ROM). Optionally, the memory 1005 includes a non-transitory computer-readable storage medium. The memory 1005 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 1005 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing the operating system, instructions for at least one function (such as a touch function, a sound playback function, or an image playback function), and instructions for implementing the above method embodiments, while the data storage area may store the data involved in the above method embodiments. Optionally, the memory 1005 may also be at least one storage device located away from the processor 1001. As shown in FIG. 3, as a computer storage medium, the memory 1005 may include an operating system, a network communication module, a user interface module, and a terminal data processing application.
In the terminal 1000 shown in FIG. 3, the user interface 1003 is mainly used to provide an input interface for the user and to obtain data entered by the user, while the processor 1001 may be used to invoke the terminal data processing application stored in the memory 1005 and specifically perform the following operations:

converting a trained neural network model according to a preset model structure;

parsing the converted neural network model;

obtaining expected values of an operator in the converted neural network model running on the at least two calculation processing units;

obtaining state information of the at least two calculation processing units;

determining, among the at least two calculation processing units and according to the state information and the expected values, the calculation processing unit on which the operator runs.

In one embodiment, when obtaining the expected values of the operator in the converted neural network model running on the at least two calculation processing units, the processor 1001 specifically performs the following operation:

obtaining, through a statistical method, the expected values of the operator in the converted neural network model running on the at least two calculation processing units.

In one embodiment, when obtaining, through a statistical method, the expected values of the operator in the converted neural network model running on the at least two calculation processing units, the processor 1001 specifically performs the following operation:

obtaining the expected values of the operator in the converted neural network model running on the at least two calculation processing units from the frequency with which different operators in multiple sample models run on different calculation processing units.

In one embodiment, when obtaining, through a statistical method, the expected values of the operator in the converted neural network model running on the at least two calculation processing units, the processor 1001 specifically performs the following operation:

obtaining the expected values of the operator in the converted neural network model running on the at least two calculation processing units according to the running efficiency of the operator on different calculation processing units.

In one embodiment, the state information includes: the idle value of the calculation processing unit's computation space.

In one embodiment, when determining, among the at least two calculation processing units and according to the state information and the expected values, the calculation processing unit on which the operator runs, the processor 1001 specifically performs the following operations:

weighting and summing the computation-space idle value and the expected value for each of the at least two calculation processing units to obtain the operator's score on each calculation processing unit;

taking the calculation processing unit corresponding to the highest of the scores as the calculation processing unit on which the operator runs.

In one embodiment, when obtaining the expected values of the operator in the converted neural network model running on the at least two calculation processing units, the processor 1001 specifically performs the following operation:

obtaining the expected values of the operator running on at least two of the calculation processing units through the name of the operator to which the expected values belong.

In one embodiment, the processor 1001 further performs the following operation: parsing the converted neural network model into memory.

In one embodiment, the at least two calculation processing units are selected from a CPU, a GPU, a DSP, or an NPU.

In the embodiments of this application, a trained neural network model is converted according to a preset model structure; the converted neural network model is parsed; the expected values of an operator in the converted model running on at least two calculation processing units are obtained; the state information of the at least two calculation processing units is obtained; and, among the at least two calculation processing units and according to the state information and the expected values, the calculation processing unit on which the operator runs is determined. The data processing method of the embodiments determines the operator's calculation processing unit from both the operator's expected values on the units and the units' state information, taking into account the units' actual operating state as well as the expectation of running the operator on each unit; each operator in the neural network can thus be allocated flexibly and reasonably to a calculation processing unit, making reasonable and effective use of the units and improving calculation efficiency. The method of the embodiments of this application can make full use of the computing resources on the terminal.
In this application, the terms "first", "second", and the like are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or order; the term "multiple" means two or more unless expressly limited otherwise. Terms such as "mount", "connect", "couple", and "fix" shall be understood broadly; for example, a "connection" may be a fixed connection, a detachable connection, or an integral connection, and "connected" may mean directly connected or indirectly connected through an intermediary. For those of ordinary skill in the art, the specific meanings of the above terms in this application can be understood according to the specific circumstances.

In the description of this application, it should be understood that orientation or position relationships indicated by terms such as "upper" and "lower" are based on the orientations or position relationships shown in the drawings, are used only for convenience of describing this application and simplifying the description, and do not indicate or imply that the apparatus or unit referred to must have a specific orientation or be constructed and operated in a specific orientation; they therefore cannot be understood as limiting this application.

The above are only specific embodiments of this application, but the protection scope of this application is not limited thereto. Any variation or replacement that can readily occur to a person skilled in the art within the technical scope disclosed in this application shall be covered by the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (20)

  1. A terminal data processing method, the terminal including at least two calculation processing units, wherein the method comprises:
    converting a trained neural network model according to a preset model structure;
    parsing the converted neural network model;
    obtaining expected values of an operator in the converted neural network model running on the at least two calculation processing units;
    obtaining state information of the at least two calculation processing units;
    determining, among the at least two calculation processing units and according to the state information and the expected values, the calculation processing unit on which the operator runs.
  2. The method according to claim 1, wherein the obtaining expected values of the operator in the converted neural network model running on the at least two calculation processing units comprises:
    obtaining, through a statistical method, the expected values of the operator in the converted neural network model running on the at least two calculation processing units.
  3. The method according to claim 2, wherein the obtaining, through a statistical method, the expected values of the operator in the converted neural network model running on the at least two calculation processing units comprises:
    obtaining the expected values of the operator in the converted neural network model running on the at least two calculation processing units from the frequency with which different operators in multiple sample models run on different calculation processing units.
  4. The method according to claim 2, wherein the obtaining, through a statistical method, the expected values of the operator in the converted neural network model running on the at least two calculation processing units comprises:
    obtaining the expected values of the operator in the converted neural network model running on the at least two calculation processing units according to the running efficiency of the operator on different calculation processing units.
  5. The method according to claim 1, wherein the state information comprises: an idle value of the computation space of the calculation processing unit.
  6. The method according to claim 5, wherein the determining, among the at least two calculation processing units and according to the state information and the expected values, the calculation processing unit on which the operator runs comprises:
    weighting and summing the computation-space idle value and the expected value for each of the at least two calculation processing units to obtain a score of the operator running on each calculation processing unit;
    taking the calculation processing unit corresponding to the highest of the scores as the calculation processing unit on which the operator runs.
  7. The method according to claim 1, wherein the obtaining expected values of the operator in the converted neural network model running on the at least two calculation processing units comprises:
    obtaining the expected values of the operator running on at least two of the calculation processing units through the name of the operator to which the expected values belong.
  8. The method according to claim 1, wherein the method further comprises:
    parsing the converted neural network model into memory.
  9. The method according to claim 1, wherein the at least two calculation processing units are selected from a CPU, a GPU, a DSP, or an NPU.
  10. A terminal data processing apparatus, comprising:
    a conversion unit, configured to convert a trained neural network model according to a preset model structure;
    a parsing unit, configured to parse the converted neural network model;
    a determining unit, configured to obtain expected values of an operator in the converted neural network model running on the at least two calculation processing units, and state information of the at least two calculation processing units;
    the determining unit being further configured to determine, among the at least two calculation processing units and according to the state information and the expected values, the calculation processing unit on which the operator runs.
  11. The apparatus according to claim 10, wherein the determining unit is specifically configured to:
    obtain, through statistical means, the expected values of the operator in the converted neural network model running on the at least two calculation processing units.
  12. The apparatus according to claim 11, wherein the determining unit is specifically configured to:
    obtain the expected values of the operator in the converted neural network model running on the at least two calculation processing units from the frequency with which different operators in multiple sample models run on different calculation processing units.
  13. The apparatus according to claim 11, wherein the determining unit is specifically configured to:
    obtain the expected values of the operator in the converted neural network model running on the at least two calculation processing units according to the running efficiency of the operator on different calculation processing units.
  14. The apparatus according to claim 10, wherein the state information comprises: an idle value of the computation space of the calculation processing unit.
  15. The apparatus according to claim 14, wherein the determining unit is specifically configured to:
    weight and sum the computation-space idle value and the expected value for each of the at least two calculation processing units to obtain a score of the operator running on each calculation processing unit;
    take the calculation processing unit corresponding to the highest of the scores as the calculation processing unit on which the operator runs.
  16. The apparatus according to claim 10, wherein the determining unit is specifically configured to:
    obtain the expected values of the operator running on at least two of the calculation processing units through the name of the operator to which the expected values belong.
  17. The apparatus according to claim 10, wherein the parsing unit is specifically configured to:
    parse the converted neural network model into memory.
  18. The apparatus according to claim 10, wherein the at least two calculation processing units are selected from a CPU, a GPU, a DSP, or an NPU.
  19. A computer-readable storage medium on which a computer program is stored, wherein when the program is executed by a processor, the steps of the method of any one of claims 1-9 are implemented.
  20. A terminal, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein when the processor executes the program, the steps of the method of any one of claims 1-9 are implemented.
PCT/CN2019/109609 2018-11-13 2019-09-30 Terminal data processing method and apparatus, and terminal WO2020098414A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811349645.0 2018-11-13
CN201811349645.0A CN109523022B (zh) 2018-11-13 2018-11-13 Terminal data processing method and apparatus, and terminal

Publications (1)

Publication Number Publication Date
WO2020098414A1 (zh) 2020-05-22

Family

Family ID: 65776212

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/109609 WO2020098414A1 (zh) 2018-11-13 2019-09-30 Terminal data processing method and apparatus, and terminal

Country Status (2)

Country Link
CN (1) CN109523022B (zh)
WO (1) WO2020098414A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111882038A (zh) * 2020-07-24 2020-11-03 深圳力维智联技术有限公司 Model conversion method and apparatus
CN112463158A (zh) * 2020-11-25 2021-03-09 安徽寒武纪信息科技有限公司 Compilation method and apparatus, electronic device, and storage medium
CN113806095A (zh) * 2021-09-23 2021-12-17 广州极飞科技股份有限公司 Network model deployment method and apparatus, storage medium, and edge device
CN115292053A (zh) * 2022-09-30 2022-11-04 苏州速显微电子科技有限公司 Unified CPU, GPU, and NPU scheduling method for mobile-side CNN

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109523022B (zh) 2018-11-13 2022-04-05 Oppo广东移动通信有限公司 Terminal data processing method and apparatus, and terminal
CN111028226A (zh) * 2019-12-16 2020-04-17 北京百度网讯科技有限公司 Algorithm porting method and apparatus
CN111340237B (zh) * 2020-03-05 2024-04-26 腾讯科技(深圳)有限公司 Data processing and model running method and apparatus, and computer device
CN111782403B (зh) * 2020-07-17 2022-04-19 Oppo广东移动通信有限公司 Data processing method and apparatus, and electronic device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105049509A (zh) * 2015-07-23 2015-11-11 浪潮电子信息产业股份有限公司 Cluster scheduling method, load balancer, and cluster system
CN105872061A (zh) * 2016-04-01 2016-08-17 浪潮电子信息产业股份有限公司 Server cluster management method, apparatus, and system
CN106779050A (zh) * 2016-11-24 2017-05-31 厦门中控生物识别信息技术有限公司 Convolutional neural network optimization method and apparatus
WO2017185285A1 (zh) * 2016-04-28 2017-11-02 华为技术有限公司 Method and apparatus for allocating graphics processor tasks
CN108734293A (zh) * 2017-04-13 2018-11-02 北京京东尚科信息技术有限公司 Task management system, method, and apparatus
CN109523022A (zh) * 2018-11-13 2019-03-26 Oppo广东移动通信有限公司 Terminal data processing method and apparatus, and terminal

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101902295B (zh) * 2009-05-26 2013-08-21 国际商业机器公司 Method and apparatus for controlling the load admission rate of an application server
CN103713949A (zh) * 2012-10-09 2014-04-09 鸿富锦精密工业(深圳)有限公司 Dynamic task allocation system and method
CN103135132B (zh) * 2013-01-15 2015-07-01 中国科学院地质与地球物理研究所 Hybrid-domain full waveform inversion method using CPU/GPU cooperative parallel computing
CN104598425B (zh) * 2013-10-31 2018-03-13 中国石油天然气集团公司 General multi-machine parallel computing method and system
CN104699697B (zh) * 2013-12-04 2017-11-21 中国移动通信集团天津有限公司 Data processing method and apparatus
CN106155635B (zh) * 2015-04-03 2020-09-18 北京奇虎科技有限公司 Data processing method and apparatus
CN104794194B (zh) * 2015-04-17 2018-10-26 同济大学 Distributed heterogeneous parallel computing system for large-scale multimedia retrieval
CN106789421B (zh) * 2016-12-17 2020-06-05 深圳中广核工程设计有限公司 Collaborative design method and apparatus
CN108595272B (zh) * 2018-05-02 2020-11-27 厦门集微科技有限公司 Request distribution method and apparatus, and computer-readable storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105049509A (zh) * 2015-07-23 2015-11-11 浪潮电子信息产业股份有限公司 Cluster scheduling method, load balancer, and cluster system
CN105872061A (zh) * 2016-04-01 2016-08-17 浪潮电子信息产业股份有限公司 Server cluster management method, apparatus, and system
WO2017185285A1 (zh) * 2016-04-28 2017-11-02 华为技术有限公司 Method and apparatus for allocating graphics processor tasks
CN106779050A (zh) * 2016-11-24 2017-05-31 厦门中控生物识别信息技术有限公司 Convolutional neural network optimization method and apparatus
CN108734293A (zh) * 2017-04-13 2018-11-02 北京京东尚科信息技术有限公司 Task management system, method, and apparatus
CN109523022A (zh) * 2018-11-13 2019-03-26 Oppo广东移动通信有限公司 Terminal data processing method and apparatus, and terminal

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111882038A (zh) * 2020-07-24 2020-11-03 深圳力维智联技术有限公司 Model conversion method and apparatus
CN112463158A (zh) * 2020-11-25 2021-03-09 安徽寒武纪信息科技有限公司 Compilation method and apparatus, electronic device, and storage medium
CN112463158B (zh) * 2020-11-25 2023-05-23 安徽寒武纪信息科技有限公司 Compilation method and apparatus, electronic device, and storage medium
CN113806095A (zh) * 2021-09-23 2021-12-17 广州极飞科技股份有限公司 Network model deployment method and apparatus, storage medium, and edge device
CN115292053A (zh) * 2022-09-30 2022-11-04 苏州速显微电子科技有限公司 Unified CPU, GPU, and NPU scheduling method for mobile-side CNN

Also Published As

Publication number Publication date
CN109523022B (zh) 2022-04-05
CN109523022A (zh) 2019-03-26

Similar Documents

Publication Publication Date Title
WO2020098414A1 (zh) Terminal data processing method and apparatus, and terminal
CN108255653B (zh) Product testing method and terminal therefor
Xu et al. The case for FPGA-based edge computing
WO2018233438A1 (zh) Facial feature point tracking method and apparatus, storage medium, and device
WO2020000867A1 (zh) Answer providing method and device
US20190050265A1 (en) Methods and apparatus for allocating a workload to an accelerator using machine learning
WO2019042180A1 (zh) Resource allocation method and related product
WO2019062405A1 (zh) Application processing method and apparatus, storage medium, and electronic device
KR20200037602A (ko) Artificial neural network selection apparatus and method
US20230259712A1 (en) Sound effect adding method and apparatus, storage medium, and electronic device
US20220215177A1 (en) Method and system for processing sentence, and electronic device
KR20210151730A (ко) Memory allocation method and apparatus, and electronic device
CN114244821B (zh) Data processing method, apparatus, device, electronic device, and storage medium
WO2020134547A1 (zh) Fixed-point acceleration method and apparatus for data, electronic device, and storage medium
CN110019648B (zh) Method and apparatus for training data, and storage medium
Folino et al. Automatic offloading of mobile applications into the cloud by means of genetic programming
WO2019062404A1 (zh) Application processing method and apparatus, storage medium, and electronic device
US20220414474A1 (en) Search method, electronic device and storage medium based on neural network model
CN107291543B (zh) Application processing method and apparatus, storage medium, and terminal
US20230206113A1 (en) Feature management for machine learning system
US20220011852A1 (en) Methods and apparatus to align network traffic to improve power consumption
CN112765022B (zh) Dataflow-based Webshell static detection method and electronic device
JP2022078286A (ja) Training method and apparatus for a data processing model, electronic device, and storage medium
CN109002498A (zh) Human-machine dialogue method, apparatus, device, and storage medium
CN116467153A (zh) Data processing method and apparatus, computer device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 19883977; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 19883977; Country of ref document: EP; Kind code of ref document: A1)