CN113408718A - Device processor selection method, system, terminal device and storage medium

Device processor selection method, system, terminal device and storage medium

Info

Publication number
CN113408718A
Authority
CN
China
Prior art keywords
value
neural network
sample
variable value
equipment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110633567.2A
Other languages
Chinese (zh)
Inventor
陈志杰
李跃文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Meitu Technology Co Ltd
Original Assignee
Xiamen Meitu Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Meitu Technology Co Ltd filed Critical Xiamen Meitu Technology Co Ltd
Priority to CN202110633567.2A
Publication of CN113408718A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Abstract

The invention provides a device processor selection method, system, terminal device and storage medium. The method comprises the following steps: determining a calculated quantity variable value of a neural network to be deployed according to the calculated quantity information of the neural network to be deployed; acquiring device information of a target device and determining a device variable value of the target device according to the device information; and determining a deployment variable value between the neural network to be deployed and the target device according to the calculated quantity variable value and the device variable value, and selecting a processor in the target device according to the deployment variable value. Based on the complexity of the calculated amount of the neural network to be deployed, the device performance of the target device, and the resulting performance gap of the network between different processors, the invention can automatically select a processor on the target device without manual testing, thereby improving both the accuracy and the efficiency of device processor selection.

Description

Device processor selection method, system, terminal device and storage medium
Technical Field
The invention belongs to the field of electronic communication, and particularly relates to a device processor selection method, a system, a terminal device and a storage medium.
Background
With the development of the times and the progress of science and technology, processors have become more and more powerful, and deploying multiple processors on a terminal device can improve its processing performance. For example, a Central Processing Unit (CPU) and a Graphics Processing Unit (GPU) are deployed on a mobile phone; their division of labor differs, the CPU being good at logic control and the GPU at parallel operations, so much deep-learning inference is run on the GPU. However, because of factors such as the calculation amount of the neural network, inference of a neural network on the GPU is not necessarily faster than on the CPU, so the problem of selecting a device processor for neural network inference has received more and more attention.
At present, the selection of the device processor is performed in a manual testing manner, that is, the running time of the neural network on different processors of the same device is manually tested, and the processor on the device is selected based on the tested running time.
Disclosure of Invention
Embodiments of the present invention provide a device processor selection method, system, terminal device and storage medium, aiming to solve the problem that, in the existing device processor selection process, selecting the processor by manual testing is inefficient.
The embodiment of the invention is realized in such a way that a device processor selection method comprises the following steps:
acquiring the calculated quantity information of a neural network to be deployed, and determining the calculated quantity variable value of the neural network to be deployed according to the calculated quantity information, wherein the calculated quantity information comprises the total calculated quantity and the convolution calculated quantity of the neural network to be deployed, and the calculated quantity variable value is used for representing the complexity of the calculated quantity of the neural network to be deployed;
acquiring equipment information of target equipment, and determining an equipment variable value of the target equipment according to the equipment information, wherein the equipment information comprises equipment running speed and a memory value, and the equipment variable value is used for representing equipment performance of the target equipment;
and determining a deployment variable value between the neural network to be deployed and the target equipment according to the calculated variable quantity value and the equipment variable value, and selecting a processor in the target equipment according to the deployment variable value, wherein the deployment variable value is used for representing the performance gap of the neural network to be deployed between different processors when the neural network to be deployed is deployed on the target equipment.
Further, the determining the calculated quantity variable value of the neural network to be deployed according to the calculated quantity information includes:
determining the maximum total calculated amount and the minimum total calculated amount in a sample neural network, and calculating the difference between the maximum total calculated amount and the minimum total calculated amount to obtain a calculated amount difference;
calculating a quotient value between the total calculated quantity of the neural network to be deployed and the calculated quantity difference value to obtain a first calculated quantity variable value;
calculating a quotient value between the convolution calculation quantity of the 3x3 convolution and the total calculation quantity in the neural network to be deployed to obtain a second calculation quantity variable value;
calculating a quotient value between the convolution calculation quantity of the 1x1 convolution and the total calculation quantity in the neural network to be deployed to obtain a third calculation quantity variable value;
the calculation amount variable value includes the first calculation amount variable value, the second calculation amount variable value, and the third calculation amount variable value.
Further, the determining a device variable value of the target device according to the device information includes:
determining a maximum memory value and a minimum memory value in sample equipment, and calculating a difference value between the maximum memory value and the minimum memory value to obtain a memory difference value;
calculating a quotient value between the memory value of the target device and the memory difference value to obtain a first device variable value;
determining a maximum running speed and a minimum running speed in the sample equipment, and calculating a difference value between the maximum running speed and the minimum running speed to obtain a running speed difference value;
calculating a quotient value between the equipment running speed of the target equipment and the running speed difference value to obtain a second equipment variable value;
the device variable values include the first device variable value and the second device variable value.
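As an illustration only, the following Python sketch computes the three calculated quantity variable values and the two device variable values described above; the argument names and the sample-set extrema (t_max, t_min, m_max, m_min, s_max, s_min) are assumptions made for the sketch, not symbols defined at this point in the text.

def compute_variable_values(total_ops, conv3x3_ops, conv1x1_ops, mem, speed,
                            t_max, t_min, m_max, m_min, s_max, s_min):
    # First calculated quantity variable value: total calculated amount of the
    # network to be deployed divided by the calculated amount difference of the sample networks.
    x1 = total_ops / (t_max - t_min)
    # Second and third values: share of 3x3 and 1x1 convolution calculation
    # in the total calculated amount of the network to be deployed.
    x2 = conv3x3_ops / total_ops
    x3 = conv1x1_ops / total_ops
    # First device variable value: memory value over the memory difference of the sample devices.
    x4 = mem / (m_max - m_min)
    # Second device variable value: running speed over the running speed difference.
    x5 = speed / (s_max - s_min)
    return x1, x2, x3, x4, x5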
Still further, the method further comprises:
respectively determining the running time of the sample neural network on processors of different sample devices, and determining the sample deployment variable values of the sample neural network on the different sample devices according to the running time;
determining the maximum total calculated amount and the minimum total calculated amount in the sample neural network, and calculating the difference between the maximum total calculated amount and the minimum total calculated amount to obtain a calculated amount difference;
respectively calculating a quotient value between the total calculated quantity of each sample neural network and the calculated quantity difference value to obtain a first sample variable value;
respectively calculating a quotient value between the convolution calculation quantity of the convolution of 3x3 and the total calculation quantity in each sample neural network to obtain a second sample variable value;
respectively calculating a quotient value between the convolution calculation quantity of the 1x1 convolution and the total calculation quantity in each sample neural network to obtain a third sample variable value;
determining a maximum memory value and a minimum memory value in the sample equipment, and calculating a difference value between the maximum memory value and the minimum memory value to obtain a memory difference value;
respectively calculating a quotient value between the memory value of each sample device and the memory difference value to obtain a fourth sample variable value;
determining a maximum running speed and a minimum running speed in the sample equipment, and calculating a difference value between the maximum running speed and the minimum running speed to obtain a running speed difference value;
respectively calculating a quotient value between the equipment running speed of each sample equipment and the running speed difference value to obtain a fifth sample variable value;
solving for a variable coefficient in an approximation equation based on the sample deployment variable value, the first sample variable value, the second sample variable value, the third sample variable value, the fourth sample variable value, and the fifth sample variable value.
Furthermore, each sample device is provided with a first processor and a second processor, and the calculation formula for determining the sample deployment variable values of the sample neural network on different sample devices according to the running time is as follows:
Y=(Gtime-Ctime)/Gtime
Y is the sample deployment variable value, Gtime is the running time of the first processor running the sample neural network, and Ctime is the running time of the second processor running the sample neural network.
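A minimal sketch of this formula, assuming gtime and ctime hold the two measured running times in the same unit:

def sample_deployment_value(gtime, ctime):
    # Y = (Gtime - Ctime) / Gtime; positive when the first processor's running
    # time exceeds the second processor's running time.
    return (gtime - ctime) / gtime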
Further, the determining a deployment variable value between the neural network to be deployed and the target device according to the calculated variable value and the device variable value includes:
and performing variable value operation on the calculated variable quantity value and the equipment variable value according to the solved approximate equation to obtain the deployment variable value.
Further, before the obtaining the calculated amount information of the neural network to be deployed, the method further includes:
respectively acquiring the identifications of the neural network to be deployed and the target equipment to obtain a network identification and an equipment identification;
if the network identifier and the equipment identifier are a preset identifier combination, inquiring a processor corresponding to the preset identifier combination in the target equipment;
and deploying the neural network to be deployed according to the processor inquired in the target equipment.
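A hedged sketch of this look-up; the dictionary PRESET_COMBINATIONS and the identifier strings below are hypothetical placeholders, not identifiers defined in this application.

# Hypothetical table of preset (network identifier, device identifier) combinations
# and the processor recorded for each of them.
PRESET_COMBINATIONS = {
    ("net_a", "device_x"): "GPU",
    ("net_b", "device_x"): "CPU",
}

def query_preset_processor(network_id, device_id):
    # Returns the processor for a preset identifier combination, or None if the
    # combination is not preset and the variable-value calculation should be used instead.
    return PRESET_COMBINATIONS.get((network_id, device_id))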
It is another object of an embodiment of the present invention to provide a device processor selection system, including:
the calculated quantity variable value determining module is used for acquiring the calculated quantity information of the neural network to be deployed and determining the calculated quantity variable value of the neural network to be deployed according to the calculated quantity information, wherein the calculated quantity information comprises the total calculated quantity and the convolution calculated quantity of the neural network to be deployed, and the calculated quantity variable value is used for representing the complexity of the calculated quantity of the neural network to be deployed;
the device variable value determining module is used for acquiring device information of target devices and determining device variable values of the target devices according to the device information, wherein the device information comprises device running speed and memory values, and the device variable values are used for representing device performances of the target devices;
and the deployment variable value determining module is used for determining a deployment variable value between the neural network to be deployed and the target equipment according to the calculated variable quantity value and the equipment variable value, and selecting a processor in the target equipment according to the deployment variable value, wherein the deployment variable value is used for representing the performance gap of the neural network to be deployed between different processors when the neural network to be deployed is deployed on the target equipment.
It is another object of the embodiments of the present invention to provide a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the method when executing the computer program.
It is a further object of embodiments of the present invention to provide a computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, carries out the above-mentioned method steps.
According to the embodiments of the invention, by acquiring the calculated amount information of the neural network to be deployed, the complexity of its calculated amount can be effectively determined based on that information, and by acquiring the device information of the target device, the device performance of the target device can be effectively determined based on the device information. The performance gap of the neural network to be deployed between different processors on the target device can then be effectively determined from the complexity of the calculated amount and the device performance, and a processor on the target device can be selected automatically according to that gap, without manual testing, thereby improving both the accuracy and the efficiency of device processor selection.
Drawings
FIG. 1 is a flow chart of a device processor selection method provided by a first embodiment of the present invention;
FIG. 2 is a flow chart of a device processor selection method according to a second embodiment of the present invention;
FIG. 3 is a schematic diagram of a device processor selection system according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of a terminal device according to a fourth embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
Example one
Referring to fig. 1, a flowchart of a device processor selection method according to a first embodiment of the present invention is shown, where the device processor selection method is applicable to any terminal device, where the terminal device includes a mobile phone, a tablet or a wearable smart device, and the device processor selection method includes the steps of:
step S10, obtaining the calculated quantity information of the neural network to be deployed, and determining the calculated quantity variable value of the neural network to be deployed according to the calculated quantity information;
the calculated quantity information comprises the total calculated quantity and the convolution calculated quantity of the neural network to be deployed, the total calculated quantity is the total times of multiply-add calculation in the neural network to be deployed, the calculated quantity variable value is used for representing the complexity of the calculated quantity of the neural network to be deployed, and the total calculated quantity and the convolution calculated quantity of different neural networks to be deployed can be different.
In this step, the convolution calculated amount comprises the calculated amount corresponding to the convolution layers configured in the neural network to be deployed; by acquiring the calculated amount information of the neural network to be deployed, the complexity of its calculated amount can be effectively determined based on that information.
Step S20, acquiring equipment information of target equipment, and determining an equipment variable value of the target equipment according to the equipment information;
the device information comprises a device running speed and a memory value. The device running speed is the amount of data the target device processes per unit time, and the device variable value is used to characterize the device performance of the target device; the higher the device running speed, the better the performance of the target device is judged to be.
Step S30, determining a deployment variable value between the neural network to be deployed and the target equipment according to the calculated variable quantity value and the equipment variable value, and selecting a processor in the target equipment according to the deployment variable value;
the deployment variable value is used to characterize the performance gap of the neural network to be deployed among different processors when it is deployed on the target device;
optionally, in this step, when two different processors are deployed on the target device: if the deployment variable value is greater than 0, it is determined that the processing performance of the first processor on the neural network to be deployed is higher than that of the second processor; if the deployment variable value is less than 0, it is determined that the processing performance of the second processor is higher than that of the first processor; and the larger the absolute value of the deployment variable value, the larger the gap in processing performance between the two processors for the neural network to be deployed.
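A minimal sketch of the decision rule stated above, assuming y is the deployment variable value obtained from the approximate equation described below; the default argument values assume the first processor is the GPU and the second the CPU, which is only suggested by the symbols Gtime and Ctime and is not stated in the text.

def select_processor(y, first_processor="GPU", second_processor="CPU"):
    # Positive deployment variable value: the first processor performs better;
    # negative: the second processor performs better; the magnitude of y reflects the gap.
    if y > 0:
        return first_processor
    if y < 0:
        return second_processor
    return first_processor  # no measurable gap; either choice is acceptable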
Optionally, in this step, before determining, according to the calculated quantity variable value and the device variable value, a deployment variable value between the neural network to be deployed and the target device, the method further includes:
respectively determining the running time of the sample neural network on processors of different sample devices, and determining the sample deployment variable values of the sample neural network on the different sample devices according to the running time;
in the step, the running time of the sample neural network on processors of different sample devices is determined by respectively operating different sample neural networks on different sample devices.
Further, in this step, each of the sample devices is provided with a first processor and a second processor, and the calculation formula for determining the sample deployment variable values of the sample neural network on different sample devices according to the running time is as follows:
Y=(Gtime-Ctime)/Gtime
Y is the sample deployment variable value, Gtime is the running time of the first processor running the sample neural network, and Ctime is the running time of the second processor running the sample neural network.
Determining the maximum total calculated amount and the minimum total calculated amount in the sample neural network, and calculating the difference between the maximum total calculated amount and the minimum total calculated amount to obtain a calculated amount difference;
the maximum total calculated amount is denoted Tmax and the minimum total calculated amount Tmin, and the calculated quantity difference value is obtained by calculating the difference between Tmax and Tmin.
Respectively calculating a quotient value between the total calculated quantity of each sample neural network and the calculated quantity difference value to obtain a first sample variable value;
wherein, if the total calculated amount of the sample neural network is T, then the first sample variable value is X1 = T/(Tmax - Tmin).
Respectively calculating a quotient value between the convolution calculation quantity of the convolution of 3x3 and the total calculation quantity in each sample neural network to obtain a second sample variable value;
wherein, if the convolution calculated amount of the 3x3 convolution is T3x3, then the second sample variable value is X2 = T3x3/T.
Respectively calculating a quotient value between the convolution calculation quantity of the 1x1 convolution and the total calculation quantity in each sample neural network to obtain a third sample variable value;
wherein, if the convolution calculated amount of the 1x1 convolution is T1x1, then the third sample variable value is X3 = T1x1/T.
Determining a maximum memory value and a minimum memory value in the sample equipment, and calculating a difference value between the maximum memory value and the minimum memory value to obtain a memory difference value;
the maximum memory value is Mmax and the minimum memory value is Mmin, and the memory difference value is obtained by calculating the difference between Mmax and Mmin.
Respectively calculating a quotient value between the memory value of each sample device and the memory difference value to obtain a fourth sample variable value;
where the memory value of the sample device is M, the fourth sample variable value X4 is M/(Mmax-Mmin).
Determining a maximum running speed and a minimum running speed in the sample equipment, and calculating a difference value between the maximum running speed and the minimum running speed to obtain a running speed difference value;
wherein the maximum running speed is Smax and the minimum running speed is Smin, and the running speed difference value is obtained by calculating the difference between Smax and Smin.
Respectively calculating a quotient value between the equipment running speed of each sample equipment and the running speed difference value to obtain a fifth sample variable value;
wherein, the device running speed of the sample device is S, then:
the fifth sample variable value X5 is S/(Smax-Smin).
Solving for variable coefficients in an approximation equation based on the sample deployment variable value, the first sample variable value, the second sample variable value, the third sample variable value, the fourth sample variable value, and the fifth sample variable value;
wherein the approximate equation is selected as Y = a0 + a1*X1 + a2*X2 + a3*X3 + a4*X4 + a5*X5, where Y is the deployment variable value, and the coefficients (a0, a1, a2, a3, a4, a5) in the equation are solved by the principle of least squares, so that the solved approximate equation can effectively calculate the deployment variable values between different neural networks to be deployed and the target device.
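As an illustration of this fitting step, the sketch below uses NumPy's least-squares solver to recover the coefficients from the sample variable values; the function and argument names are assumptions, and each argument is expected to hold one entry per (sample neural network, sample device) measurement.

import numpy as np

def fit_approximate_equation(x1, x2, x3, x4, x5, y):
    # Build the design matrix for Y = a0 + a1*X1 + a2*X2 + a3*X3 + a4*X4 + a5*X5.
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(y),
                         np.asarray(x1, dtype=float), np.asarray(x2, dtype=float),
                         np.asarray(x3, dtype=float), np.asarray(x4, dtype=float),
                         np.asarray(x5, dtype=float)])
    # Solve for the coefficients in the least-squares sense.
    coefficients, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefficients  # [a0, a1, a2, a3, a4, a5]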
Further, in this step, the determining a deployment variable value between the neural network to be deployed and the target device according to the calculated variable value and the device variable value includes:
performing variable value operation on the calculated variable quantity value and the equipment variable value according to the solved approximate equation to obtain a deployment variable value;
and substituting the calculated variable quantity value and the equipment variable value into the solved approximate equation for operation to obtain a deployment variable value between the neural network to be deployed and the target equipment.
According to this embodiment, by acquiring the calculated amount information of the neural network to be deployed, the complexity of its calculated amount can be effectively determined, and by acquiring the device information of the target device, the device performance of the target device can be effectively determined. Based on the complexity of the calculated amount and the device performance, the performance gap of the neural network to be deployed between different processors can be determined, and a processor on the target device can be selected automatically according to that gap, without manual testing, thereby improving both the accuracy and the efficiency of device processor selection.
Example two
Referring to fig. 2, a flowchart of a device processor selection method according to a second embodiment of the present invention is provided, where the method is used to further refine steps S20 to S30, and includes the steps of:
step S21, determining the maximum total calculated quantity and the minimum total calculated quantity in the sample neural network, and calculating the difference between the maximum total calculated quantity and the minimum total calculated quantity to obtain the calculated quantity difference;
the maximum total calculated amount is denoted Tmax and the minimum total calculated amount Tmin, and the calculated quantity difference value is obtained by calculating the difference between Tmax and Tmin.
Step S22, calculating a quotient between the total calculated amount of the neural network to be deployed and the calculated amount difference value to obtain a first calculated amount variable value;
wherein, if the total calculated amount of the neural network to be deployed is T1, then the first calculated quantity variable value is T1/(Tmax - Tmin).
Step S23, respectively calculating quotient values between convolution calculation quantities of convolution 3x3 and convolution 1x1 in the neural network to be deployed and total calculation quantities to obtain a second calculation quantity variable value and a third calculation quantity variable value;
wherein the calculated variable value comprises a first calculated variable value, a second calculated variable value and a third calculated variable value;
in this step, in the neural network to be deployed, the convolution calculation amount of convolution of 3x3 is T2, and the convolution calculation amount of convolution of 1x1 is T3, then: the second calculated variable value is T2/T1 and the third calculated variable value is T3/T1.
Step S24, determining a maximum memory value and a minimum memory value in the sample device, and calculating a difference value between the maximum memory value and the minimum memory value to obtain a memory difference value;
the maximum memory value is Mmax and the minimum memory value is Mmin, and the memory difference value is obtained by calculating the difference between Mmax and Mmin.
Step S25, calculating a quotient between the memory value of the target device and the memory difference value, to obtain a first device variable value;
if the memory value of the target device is M1, the first device variable value is M1/(Mmax - Mmin).
Step S26, determining the maximum running speed and the minimum running speed in the sample device, and calculating the difference between the maximum running speed and the minimum running speed to obtain a running speed difference;
wherein the maximum running speed is Smax and the minimum running speed is Smin, and the running speed difference value is obtained by calculating the difference between Smax and Smin.
Step S27, calculating a quotient value between the equipment running speed of the target equipment and the running speed difference value to obtain a second equipment variable value;
wherein the device variable value includes a first device variable value and a second device variable value, and the device operating speed of the target device is S1, the second device variable value is S1/(Smax-Smin).
In this embodiment, the first calculated quantity variable value is substituted for X1 in the solved approximate equation, the second for X2, the third for X3, the first device variable value for X4, and the second device variable value for X5; the equation is then evaluated with the solved coefficients a0, a1, a2, a3, a4 and a5 to obtain the deployment variable value between the neural network to be deployed and the target device.
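Under the same assumptions as the fitting sketch in the first embodiment, the substitution described above could look like the following, where coefficients is the array returned by fit_approximate_equation:

def evaluate_deployment_value(coefficients, x1, x2, x3, x4, x5):
    # Y = a0 + a1*X1 + a2*X2 + a3*X3 + a4*X4 + a5*X5 with the solved coefficients.
    a0, a1, a2, a3, a4, a5 = coefficients
    return a0 + a1 * x1 + a2 * x2 + a3 * x3 + a4 * x4 + a5 * x5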
EXAMPLE III
Referring to fig. 3, a schematic structural diagram of a device processor selection system 100 according to a third embodiment of the present invention is shown, including: a calculation variable value determination module 10, a device variable value determination module 11, and a deployment variable value determination module 12, wherein:
the calculation variable quantity value determining module 10 is configured to obtain calculation quantity information of the neural network to be deployed, and determine the calculation variable quantity value of the neural network to be deployed according to the calculation quantity information, where the calculation quantity information includes a total calculation quantity and a convolution calculation quantity of the neural network to be deployed, and the calculation variable quantity value is used to represent a complexity of the calculation quantity of the neural network to be deployed.
Wherein, the calculation variable value determining module 10 is further configured to: determining the maximum total calculated amount and the minimum total calculated amount in a sample neural network, and calculating the difference between the maximum total calculated amount and the minimum total calculated amount to obtain a calculated amount difference;
calculating a quotient value between the total calculated quantity of the neural network to be deployed and the calculated quantity difference value to obtain a first calculated quantity variable value;
calculating a quotient value between the convolution calculation quantity of the 3x3 convolution and the total calculation quantity in the neural network to be deployed to obtain a second calculation quantity variable value;
calculating a quotient value between the convolution calculation quantity of the 1x1 convolution and the total calculation quantity in the neural network to be deployed to obtain a third calculation quantity variable value;
the calculation amount variable value includes the first calculation amount variable value, the second calculation amount variable value, and the third calculation amount variable value.
Further, the calculation variable value determination module 10 is further configured to: respectively acquiring the identifications of the neural network to be deployed and the target equipment to obtain a network identification and an equipment identification;
if the network identifier and the equipment identifier are a preset identifier combination, inquiring a processor corresponding to the preset identifier combination in the target equipment;
and deploying the neural network to be deployed according to the processor inquired in the target equipment.
The device variable value determining module 11 is configured to obtain device information of a target device, and determine a device variable value of the target device according to the device information, where the device information includes a device running speed and a memory value, and the device variable value is used to characterize device performance of the target device.
Wherein the device variable value determination module 11 is further configured to: determining a maximum memory value and a minimum memory value in sample equipment, and calculating a difference value between the maximum memory value and the minimum memory value to obtain a memory difference value;
calculating a quotient value between the memory value of the target device and the memory difference value to obtain a first device variable value;
determining a maximum running speed and a minimum running speed in the sample equipment, and calculating a difference value between the maximum running speed and the minimum running speed to obtain a running speed difference value;
calculating a quotient value between the equipment running speed of the target equipment and the running speed difference value to obtain a second equipment variable value;
the device variable values include the first device variable value and the second device variable value.
A deployment variable value determining module 12, configured to determine a deployment variable value between the neural network to be deployed and the target device according to the calculated variable value and the device variable value, and select a processor in the target device according to the deployment variable value, where the deployment variable value is used to characterize a performance gap between different processors of the neural network to be deployed when the neural network to be deployed is deployed on the target device.
Optionally, in this embodiment, the device processor selection system 100 further includes:
the approximate equation solving module 13 is configured to determine running times of the sample neural network on processors of different sample devices, respectively, and determine sample deployment variable values of the sample neural network on the different sample devices according to the running times;
determining the maximum total calculated amount and the minimum total calculated amount in the sample neural network, and calculating the difference between the maximum total calculated amount and the minimum total calculated amount to obtain a calculated amount difference;
respectively calculating a quotient value between the total calculated quantity of each sample neural network and the calculated quantity difference value to obtain a first sample variable value;
respectively calculating a quotient value between the convolution calculation quantity of the convolution of 3x3 and the total calculation quantity in each sample neural network to obtain a second sample variable value;
respectively calculating a quotient value between the convolution calculation quantity of the 1x1 convolution and the total calculation quantity in each sample neural network to obtain a third sample variable value;
determining a maximum memory value and a minimum memory value in the sample equipment, and calculating a difference value between the maximum memory value and the minimum memory value to obtain a memory difference value;
respectively calculating a quotient value between the memory value of each sample device and the memory difference value to obtain a fourth sample variable value;
determining a maximum running speed and a minimum running speed in the sample equipment, and calculating a difference value between the maximum running speed and the minimum running speed to obtain a running speed difference value;
respectively calculating a quotient value between the equipment running speed of each sample equipment and the running speed difference value to obtain a fifth sample variable value;
solving for a variable coefficient in an approximation equation based on the sample deployment variable value, the first sample variable value, the second sample variable value, the third sample variable value, the fourth sample variable value, and the fifth sample variable value.
Further, each sample device is provided with a first processor and a second processor, and the calculation formula for determining the sample deployment variable values of the sample neural network on different sample devices according to the running time is as follows:
Y=(Gtime-Ctime)/Gtime
Y is the sample deployment variable value, Gtime is the running time of the first processor running the sample neural network, and Ctime is the running time of the second processor running the sample neural network.
Still further, the deployment variable value determination module 12 is further configured to: and performing variable value operation on the calculated variable quantity value and the equipment variable value according to the solved approximate equation to obtain the deployment variable value.
According to this embodiment, by acquiring the calculated amount information of the neural network to be deployed, the complexity of its calculated amount can be effectively determined, and by acquiring the device information of the target device, the device performance of the target device can be effectively determined. Based on the complexity of the calculated amount and the device performance, the performance gap of the neural network to be deployed between different processors can be determined, and a processor on the target device can be selected automatically according to that gap, without manual testing, thereby improving both the accuracy and the efficiency of device processor selection.
Example four
Fig. 4 is a block diagram of a terminal device 2 according to a fourth embodiment of the present application. As shown in fig. 4, the terminal device 2 of this embodiment includes: a processor 20, a memory 21 and a computer program 22, such as a program of a device processor selection method, stored in the memory 21 and executable on the processor 20. The processor 20, when executing the computer program 22, implements the steps in the various embodiments of the device processor selection method described above, such as S10 through S30 shown in fig. 1, or S21 through S27 shown in fig. 2. Alternatively, when the processor 20 executes the computer program 22, the functions of the units in the embodiment corresponding to fig. 3, for example the functions of the units 10 to 13 shown in fig. 3, are implemented; reference is specifically made to the relevant description in the embodiment corresponding to fig. 3, which is not repeated herein.
Illustratively, the computer program 22 may be divided into one or more units, which are stored in the memory 21 and executed by the processor 20 to accomplish the present application. The one or more units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 22 in the terminal device 2. For example, the computer program 22 may be divided into a calculation variable value determination module 10, a device variable value determination module 11, a deployment variable value determination module 12, and an approximation equation solving module 13, and the specific functions of the respective units are as described above.
The terminal device may include, but is not limited to, a processor 20 and a memory 21. It will be appreciated by those skilled in the art that fig. 4 is merely an example of the terminal device 2 and does not constitute a limitation of the terminal device 2, which may include more or fewer components than those shown, combine some components, or have different components; for example, the terminal device may also include input/output devices, network access devices, buses, etc.
The Processor 20 may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, etc. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 21 may be an internal storage unit of the terminal device 2, such as a hard disk or a memory of the terminal device 2. The memory 21 may also be an external storage device of the terminal device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the terminal device 2. Further, the memory 21 may also include both an internal storage unit and an external storage device of the terminal device 2. The memory 21 is used for storing the computer program and other programs and data required by the terminal device. The memory 21 may also be used to temporarily store data that has been output or is to be output.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated module, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in a computer readable storage medium. The computer readable storage medium may be non-volatile or volatile. Based on such understanding, all or part of the flow in the methods of the embodiments described above can be realized by a computer program, which can be stored in a computer readable storage medium and, when executed by a processor, can realize the steps of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, etc. The computer-readable storage medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer readable storage medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, computer readable storage media do not include electrical carrier signals and telecommunications signals.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A method for device processor selection, the method comprising:
acquiring the calculated quantity information of a neural network to be deployed, and determining the calculated quantity variable value of the neural network to be deployed according to the calculated quantity information, wherein the calculated quantity information comprises the total calculated quantity and the convolution calculated quantity of the neural network to be deployed, and the calculated quantity variable value is used for representing the complexity of the calculated quantity of the neural network to be deployed;
acquiring equipment information of target equipment, and determining an equipment variable value of the target equipment according to the equipment information, wherein the equipment information comprises equipment running speed and a memory value, and the equipment variable value is used for representing equipment performance of the target equipment;
and determining a deployment variable value between the neural network to be deployed and the target equipment according to the calculated variable quantity value and the equipment variable value, and selecting a processor in the target equipment according to the deployment variable value, wherein the deployment variable value is used for representing the performance gap of the neural network to be deployed between different processors when the neural network to be deployed is deployed on the target equipment.
2. The device processor selection method of claim 1, wherein the determining a calculated quantity variable value for the neural network to be deployed from the calculated quantity information comprises:
determining the maximum total calculated amount and the minimum total calculated amount in a sample neural network, and calculating the difference between the maximum total calculated amount and the minimum total calculated amount to obtain a calculated amount difference;
calculating a quotient value between the total calculated quantity of the neural network to be deployed and the calculated quantity difference value to obtain a first calculated quantity variable value;
calculating a quotient value between the convolution calculation quantity of the 3x3 convolution and the total calculation quantity in the neural network to be deployed to obtain a second calculation quantity variable value;
calculating a quotient value between the convolution calculation quantity of the 1x1 convolution and the total calculation quantity in the neural network to be deployed to obtain a third calculation quantity variable value;
the calculation amount variable value includes the first calculation amount variable value, the second calculation amount variable value, and the third calculation amount variable value.
3. The device processor selection method of claim 1, wherein said determining a device variable value for the target device from the device information comprises:
determining a maximum memory value and a minimum memory value in sample equipment, and calculating a difference value between the maximum memory value and the minimum memory value to obtain a memory difference value;
calculating a quotient value between the memory value of the target device and the memory difference value to obtain a first device variable value;
determining a maximum running speed and a minimum running speed in the sample equipment, and calculating a difference value between the maximum running speed and the minimum running speed to obtain a running speed difference value;
calculating a quotient value between the equipment running speed of the target equipment and the running speed difference value to obtain a second equipment variable value;
the device variable values include the first device variable value and the second device variable value.
4. The device processor selection method of claim 1, wherein the method further comprises:
respectively determining the running time of the sample neural network on processors of different sample devices, and determining the sample deployment variable values of the sample neural network on the different sample devices according to the running time;
determining the maximum total calculated amount and the minimum total calculated amount in the sample neural network, and calculating the difference between the maximum total calculated amount and the minimum total calculated amount to obtain a calculated amount difference;
respectively calculating a quotient value between the total calculated quantity of each sample neural network and the calculated quantity difference value to obtain a first sample variable value;
respectively calculating a quotient value between the convolution calculation quantity of the convolution of 3x3 and the total calculation quantity in each sample neural network to obtain a second sample variable value;
respectively calculating a quotient value between the convolution calculation quantity of the 1x1 convolution and the total calculation quantity in each sample neural network to obtain a third sample variable value;
determining a maximum memory value and a minimum memory value in the sample equipment, and calculating a difference value between the maximum memory value and the minimum memory value to obtain a memory difference value;
respectively calculating a quotient value between the memory value of each sample device and the memory difference value to obtain a fourth sample variable value;
determining a maximum running speed and a minimum running speed in the sample equipment, and calculating a difference value between the maximum running speed and the minimum running speed to obtain a running speed difference value;
respectively calculating a quotient value between the equipment running speed of each sample equipment and the running speed difference value to obtain a fifth sample variable value;
solving for a variable coefficient in an approximation equation based on the sample deployment variable value, the first sample variable value, the second sample variable value, the third sample variable value, the fourth sample variable value, and the fifth sample variable value.
5. The device processor selection method of claim 4, wherein each of the sample devices is provided with a first processor and a second processor, and the calculation formula for determining the values of the sample deployment variables of the sample neural network on different sample devices according to the running time is as follows:
Y=(Gtime-Ctime)/Gtime
Y is the sample deployment variable value, Gtime is the running time of the first processor running the sample neural network, and Ctime is the running time of the second processor running the sample neural network.
6. The device processor selection method of claim 4, wherein the determining a deployment variable value between the neural network to be deployed and the target device from the calculated quantity variable value and the device variable value comprises:
and performing variable value operation on the calculated variable quantity value and the equipment variable value according to the solved approximate equation to obtain the deployment variable value.
7. The device processor selection method of claim 1, wherein prior to obtaining the computational load information for the neural network to be deployed, further comprising:
respectively acquiring the identifications of the neural network to be deployed and the target equipment to obtain a network identification and an equipment identification;
if the network identifier and the equipment identifier are a preset identifier combination, inquiring a processor corresponding to the preset identifier combination in the target equipment;
and deploying the neural network to be deployed according to the processor inquired in the target equipment.
8. A device processor selection system, the system comprising:
the calculated quantity variable value determining module is used for acquiring the calculated quantity information of the neural network to be deployed and determining the calculated quantity variable value of the neural network to be deployed according to the calculated quantity information, wherein the calculated quantity information comprises the total calculated quantity and the convolution calculated quantity of the neural network to be deployed, and the calculated quantity variable value is used for representing the complexity of the calculated quantity of the neural network to be deployed;
the device variable value determining module is used for acquiring device information of target devices and determining device variable values of the target devices according to the device information, wherein the device information comprises device running speed and memory values, and the device variable values are used for representing device performances of the target devices;
and the deployment variable value determining module is used for determining a deployment variable value between the neural network to be deployed and the target equipment according to the calculated variable quantity value and the equipment variable value, and selecting a processor in the target equipment according to the deployment variable value, wherein the deployment variable value is used for representing the performance gap of the neural network to be deployed between different processors when the neural network to be deployed is deployed on the target equipment.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202110633567.2A 2021-06-07 2021-06-07 Device processor selection method, system, terminal device and storage medium Pending CN113408718A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110633567.2A CN113408718A (en) 2021-06-07 2021-06-07 Device processor selection method, system, terminal device and storage medium


Publications (1)

Publication Number Publication Date
CN113408718A 2021-09-17

Family

ID=77676748

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110633567.2A Pending CN113408718A (en) 2021-06-07 2021-06-07 Device processor selection method, system, terminal device and storage medium

Country Status (1)

Country Link
CN (1) CN113408718A (en)


Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017124955A1 (en) * 2016-01-21 2017-07-27 阿里巴巴集团控股有限公司 Method and device for warehouse storage space planning, and electronic device
JP2018191461A (en) * 2017-05-10 2018-11-29 キヤノン株式会社 Control device, optical instrument, control method, and program
US20200089534A1 (en) * 2017-08-21 2020-03-19 Shanghai Cambricon Information Technology Co., Ltd Data sharing system and data sharing method therefor
US10664718B1 (en) * 2017-09-11 2020-05-26 Apple Inc. Real-time adjustment of hybrid DNN style transfer networks
CN109754066A (en) * 2017-11-02 2019-05-14 三星电子株式会社 Method and apparatus for generating fixed-point type neural network
WO2020151338A1 (en) * 2019-01-23 2020-07-30 平安科技(深圳)有限公司 Audio noise detection method and apparatus, storage medium, and mobile terminal
CN110345099A (en) * 2019-07-18 2019-10-18 西安易朴通讯技术有限公司 The method, apparatus and system of server fan speed regulation
CN111797881A (en) * 2019-07-30 2020-10-20 华为技术有限公司 Image classification method and device
CN112445823A (en) * 2019-09-04 2021-03-05 华为技术有限公司 Searching method of neural network structure, image processing method and device
CN110929860A (en) * 2019-11-07 2020-03-27 深圳云天励飞技术有限公司 Convolution acceleration operation method and device, storage medium and terminal equipment
CN112380782A (en) * 2020-12-07 2021-02-19 重庆忽米网络科技有限公司 Rotating equipment fault prediction method based on mixed indexes and neural network
CN112634870A (en) * 2020-12-11 2021-04-09 平安科技(深圳)有限公司 Keyword detection method, device, equipment and storage medium
CN112884127A (en) * 2021-03-01 2021-06-01 厦门美图之家科技有限公司 Multiprocessor parallel neural network acceleration method, device, equipment and storage medium

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
LIAN, YJ et al.: "Optimizing AD Pruning of Sponsored Search with Reinforcement Learning", Web Conference 2021: Companion of the World Wide Web Conference (WWW 2021), 30 April 2021 (2021-04-30), pages 123-127, XP058746404, DOI: 10.1145/3442442.3453148 *
MOHAMMAD REZA TAVAKOLI et al.: "A High Throughput Hardware CNN Accelerator Using a Novel Multi-Layer Convolution Processor", 2020 28th Iranian Conference on Electrical Engineering (ICEE), 31 August 2020 (2020-08-31), pages 1-6, XP033865830, DOI: 10.1109/ICEE50131.2020.9260785 *
刘必成: "An Efficient In-Memory Computing Framework for Convolutional Neural Networks Based on Second-Generation Racetrack Memory", China Master's Theses Full-text Database, Information Science and Technology, no. 2019, 15 January 2019 (2019-01-15), pages 137-195 *
段秉环 et al.: "Research on Deep Neural Network Compression Methods for Embedded Applications", Aeronautical Computing Technique, vol. 48, no. 05, 25 September 2018 (2018-09-25), pages 50-53 *
谢波 et al.: "Research on Improved NPML in 3D Elastic Wave Numerical Simulation", Progress in Geophysics, vol. 31, no. 06, 15 December 2016 (2016-12-15), pages 2762-2766 *
黄培: "A Neural Network-Based Network Device Fault Prediction System", Office Automation, vol. 25, no. 11, 1 June 2020 (2020-06-01), pages 23-27 *

Similar Documents

Publication Publication Date Title
CN110288082B (en) Convolutional neural network model training method and device and computer readable storage medium
CN108172213B (en) Surge audio identification method, surge audio identification device, surge audio identification equipment and computer readable medium
CN110795976B (en) Method, device and equipment for training object detection model
CN109858613B (en) Compression method and system of deep neural network and terminal equipment
US11748595B2 (en) Convolution acceleration operation method and apparatus, storage medium and terminal device
JP2010002370A (en) Pattern extraction program, technique, and apparatus
CN109766800B (en) Construction method of mobile terminal flower recognition model
CN108846814B (en) Image processing method, image processing device, readable storage medium and computer equipment
KR20230130591A (en) Information processing apparatus, information processing method, non-transitory computer-readable storage medium
CN114741389A (en) Model parameter adjusting method and device, electronic equipment and storage medium
CN112766397B (en) Classification network and implementation method and device thereof
CN114611697A (en) Neural network quantification and deployment method, system, electronic device and storage medium
CN108053034B (en) Model parameter processing method and device, electronic equipment and storage medium
CN111524153B (en) Image analysis force determination method and device and computer storage medium
CN113408718A (en) Device processor selection method, system, terminal device and storage medium
CN111373436A (en) Image processing method, terminal device and storage medium
CN112187266B (en) Nonlinear correction method and device of analog-to-digital converter and electronic equipment
CN111160516A (en) Convolutional layer sparsization method and device of deep neural network
CN116414542B (en) Task scheduling method, device, equipment and storage medium
CN109740733B (en) Deep learning network model optimization method and device and related equipment
US10963746B1 (en) Average pooling in a neural network
CN107392859B (en) Method and device for eliminating highlight area and terminal
CN113469324B (en) Model dynamic quantization method, device, electronic equipment and computer readable medium
CN110866484B (en) Driver face detection method, computer device and computer readable storage medium
CN107330866B (en) Method and device for eliminating highlight area and terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination