CN113408718B - Device processor selection method, system, terminal device and storage medium - Google Patents
- Publication number
- CN113408718B (application CN202110633567.2A)
- Authority
- CN
- China
- Prior art keywords
- value
- sample
- neural network
- variable value
- equipment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention provides a device processor selection method, a system, a terminal device and a storage medium. The method comprises: determining a calculated variable value of a neural network to be deployed according to calculated amount information of the neural network to be deployed; acquiring device information of a target device, and determining a device variable value of the target device according to the device information; and determining a deployment variable value between the neural network to be deployed and the target device according to the calculated variable value and the device variable value, and selecting a processor in the target device according to the deployment variable value. Based on the complexity of the calculated amount of the neural network to be deployed and the device performance of the target device, the invention determines the performance gap of the neural network to be deployed between different processors; based on that performance gap, a processor on the target device can be selected automatically, without manual testing, which improves the accuracy and efficiency of device processor selection.
Description
Technical Field
The present invention belongs to the field of electronic communications, and in particular, relates to a device processor selection method, a system, a terminal device, and a storage medium.
Background
With the progress of technology, processors have become increasingly powerful, and overall performance can be improved by providing several processors on one terminal device. For example, a mobile phone is provided with a Central Processing Unit (CPU) and a Graphics Processing Unit (GPU) that divide the work differently: the CPU is good at logic control and the GPU is good at parallel operation, so much deep learning inference is run on the GPU. However, owing to the calculated amount of the neural network and other factors, inference on the GPU is not necessarily faster than on the CPU, so the problem of selecting a device processor for neural network inference is receiving more and more attention.
In the prior art, the device processor is selected by manual testing: the running time of the neural network on the different processors of the same device is measured manually, and a processor on the device is selected based on the measured running times.
Disclosure of Invention
The embodiment of the invention aims to provide a device processor selection method, a system, terminal equipment and a storage medium, which aim to solve the problem of low device processor selection efficiency caused by the fact that a manual test mode is adopted for selecting a device processor in the existing device processor selection process.
The embodiment of the invention is realized in such a way that a device processor selection method comprises the following steps:
Acquiring calculated amount information of a neural network to be deployed, and determining a calculated variable value of the neural network to be deployed according to the calculated amount information, wherein the calculated amount information comprises the total calculated amount and the convolution calculated amount of the neural network to be deployed, and the calculated variable value is used for representing the complexity of the calculated amount of the neural network to be deployed;
Acquiring equipment information of target equipment, and determining equipment variable values of the target equipment according to the equipment information, wherein the equipment information comprises equipment running speed and memory values, and the equipment variable values are used for representing equipment performance of the target equipment;
and determining a deployment variable value between the neural network to be deployed and the target equipment according to the calculated variable value and the equipment variable value, and selecting a processor in the target equipment according to the deployment variable value, wherein the deployment variable value is used for representing the performance gap of the neural network to be deployed between different processors when the neural network to be deployed is deployed on the target equipment.
Still further, the determining the calculated variable value of the neural network to be deployed according to the calculated variable information includes:
Determining the maximum total calculated amount and the minimum total calculated amount in the sample neural network, and calculating the difference between the maximum total calculated amount and the minimum total calculated amount to obtain a calculated amount difference;
Calculating a quotient between the total calculated amount of the neural network to be deployed and the calculated amount difference to obtain a first calculated variable value;
calculating a quotient between the convolution calculated amount of the 3x3 convolution and the total calculated amount in the neural network to be deployed to obtain a second calculated variable value;
calculating a quotient between the convolution calculated amount of the 1x1 convolution and the total calculated amount in the neural network to be deployed to obtain a third calculated variable value;
The calculated variable values include the first calculated variable value, the second calculated variable value, and the third calculated variable value.
Still further, the determining the device variable value of the target device according to the device information includes:
Determining a maximum memory value and a minimum memory value in sample equipment, and calculating a difference value between the maximum memory value and the minimum memory value to obtain a memory difference value;
Calculating a quotient between the memory value of the target device and the memory difference value to obtain a first device variable value;
Determining the maximum operation speed and the minimum operation speed in the sample equipment, and calculating the difference between the maximum operation speed and the minimum operation speed to obtain an operation speed difference;
calculating a quotient between the equipment running speed of the target equipment and the running speed difference value to obtain a second equipment variable value;
the device variable values include the first device variable value and the second device variable value.
Still further, the method further comprises:
Respectively determining the running time of the sample neural network on processors of different sample devices, and determining sample deployment variable values of the sample neural network on different sample devices according to the running time;
Determining the maximum total calculated amount and the minimum total calculated amount in the sample neural network, and calculating the difference between the maximum total calculated amount and the minimum total calculated amount to obtain a calculated amount difference;
calculating the quotient value between the total computation quantity and the computation quantity difference value of each sample neural network respectively to obtain a first sample variable value;
calculating quotient values between convolution calculated quantity of 3x3 convolution and total calculated quantity in each sample neural network respectively to obtain second sample variable values;
Calculating quotient values between convolution calculated quantity of 1x1 convolution and total calculated quantity in each sample neural network respectively to obtain a third sample variable value;
Determining a maximum memory value and a minimum memory value in the sample equipment, and calculating a difference value between the maximum memory value and the minimum memory value to obtain a memory difference value;
Calculating the quotient value between the memory value of each sample device and the memory difference value respectively to obtain a fourth sample variable value;
Determining the maximum operation speed and the minimum operation speed in the sample equipment, and calculating the difference between the maximum operation speed and the minimum operation speed to obtain an operation speed difference;
calculating quotient values between the equipment operation speed of each sample equipment and the operation speed difference value respectively to obtain a fifth sample variable value;
And solving variable coefficients in an approximation equation according to the sample deployment variable value, the first sample variable value, the second sample variable value, the third sample variable value, the fourth sample variable value and the fifth sample variable value.
Further, each sample device is provided with a first processor and a second processor, and a calculation formula adopted by determining sample deployment variable values of the sample neural network on different sample devices according to the running time is as follows:
Y=(Gtime-Ctime)/Gtime
where Y is the sample deployment variable value, Gtime is the running time of the first processor running the sample neural network, and Ctime is the running time of the second processor running the sample neural network.
Still further, the determining a deployment variable value between the neural network to be deployed and the target device according to the calculated variable value and the device variable value includes:
And carrying out variable value operation on the calculated variable value and the equipment variable value according to the solved approximation equation to obtain the deployment variable value.
Further, before obtaining the calculated amount information of the neural network to be deployed, the method further includes:
respectively acquiring the identification of the neural network to be deployed and the identification of the target equipment to obtain a network identification and an equipment identification;
If the network identification and the equipment identification are preset identification combinations, inquiring a processor corresponding to the preset identification combinations in the target equipment;
and deploying the neural network to be deployed according to the processor queried in the target equipment.
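The preset-combination shortcut described above can be sketched as a simple table lookup. This is only an illustration: the identifier strings, the table contents and the function name `lookup_preset` are hypothetical and do not come from the patent text.

```python
# Hypothetical table of preset (network identifier, device identifier) combinations.
# Keys and processor names are invented for illustration only.
PRESET_PROCESSORS = {
    ("net_a", "phone_x"): "GPU",
    ("net_b", "phone_x"): "CPU",
}


def lookup_preset(network_id, device_id, presets=PRESET_PROCESSORS):
    """Return the preset processor for this combination, or None if there is none.

    A None result means the combination is not preset, so the caller would
    fall back to the calculated-amount based selection.
    """
    return presets.get((network_id, device_id))


preset_hit = lookup_preset("net_a", "phone_x")
preset_miss = lookup_preset("net_c", "phone_x")
```

A lookup that returns `None` here is assumed to mean that the method continues with acquiring the calculated amount information as usual.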
It is another object of an embodiment of the present invention to provide a device processor selection system, the system comprising:
The calculated variable value determining module is used for obtaining calculated amount information of the neural network to be deployed and determining a calculated variable value of the neural network to be deployed according to the calculated amount information, wherein the calculated amount information comprises the total calculated amount and the convolution calculated amount of the neural network to be deployed, and the calculated variable value is used for representing the complexity of the calculated amount of the neural network to be deployed;
the device variable value determining module is used for obtaining device information of a target device and determining a device variable value of the target device according to the device information, wherein the device information comprises a device running speed and a memory value, and the device variable value is used for representing the device performance of the target device;
The deployment variable value determining module is used for determining a deployment variable value between the neural network to be deployed and the target device according to the calculated variable value and the device variable value, and selecting a processor in the target device according to the deployment variable value, wherein the deployment variable value is used for representing the performance gap of the neural network to be deployed between different processors when the neural network to be deployed is deployed on the target device.
It is a further object of an embodiment of the present invention to provide a terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, which processor implements the steps of the method as described above when executing the computer program.
It is a further object of embodiments of the present invention to provide a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method.
According to the embodiment of the invention, by acquiring the calculated amount information of the neural network to be deployed, the complexity of its calculated amount can be effectively determined; by acquiring the device information of the target device, the device performance of the target device can be effectively determined. Based on the complexity of the calculated amount and the device performance, the performance gap of the neural network to be deployed between different processors of the target device can be effectively determined, and a processor on the target device can be selected automatically based on that gap, without manual testing, so that the accuracy and efficiency of device processor selection are improved.
Drawings
FIG. 1 is a flow chart of a device processor selection method provided by a first embodiment of the present invention;
FIG. 2 is a flow chart of a device processor selection method provided by a second embodiment of the present invention;
FIG. 3 is a schematic diagram of a device processor selection system according to a third embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a terminal device according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
In order to illustrate the technical scheme of the invention, the following description is made by specific examples.
Example 1
Referring to FIG. 1, which is a flowchart of a device processor selection method according to a first embodiment of the present invention, the method may be applied to any terminal device, such as a mobile phone, a tablet or a wearable smart device, and includes the following steps:
Step S10, acquiring calculated amount information of a neural network to be deployed, and determining a calculated amount variable value of the neural network to be deployed according to the calculated amount information;
The calculated amount information comprises the total calculated amount and the convolution calculated amount of the neural network to be deployed. The total calculated amount is the total number of multiply-add operations in the neural network to be deployed, and the calculated variable value is used for representing the complexity of the calculated amount of the neural network to be deployed. The total calculated amount and the convolution calculated amount may differ between different neural networks to be deployed.
In this step, the convolution calculated amount comprises the calculated amounts corresponding to convolution layers of different sizes in the neural network to be deployed. By acquiring the calculated amount information of the neural network to be deployed, the complexity of its calculated amount can be effectively determined.
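As a rough illustration of how the calculated amount information might be gathered, the sketch below counts multiply-add operations for standard (non-grouped) convolution layers and splits out the 3x3 and 1x1 shares. The `ConvLayer` structure, its field names, the example layer shapes and the assumption that the network consists only of convolution layers are all hypothetical and not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class ConvLayer:
    """Hypothetical description of one convolution layer (illustration only)."""
    kernel: int   # square kernel size, e.g. 3 for a 3x3 convolution
    in_ch: int    # number of input channels
    out_ch: int   # number of output channels
    out_h: int    # output feature-map height
    out_w: int    # output feature-map width

    def macs(self) -> int:
        # One multiply-add per kernel element, input channel,
        # output channel and output position.
        return (self.kernel * self.kernel * self.in_ch
                * self.out_ch * self.out_h * self.out_w)


def computation_info(layers):
    """Return (total, 3x3 amount, 1x1 amount) in multiply-add operations."""
    total = sum(layer.macs() for layer in layers)
    t3x3 = sum(layer.macs() for layer in layers if layer.kernel == 3)
    t1x1 = sum(layer.macs() for layer in layers if layer.kernel == 1)
    return total, t3x3, t1x1


# Invented example network: one 3x3 layer followed by one 1x1 layer.
layers = [
    ConvLayer(kernel=3, in_ch=3, out_ch=8, out_h=16, out_w=16),
    ConvLayer(kernel=1, in_ch=8, out_ch=16, out_h=16, out_w=16),
]
total, t3x3, t1x1 = computation_info(layers)
```

In a real network the total would also include the multiply-add operations of non-convolution layers; they are left out here only to keep the sketch short.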
Step S20, obtaining equipment information of target equipment, and determining an equipment variable value of the target equipment according to the equipment information;
The device information comprises a device running speed and a memory value. The device running speed is the amount of data the target device processes per unit time, and the device variable value is used for representing the device performance of the target device: the greater the device running speed, the better the performance of the target device is judged to be.
Step S30, determining a deployment variable value between the neural network to be deployed and the target equipment according to the calculated variable value and the equipment variable value, and selecting a processor in the target equipment according to the deployment variable value;
The deployment variable value is used for representing the performance gap of the neural network to be deployed between different processors when it is deployed on the target device;
Optionally, in this step, when two different processors are provided on the target device: if the deployment variable value is greater than 0, it is determined that the first processor processes the neural network to be deployed with higher performance than the second processor; if the deployment variable value is less than 0, it is determined that the second processor processes the neural network to be deployed with higher performance than the first processor; and the greater the absolute value of the deployment variable value, the greater the difference between the processing performances of the two processors for the neural network to be deployed.
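The sign rule above can be written as a small function. This is a minimal sketch that follows the rule exactly as stated in the text; the function name, the returned labels and the handling of a value of exactly 0 (which the text does not cover) are assumptions.

```python
def select_processor(deployment_value: float) -> str:
    """Pick a processor label from the sign of the deployment variable value.

    Positive -> "first", negative -> "second", as stated in the text.
    Returning "either" for exactly 0 is an assumption; the text is silent on it.
    """
    if deployment_value > 0:
        return "first"
    if deployment_value < 0:
        return "second"
    return "either"


def performance_gap(deployment_value: float) -> float:
    """Larger absolute value means a larger gap between the two processors."""
    return abs(deployment_value)


choice_a = select_processor(0.4)
choice_b = select_processor(-0.1)
```

Which physical processor counts as "first" or "second" depends on the convention used when the sample deployment variable values were produced.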
Optionally, before determining the deployment variable value between the neural network to be deployed and the target device according to the calculated variable value and the device variable value, the method further includes:
Respectively determining the running time of the sample neural network on processors of different sample devices, and determining sample deployment variable values of the sample neural network on different sample devices according to the running time;
The number and models of the sample neural networks and sample devices can be set as required. In this step, the different sample neural networks are run on each of the different sample devices to determine the running time of each sample neural network on the processors of each sample device.
Further, in this step, a first processor and a second processor are disposed on each sample device, and a calculation formula adopted by determining sample deployment variable values of the sample neural network on different sample devices according to the running time is as follows:
Y=(Gtime-Ctime)/Gtime
where Y is the sample deployment variable value, Gtime is the running time of the first processor running the sample neural network, and Ctime is the running time of the second processor running the sample neural network.
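The formula Y = (Gtime - Ctime)/Gtime can be computed directly from two measured running times. In the sketch below the function name, the guard against a non-positive Gtime and the example timings are assumptions made for illustration, not measured data.

```python
def sample_deployment_value(gtime: float, ctime: float) -> float:
    """Compute Y = (Gtime - Ctime) / Gtime for one sample network on one device.

    gtime: running time on the first processor
    ctime: running time on the second processor
    """
    if gtime <= 0:
        # Guard added for the sketch; a measured running time should be positive.
        raise ValueError("gtime must be positive")
    return (gtime - ctime) / gtime


# Invented timings in milliseconds.
y_first_slower = sample_deployment_value(gtime=20.0, ctime=15.0)
y_first_faster = sample_deployment_value(gtime=10.0, ctime=15.0)
```

Because the difference is divided by Gtime, Y is a relative gap and does not depend on the time unit, as long as both times use the same one.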
Determining the maximum total calculated amount and the minimum total calculated amount in the sample neural network, and calculating the difference between the maximum total calculated amount and the minimum total calculated amount to obtain a calculated amount difference;
the maximum total calculated amount is Tmax, the minimum total calculated amount is Tmin, and the calculated amount difference is obtained by calculating the difference between Tmax and Tmin.
Calculating the quotient value between the total computation quantity and the computation quantity difference value of each sample neural network respectively to obtain a first sample variable value;
Wherein, the total calculated amount of the sample neural network is T, then:
the first sample variable value X1 = T/(Tmax-Tmin).
Calculating quotient values between convolution calculated quantity of 3x3 convolution and total calculated quantity in each sample neural network respectively to obtain second sample variable values;
Wherein, the convolution calculated amount of the 3x3 convolution is T3x3, and then the second sample variable value X2 = T3x3/T.
Calculating quotient values between convolution calculated quantity of 1x1 convolution and total calculated quantity in each sample neural network respectively to obtain a third sample variable value;
Wherein, the convolution calculated amount of the 1x1 convolution is T1x1, and then the third sample variable value X3 = T1x1/T.
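The three network-side sample variable values defined above can be computed for a whole set of sample networks at once. The sketch below assumes each sample network is given as a tuple (T, T3x3, T1x1); the function name and the numbers are invented for illustration.

```python
def network_sample_values(networks):
    """Return (X1, X2, X3) for each sample network.

    networks: list of (T, T3x3, T1x1) tuples, where T is the total calculated
    amount and T3x3 / T1x1 are the 3x3 / 1x1 convolution calculated amounts.
    """
    totals = [t for t, _, _ in networks]
    diff = max(totals) - min(totals)  # Tmax - Tmin
    if diff == 0:
        # Guard added for the sketch: with identical totals the divisor is zero.
        raise ValueError("sample networks must differ in total calculated amount")
    return [(t / diff, t3 / t, t1 / t) for t, t3, t1 in networks]


# Invented sample networks (amounts in arbitrary units of multiply-adds).
sample_networks = [(100.0, 50.0, 25.0), (300.0, 150.0, 30.0)]
net_values = network_sample_values(sample_networks)
```

Note that, as the formula is stated, X1 divides T by the range Tmax-Tmin without first subtracting Tmin, so X1 is not confined to the interval from 0 to 1.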
Determining a maximum memory value and a minimum memory value in the sample equipment, and calculating a difference value between the maximum memory value and the minimum memory value to obtain a memory difference value;
The maximum memory value is Mmax, the minimum memory value is Mmin, and the memory difference value is obtained by calculating the difference value between Mmax and Mmin.
Calculating the quotient value between the memory value of each sample device and the memory difference value respectively to obtain a fourth sample variable value;
wherein, the memory value of the sample device is M, and the fourth sample variable value X4 = M/(Mmax-Mmin).
Determining the maximum operation speed and the minimum operation speed in the sample equipment, and calculating the difference between the maximum operation speed and the minimum operation speed to obtain an operation speed difference;
The maximum running speed is Smax, the minimum running speed is Smin, and the running speed difference value is obtained by calculating the difference value between Smax and Smin.
Calculating quotient values between the equipment operation speed of each sample equipment and the operation speed difference value respectively to obtain a fifth sample variable value;
wherein, the device operation speed of the sample device is S, then:
the fifth sample variable value X5 = S/(Smax-Smin).
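The two device-side sample variable values share the same shape: each raw value is divided by the spread of that quantity over the sample devices. A minimal sketch, with a hypothetical helper name and invented memory and speed figures:

```python
def range_scaled(values):
    """Divide each value by (max - min) of the sample set, as in X4 and X5."""
    spread = max(values) - min(values)
    if spread == 0:
        # Guard added for the sketch: identical samples give a zero divisor.
        raise ValueError("all samples are equal; the difference value is zero")
    return [v / spread for v in values]


# Invented sample devices: memory values (e.g. in GB) and running speeds
# (arbitrary units of data per unit time).
memories = [4.0, 8.0, 6.0]
speeds = [1.0, 3.0, 2.0]

x4 = range_scaled(memories)  # M / (Mmax - Mmin) for each sample device
x5 = range_scaled(speeds)    # S / (Smax - Smin) for each sample device
```

The same Mmax-Mmin and Smax-Smin differences obtained from the sample devices are reused later when the target device is scaled.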
Solving a variable coefficient in an approximation equation according to the sample deployment variable value, the first sample variable value, the second sample variable value, the third sample variable value, the fourth sample variable value and the fifth sample variable value;
The approximation equation is chosen as Y = a0 + a1*X1 + a2*X2 + a3*X3 + a4*X4 + a5*X5, and all coefficients in the equation (a1, a2, a3, a4, a5) are solved by the least squares principle, so that the solved approximation equation can effectively calculate deployment variable values between different neural networks to be deployed and target devices, where Y is the deployment variable value.
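One possible way to carry out the least-squares step is to solve the normal equations of the linear model. The sketch below does this in plain Python; it assumes that a0 is fitted together with a1 to a5 as an intercept, and the function names, the synthetic samples and the "true" coefficients used to generate them are all invented for illustration rather than measured data.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: samples do not determine the coefficients")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_coefficients(samples, targets):
    """Least-squares fit of Y = a0 + a1*X1 + ... + a5*X5 via the normal equations.

    samples: list of (X1, X2, X3, X4, X5) tuples; targets: list of Y values.
    Returns [a0, a1, a2, a3, a4, a5].
    """
    rows = [[1.0] + list(x) for x in samples]  # leading 1.0 carries the intercept a0
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(ata, aty)


# Synthetic samples (X1..X5), invented for illustration; not measured data.
samples = [
    (1.0, 0.5, 0.25, 1.0, 1.0),
    (2.0, 0.5, 0.25, 1.0, 1.0),
    (1.0, 0.75, 0.25, 1.0, 1.0),
    (1.0, 0.5, 0.5, 1.0, 1.0),
    (1.0, 0.5, 0.25, 2.0, 1.0),
    (1.0, 0.5, 0.25, 1.0, 2.0),
    (2.0, 0.75, 0.5, 2.0, 2.0),
]
true_coeffs = [0.1, 0.5, -0.2, 0.3, 0.05, -0.4]  # hypothetical a0..a5
targets = [true_coeffs[0] + sum(a * x for a, x in zip(true_coeffs[1:], s))
           for s in samples]
coeffs = fit_coefficients(samples, targets)
```

With six unknowns, at least six sample pairs with linearly independent variable values are needed; a numerical library's least-squares routine would normally be preferred over hand-written elimination for larger or poorly conditioned sample sets.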
Still further, in this step, the determining a deployment variable value between the neural network to be deployed and the target device according to the calculated variable value and the device variable value includes:
performing variable value operation on the calculated variable value and the equipment variable value according to the solved approximation equation to obtain the deployment variable value;
substituting the calculated variable value and the equipment variable value into the solved approximation equation to perform operation, so as to obtain a deployment variable value between the neural network to be deployed and the target equipment.
According to this embodiment, by acquiring the calculated amount information of the neural network to be deployed, the complexity of its calculated amount can be effectively determined; by acquiring the device information of the target device, the device performance of the target device can be effectively determined. Based on the complexity of the calculated amount and the device performance, the performance gap of the neural network to be deployed between different processors of the target device can be effectively determined, and a processor on the target device can be selected automatically based on that gap, without manual testing, so that the accuracy and efficiency of device processor selection are improved.
Example two
Referring to fig. 2, a flowchart of a device processor selection method according to a second embodiment of the present invention is provided, and the method is used for further refining steps S20 to S30, and includes the steps of:
Step S21, determining the maximum total calculated amount and the minimum total calculated amount in the sample neural network, and calculating the difference between the maximum total calculated amount and the minimum total calculated amount to obtain a calculated amount difference;
the maximum total calculated amount is Tmax, the minimum total calculated amount is Tmin, and the calculated amount difference is obtained by calculating the difference between Tmax and Tmin.
Step S22, calculating a quotient between the total computation amount of the neural network to be deployed and the computation amount difference value to obtain a first computation amount variable value;
Wherein, the total calculated amount of the neural network to be deployed is T1, then: first calculated variable value = T1/(Tmax-Tmin).
Step S23, respectively calculating quotient values between convolution calculated amounts of the 3x3 convolution and the 1x1 convolution and total calculated amounts in the neural network to be deployed, and obtaining a second calculated amount variable value and a third calculated amount variable value;
wherein the calculated variable values include a first calculated variable value, a second calculated variable value, and a third calculated variable value;
In this step, in the neural network to be deployed, the convolution calculated amount of the 3x3 convolution is T2 and the convolution calculated amount of the 1x1 convolution is T3, then: second calculated variable value = T2/T1, third calculated variable value = T3/T1.
Step S24, determining a maximum memory value and a minimum memory value in sample equipment, and calculating a difference value between the maximum memory value and the minimum memory value to obtain a memory difference value;
The maximum memory value is Mmax, the minimum memory value is Mmin, and the memory difference value is obtained by calculating the difference value between Mmax and Mmin.
Step S25, calculating a quotient between the memory value of the target device and the memory difference value to obtain a first device variable value;
Wherein, the memory value of the target device is M1, and the first device variable value = M1/(Mmax-Mmin).
Step S26, determining the maximum operation speed and the minimum operation speed in the sample equipment, and calculating the difference between the maximum operation speed and the minimum operation speed to obtain an operation speed difference;
The maximum running speed is Smax, the minimum running speed is Smin, and the running speed difference value is obtained by calculating the difference value between Smax and Smin.
Step S27, calculating a quotient between the equipment running speed of the target equipment and the running speed difference value to obtain a second equipment variable value;
wherein the device variable value includes a first device variable value and a second device variable value, and the device operation speed of the target device is S1, then the second device variable value = S1/(Smax-Smin).
In this embodiment, the first calculated variable value is substituted into X1 in the solved approximation equation, the second calculated variable value is substituted into X2 in the solved approximation equation, the third calculated variable value is substituted into X3 in the solved approximation equation, the first equipment variable value is substituted into X4 in the solved approximation equation, the second equipment variable value is substituted into X5 in the solved approximation equation, and the deployment variable value between the neural network to be deployed and the target equipment is obtained by performing operation based on a1, a2, a3, a4 and a5 after the solution.
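The substitution described in this embodiment can be sketched end to end: scale the raw quantities of the network to be deployed and the target device, then evaluate the solved equation. All numbers below, including the coefficients and the sample-set differences, are invented for illustration; the function name is hypothetical, and a0 is assumed to be part of the solved equation.

```python
def deployment_value(coeffs, calc_values, device_values):
    """Evaluate Y = a0 + a1*X1 + a2*X2 + a3*X3 + a4*X4 + a5*X5.

    coeffs: (a0, a1, ..., a5) from the solved approximation equation
    calc_values: (first, second, third) calculated variable values -> X1..X3
    device_values: (first, second) device variable values -> X4, X5
    """
    xs = list(calc_values) + list(device_values)
    if len(coeffs) != len(xs) + 1:
        raise ValueError("expected one coefficient per variable plus a0")
    return coeffs[0] + sum(a * x for a, x in zip(coeffs[1:], xs))


# Hypothetical solved coefficients a0..a5.
coeffs = (0.1, 0.5, -0.2, 0.3, 0.05, -0.4)

# Invented raw quantities for the network to be deployed and the target device.
t1, t2, t3 = 200.0, 100.0, 50.0  # total, 3x3 and 1x1 calculated amounts
m1, s1 = 4.0, 2.0                # memory value and device running speed
t_diff, m_diff, s_diff = 200.0, 4.0, 2.0  # Tmax-Tmin, Mmax-Mmin, Smax-Smin from the samples

calc_values = (t1 / t_diff, t2 / t1, t3 / t1)  # steps S22 and S23
device_values = (m1 / m_diff, s1 / s_diff)     # steps S25 and S27
y = deployment_value(coeffs, calc_values, device_values)
```

The differences Tmax-Tmin, Mmax-Mmin and Smax-Smin are taken from the sample networks and sample devices, so the target quantities are scaled in the same way as the data used to solve the equation.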
Example III
Referring to fig. 3, a schematic structural diagram of a device processor selection system 100 according to a third embodiment of the present invention includes: a calculated variable value determination module 10, an equipment variable value determination module 11, and a deployment variable value determination module 12, wherein:
the calculated variable value determination module 10 is configured to obtain computation amount information of a neural network to be deployed, and determine a calculated variable value of the neural network to be deployed according to the computation amount information, where the computation amount information includes a total computation amount and a convolution computation amount of the neural network to be deployed, and the calculated variable value is used to characterize the complexity of the computation amount of the neural network to be deployed.
Wherein the calculated variable value determination module 10 is further configured to: determining the maximum total calculated amount and the minimum total calculated amount in the sample neural network, and calculating the difference between the maximum total calculated amount and the minimum total calculated amount to obtain a calculated amount difference;
Calculating a quotient between the total calculated amount of the neural network to be deployed and the calculated amount difference to obtain a first calculated variable value;
calculating a quotient between the convolution calculated amount of the 3x3 convolution and the total calculated amount in the neural network to be deployed to obtain a second calculated variable value;
calculating a quotient between the convolution calculated amount of the 1x1 convolution and the total calculated amount in the neural network to be deployed to obtain a third calculated variable value;
The calculated variable values include the first calculated variable value, the second calculated variable value, and the third calculated variable value.
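The three calculated variable values can be computed as in the sketch below. It assumes operation counts are already available as plain numbers; the function name, the unit (millions of operations), and the sample totals are hypothetical.

```python
def calculated_variable_values(total_ops, conv3x3_ops, conv1x1_ops, sample_totals):
    """Return the first, second, and third calculated variable values."""
    amount_diff = max(sample_totals) - min(sample_totals)  # max total - min total
    if amount_diff == 0 or total_ops == 0:
        raise ValueError("sample totals must differ and total_ops must be non-zero")
    first = total_ops / amount_diff    # total amount / calculated amount difference
    second = conv3x3_ops / total_ops   # share of 3x3 convolution work
    third = conv1x1_ops / total_ops    # share of 1x1 convolution work
    return first, second, third


# Hypothetical network: 600M operations, of which 300M are 3x3 and 150M are 1x1 convolutions.
x1, x2, x3 = calculated_variable_values(
    total_ops=600.0,
    conv3x3_ops=300.0,
    conv1x1_ops=150.0,
    sample_totals=[200.0, 600.0, 1000.0],
)
print(x1, x2, x3)
```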
Further, the calculated variable value determination module 10 is further configured to: respectively acquiring the identification of the neural network to be deployed and the identification of the target equipment to obtain a network identification and an equipment identification;
If the network identification and the equipment identification are preset identification combinations, inquiring a processor corresponding to the preset identification combinations in the target equipment;
and deploying the neural network to be deployed according to the processor queried in the target equipment.
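The preset-combination check amounts to a table lookup that bypasses the variable-value calculation when a match exists. The sketch below assumes the preset combinations are held in an in-memory dictionary; the identifiers and processor labels are hypothetical.

```python
# Hypothetical preset identification combinations mapped to processors.
PRESET_PROCESSORS = {
    ("network_a", "device_x"): "first_processor",
    ("network_b", "device_x"): "second_processor",
}


def lookup_preset_processor(network_id, device_id, presets=PRESET_PROCESSORS):
    """Return the preset processor for this combination, or None if there is none."""
    return presets.get((network_id, device_id))


processor = lookup_preset_processor("network_a", "device_x")
if processor is None:
    print("no preset combination; fall back to the variable-value calculation")
else:
    print("deploy on", processor)
```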
The device variable value determining module 11 is configured to obtain device information of a target device, and determine a device variable value of the target device according to the device information, where the device information includes a device running speed and a memory value, and the device variable value is used to characterize a device performance of the target device.
Wherein the device variable value determining module 11 is further configured to: determining a maximum memory value and a minimum memory value in sample equipment, and calculating a difference value between the maximum memory value and the minimum memory value to obtain a memory difference value;
Calculating a quotient between the memory value of the target device and the memory difference value to obtain a first device variable value;
Determining the maximum operation speed and the minimum operation speed in the sample equipment, and calculating the difference between the maximum operation speed and the minimum operation speed to obtain an operation speed difference;
calculating a quotient between the equipment running speed of the target equipment and the running speed difference value to obtain a second equipment variable value;
the device variable values include the first device variable value and the second device variable value.
A deployment variable value determining module 12, configured to determine a deployment variable value between the neural network to be deployed and the target device according to the calculated variable value and the device variable value, and select a processor in the target device according to the deployment variable value, where the deployment variable value is used to characterize a performance gap of the neural network to be deployed between different processors when the neural network to be deployed is deployed on the target device.
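One possible selection rule is sketched below. It assumes the deployment variable value follows the sign convention of Y = (Gtime-Ctime)/Gtime used in this document for sample values, so a positive value means the first processor takes longer than the second. The zero threshold and the processor labels are assumptions; this passage does not state the decision rule.

```python
def select_processor(deployment_value, threshold=0.0):
    """Pick a processor from a deployment variable value (assumed rule).

    A value above the threshold indicates the first processor is expected to
    be slower, so the second processor is chosen; otherwise the first is chosen.
    """
    if deployment_value > threshold:
        return "second_processor"
    return "first_processor"


print(select_processor(0.25))
print(select_processor(-0.4))
```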
Optionally, in this embodiment, the device processor selection system 100 further includes:
an approximation equation solving module 13, configured to determine running times of the sample neural network on processors of different sample devices, and determine sample deployment variable values of the sample neural network on different sample devices according to the running times;
Determining the maximum total calculated amount and the minimum total calculated amount in the sample neural network, and calculating the difference between the maximum total calculated amount and the minimum total calculated amount to obtain a calculated amount difference;
calculating the quotient value between the total computation quantity and the computation quantity difference value of each sample neural network respectively to obtain a first sample variable value;
calculating quotient values between convolution calculated quantity of 3x3 convolution and total calculated quantity in each sample neural network respectively to obtain second sample variable values;
Calculating quotient values between convolution calculated quantity of 1x1 convolution and total calculated quantity in each sample neural network respectively to obtain a third sample variable value;
Determining a maximum memory value and a minimum memory value in the sample equipment, and calculating a difference value between the maximum memory value and the minimum memory value to obtain a memory difference value;
Calculating the quotient value between the memory value of each sample device and the memory difference value respectively to obtain a fourth sample variable value;
Determining the maximum operation speed and the minimum operation speed in the sample equipment, and calculating the difference between the maximum operation speed and the minimum operation speed to obtain an operation speed difference;
calculating quotient values between the equipment operation speed of each sample equipment and the operation speed difference value respectively to obtain a fifth sample variable value;
And solving variable coefficients in an approximation equation according to the sample deployment variable value, the first sample variable value, the second sample variable value, the third sample variable value, the fourth sample variable value and the fifth sample variable value.
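The text does not say how the variable coefficients are solved. The sketch below uses an ordinary least-squares fit through the normal equations, which is a swapped-in technique and not necessarily the patented one, and it assumes a linear equation with no constant term. The sample rows are synthetic and the "true" coefficients exist only to generate noise-free targets.

```python
def solve_coefficients(rows, targets):
    """Least-squares fit of targets ~ sum(a_k * x_k) via the normal equations."""
    n = len(rows[0])
    # Augmented normal-equation matrix [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("samples do not determine the coefficients uniquely")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


# Synthetic sample variable values (first..fifth) for six network/device pairs.
sample_rows = [
    [0.5, 0.2, 0.2, 0.5, 1.0],
    [1.0, 0.2, 0.2, 0.5, 1.0],
    [0.5, 0.5, 0.2, 0.5, 1.0],
    [0.5, 0.2, 0.4, 0.5, 1.0],
    [0.5, 0.2, 0.2, 1.5, 1.0],
    [0.5, 0.2, 0.2, 0.5, 2.0],
]
true_coefficients = [0.4, -0.3, 0.2, 0.1, -0.2]  # used only to build the targets
sample_y = [sum(a * x for a, x in zip(true_coefficients, row)) for row in sample_rows]

solved = solve_coefficients(sample_rows, sample_y)
print([round(a, 6) for a in solved])
```

With measured running times the targets would be noisy, so more sample pairs than coefficients are generally needed for a stable fit.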
Further, each sample device is provided with a first processor and a second processor, and a calculation formula adopted for determining sample deployment variable values of the sample neural network on different sample devices according to the running time is as follows:
Y=(Gtime-Ctime)/Gtime
where Y is the sample deployment variable value, Gtime is the running time of the first processor running the sample neural network, and Ctime is the running time of the second processor running the sample neural network.
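As a worked illustration of the formula, the sketch below computes Y from two measured running times. The function name and the timings (in milliseconds) are hypothetical.

```python
def sample_deployment_value(gtime, ctime):
    """Compute Y = (Gtime - Ctime) / Gtime from two measured running times."""
    if gtime <= 0:
        raise ValueError("Gtime must be positive")
    return (gtime - ctime) / gtime


# Hypothetical timings: first processor 40 ms, second processor 30 ms.
y_sample = sample_deployment_value(gtime=40.0, ctime=30.0)
print(y_sample)
```

A positive Y means the first processor took longer than the second, and a negative Y means the opposite.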
Still further, the deployment variable value determination module 12 is further configured to: and carrying out variable value operation on the calculated variable value and the equipment variable value according to the solved approximation equation to obtain the deployment variable value.
In this embodiment, the complexity of the calculated amount of the neural network to be deployed can be effectively determined from the calculated amount information, and the device performance of the target device can be effectively determined from the acquired device information. Based on that complexity and that device performance, the performance gap of the neural network to be deployed between different processors on the target device can be determined, and a processor on the target device can be selected automatically based on that gap. Because the processor does not need to be selected through manual testing, the accuracy and efficiency of device processor selection are improved.
Example IV
Fig. 4 is a block diagram of a terminal device 2 according to a fourth embodiment of the present application. As shown in fig. 4, the terminal device 2 of this embodiment includes: a processor 20, a memory 21, and a computer program 22 stored in the memory 21 and executable on the processor 20, for example a program for the device processor selection method. When the processor 20 executes the computer program 22, the steps of the above embodiments of the device processor selection method are implemented, such as S10 to S30 shown in fig. 1 or S21 to S27 shown in fig. 2. Alternatively, when executing the computer program 22, the processor 20 performs the functions of each unit in the embodiment corresponding to fig. 3, for example the functions of modules 10 to 13 shown in fig. 3; for details, refer to the description of the embodiment corresponding to fig. 3, which is not repeated here.
Illustratively, the computer program 22 may be partitioned into one or more units that are stored in the memory 21 and executed by the processor 20 to complete the present application. The one or more units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments describe the execution of the computer program 22 in the terminal device 2. For example, the computer program 22 may be partitioned into a calculated variable value determination module 10, a device variable value determination module 11, a deployment variable value determination module 12, and an approximation equation solving module 13, each of which functions as described above.
The terminal device may include, but is not limited to, a processor 20 and a memory 21. It will be appreciated by those skilled in the art that fig. 4 is merely an example of the terminal device 2 and does not limit it; the terminal device 2 may include more or fewer components than illustrated, may combine certain components, or may use different components. For example, the terminal device may further include an input-output device, a network access device, a bus, and so on.
The processor 20 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 21 may be an internal storage unit of the terminal device 2, such as a hard disk or a memory of the terminal device 2. The memory 21 may also be an external storage device of the terminal device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the terminal device 2. Further, the memory 21 may include both an internal storage unit and an external storage device of the terminal device 2. The memory 21 is used for storing the computer program as well as other programs and data required by the terminal device. The memory 21 may also be used for temporarily storing data that has been output or is to be output.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
If the integrated modules are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a computer-readable storage medium, which may be non-volatile or volatile. Based on this understanding, the present application may implement all or part of the flow of the methods of the above embodiments by instructing related hardware through a computer program. The computer program may be stored in a computer-readable storage medium, and when executed by a processor, it implements the steps of each method embodiment described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable storage medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer-readable storage medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable storage medium does not include electrical carrier signals and telecommunications signals.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.
Claims (6)
1. A method of device processor selection, the method comprising:
Acquiring calculated quantity information of a neural network to be deployed, and determining a calculated variable value of the neural network to be deployed according to the calculated quantity information, wherein the calculated quantity information comprises total calculated quantity and convolution calculated quantity of the neural network to be deployed, and the calculated variable value is used for representing the complexity degree of the calculated quantity of the neural network to be deployed;
Acquiring equipment information of target equipment, and determining equipment variable values of the target equipment according to the equipment information, wherein the equipment information comprises equipment running speed and memory values, and the equipment variable values are used for representing equipment performance of the target equipment;
determining a deployment variable value between the neural network to be deployed and the target equipment according to the calculated variable value and the equipment variable value, and selecting a processor in the target equipment according to the deployment variable value, wherein the deployment variable value is used for representing the performance gap of the neural network to be deployed between different processors when the neural network to be deployed is deployed on the target equipment;
The determining the calculated variable value of the neural network to be deployed according to the calculated quantity information comprises the following steps:
Determining the maximum total calculated amount and the minimum total calculated amount in the sample neural network, and calculating the difference between the maximum total calculated amount and the minimum total calculated amount to obtain a calculated amount difference;
Calculating a quotient between the total calculated amount of the neural network to be deployed and the calculated amount difference to obtain a first calculated variable value;
calculating a quotient between the convolution calculated amount of the 3x3 convolution and the total calculated amount in the neural network to be deployed to obtain a second calculated variable value;
calculating a quotient between the convolution calculated amount of the 1x1 convolution and the total calculated amount in the neural network to be deployed to obtain a third calculated variable value;
the calculated variable values include the first calculated variable value, the second calculated variable value, and the third calculated variable value;
The determining the device variable value of the target device according to the device information comprises the following steps:
Determining a maximum memory value and a minimum memory value in sample equipment, and calculating a difference value between the maximum memory value and the minimum memory value to obtain a memory difference value;
Calculating a quotient between the memory value of the target device and the memory difference value to obtain a first device variable value;
Determining the maximum operation speed and the minimum operation speed in the sample equipment, and calculating the difference between the maximum operation speed and the minimum operation speed to obtain an operation speed difference;
calculating a quotient between the equipment running speed of the target equipment and the running speed difference value to obtain a second equipment variable value;
the device variable values include the first device variable value and the second device variable value;
The method further comprises the steps of:
Respectively determining the running time of the sample neural network on processors of different sample devices, and determining sample deployment variable values of the sample neural network on different sample devices according to the running time;
Determining the maximum total calculated amount and the minimum total calculated amount in the sample neural network, and calculating the difference between the maximum total calculated amount and the minimum total calculated amount to obtain a calculated amount difference;
calculating the quotient value between the total computation quantity and the computation quantity difference value of each sample neural network respectively to obtain a first sample variable value;
calculating quotient values between convolution calculated quantity of 3x3 convolution and total calculated quantity in each sample neural network respectively to obtain second sample variable values;
Calculating quotient values between convolution calculated quantity of 1x1 convolution and total calculated quantity in each sample neural network respectively to obtain a third sample variable value;
Determining a maximum memory value and a minimum memory value in the sample equipment, and calculating a difference value between the maximum memory value and the minimum memory value to obtain a memory difference value;
Calculating the quotient value between the memory value of each sample device and the memory difference value respectively to obtain a fourth sample variable value;
Determining the maximum operation speed and the minimum operation speed in the sample equipment, and calculating the difference between the maximum operation speed and the minimum operation speed to obtain an operation speed difference;
calculating quotient values between the equipment operation speed of each sample equipment and the operation speed difference value respectively to obtain a fifth sample variable value;
Solving a variable coefficient in an approximation equation according to the sample deployment variable value, the first sample variable value, the second sample variable value, the third sample variable value, the fourth sample variable value and the fifth sample variable value;
the determining a deployment variable value between the neural network to be deployed and the target device according to the calculated variable value and the device variable value comprises the following steps:
And carrying out variable value operation on the calculated variable value and the equipment variable value according to the solved approximation equation to obtain the deployment variable value.
2. The device processor selection method of claim 1, wherein each sample device is provided with a first processor and a second processor, and a calculation formula adopted by determining sample deployment variable values of a sample neural network on different sample devices according to the running time is as follows: Y=(Gtime-Ctime)/Gtime
where Y is the sample deployment variable value, Gtime is the running time of the first processor running the sample neural network, and Ctime is the running time of the second processor running the sample neural network.
3. The device processor selection method of claim 1, wherein prior to obtaining the computational information of the neural network to be deployed, further comprising:
respectively acquiring the identification of the neural network to be deployed and the identification of the target equipment to obtain a network identification and an equipment identification;
If the network identification and the equipment identification are preset identification combinations, inquiring a processor corresponding to the preset identification combinations in the target equipment;
and deploying the neural network to be deployed according to the processor queried in the target equipment.
4. A device processor selection system, the system comprising:
The calculated variable value determination module is used for obtaining calculated quantity information of the neural network to be deployed and determining a calculated variable value of the neural network to be deployed according to the calculated quantity information, wherein the calculated quantity information comprises total calculated quantity and convolution calculated quantity of the neural network to be deployed, and the calculated variable value is used for representing the complexity degree of the calculated quantity of the neural network to be deployed;
the device variable value determining module is used for obtaining device information of target devices and determining device variable values of the target devices according to the device information, wherein the device information comprises device running speed and memory values, and the device variable values are used for representing device performances of the target devices;
The deployment variable value determining module is used for determining a deployment variable value between the neural network to be deployed and the target equipment according to the calculated variable value and the equipment variable value, and selecting a processor in the target equipment according to the deployment variable value, wherein the deployment variable value is used for representing the performance gap of the neural network to be deployed between different processors when the neural network to be deployed is deployed on the target equipment;
the calculated variable value determination module is further configured to:
Determining the maximum total calculated amount and the minimum total calculated amount in the sample neural network, and calculating the difference between the maximum total calculated amount and the minimum total calculated amount to obtain a calculated amount difference;
Calculating a quotient between the total calculated amount of the neural network to be deployed and the calculated amount difference to obtain a first calculated variable value;
calculating a quotient between the convolution calculated amount of the 3x3 convolution and the total calculated amount in the neural network to be deployed to obtain a second calculated variable value;
calculating a quotient between the convolution calculated amount of the 1x1 convolution and the total calculated amount in the neural network to be deployed to obtain a third calculated variable value;
the calculated variable values include the first calculated variable value, the second calculated variable value, and the third calculated variable value;
the device variable value determination module is further configured to:
Determining a maximum memory value and a minimum memory value in sample equipment, and calculating a difference value between the maximum memory value and the minimum memory value to obtain a memory difference value;
Calculating a quotient between the memory value of the target device and the memory difference value to obtain a first device variable value;
Determining the maximum operation speed and the minimum operation speed in the sample equipment, and calculating the difference between the maximum operation speed and the minimum operation speed to obtain an operation speed difference;
calculating a quotient between the equipment running speed of the target equipment and the running speed difference value to obtain a second equipment variable value;
the device variable values include the first device variable value and the second device variable value;
The approximation equation solving module is used for respectively determining the running time of the sample neural network on the processors of different sample devices and determining the sample deployment variable values of the sample neural network on the different sample devices according to the running time;
Determining the maximum total calculated amount and the minimum total calculated amount in the sample neural network, and calculating the difference between the maximum total calculated amount and the minimum total calculated amount to obtain a calculated amount difference;
calculating the quotient value between the total computation quantity and the computation quantity difference value of each sample neural network respectively to obtain a first sample variable value;
calculating quotient values between convolution calculated quantity of 3x3 convolution and total calculated quantity in each sample neural network respectively to obtain second sample variable values;
Calculating quotient values between convolution calculated quantity of 1x1 convolution and total calculated quantity in each sample neural network respectively to obtain a third sample variable value;
Determining a maximum memory value and a minimum memory value in the sample equipment, and calculating a difference value between the maximum memory value and the minimum memory value to obtain a memory difference value;
Calculating the quotient value between the memory value of each sample device and the memory difference value respectively to obtain a fourth sample variable value;
Determining the maximum operation speed and the minimum operation speed in the sample equipment, and calculating the difference between the maximum operation speed and the minimum operation speed to obtain an operation speed difference;
calculating quotient values between the equipment operation speed of each sample equipment and the operation speed difference value respectively to obtain a fifth sample variable value;
Solving a variable coefficient in an approximation equation according to the sample deployment variable value, the first sample variable value, the second sample variable value, the third sample variable value, the fourth sample variable value and the fifth sample variable value;
The deployment variable value determination module is further configured to: and carrying out variable value operation on the calculated variable value and the equipment variable value according to the solved approximation equation to obtain the deployment variable value.
5. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 3 when the computer program is executed.
6. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110633567.2A CN113408718B (en) | 2021-06-07 | 2021-06-07 | Device processor selection method, system, terminal device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113408718A (en) | 2021-09-17 |
CN113408718B (en) | 2024-05-31 |
Family
ID=77676748
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110633567.2A Active CN113408718B (en) | 2021-06-07 | 2021-06-07 | Device processor selection method, system, terminal device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113408718B (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017124955A1 (en) * | 2016-01-21 | 2017-07-27 | 阿里巴巴集团控股有限公司 | Method and device warehouse storage space planning and electronic device |
JP2018191461A (en) * | 2017-05-10 | 2018-11-29 | キヤノン株式会社 | Control device, optical instrument, control method, and program |
CN109754066A (en) * | 2017-11-02 | 2019-05-14 | 三星电子株式会社 | Method and apparatus for generating fixed-point type neural network |
CN110345099A (en) * | 2019-07-18 | 2019-10-18 | 西安易朴通讯技术有限公司 | The method, apparatus and system of server fan speed regulation |
CN110929860A (en) * | 2019-11-07 | 2020-03-27 | 深圳云天励飞技术有限公司 | Convolution acceleration operation method and device, storage medium and terminal equipment |
US10664718B1 (en) * | 2017-09-11 | 2020-05-26 | Apple Inc. | Real-time adjustment of hybrid DNN style transfer networks |
WO2020151338A1 (en) * | 2019-01-23 | 2020-07-30 | 平安科技(深圳)有限公司 | Audio noise detection method and apparatus, storage medium, and mobile terminal |
CN111797881A (en) * | 2019-07-30 | 2020-10-20 | 华为技术有限公司 | Image classification method and device |
CN112380782A (en) * | 2020-12-07 | 2021-02-19 | 重庆忽米网络科技有限公司 | Rotating equipment fault prediction method based on mixed indexes and neural network |
CN112445823A (en) * | 2019-09-04 | 2021-03-05 | 华为技术有限公司 | Searching method of neural network structure, image processing method and device |
CN112634870A (en) * | 2020-12-11 | 2021-04-09 | 平安科技(深圳)有限公司 | Keyword detection method, device, equipment and storage medium |
CN112884127A (en) * | 2021-03-01 | 2021-06-01 | 厦门美图之家科技有限公司 | Multiprocessor parallel neural network acceleration method, device, equipment and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109426553A (en) * | 2017-08-21 | 2019-03-05 | 上海寒武纪信息科技有限公司 | Task cutting device and method, Task Processing Unit and method, multi-core processor |
Non-Patent Citations (6)
Title |
---|
A High Throughput Hardware CNN Accelerator Using a Novel Multi-Layer Convolution Processor;Mohammad Reza Tavakoli et al.;《2020 28th Iranian Conference on Electrical Engineering (ICEE)》;20200831;1-6 * |
Optimizing AD Pruning of Sponsored Search with Reinforcement Learning;Lian, YJ et al.;《WEB CONFERENCE 2021: COMPANION OF THE WORLD WIDE WEB CONFERENCE (WWW 2021)》;20210430;123-127 * |
A Neural-Network-Based Network Equipment Fault Prediction System;黄培;《办公自动化》;20200601;Vol. 25 (No. 11);23-27+45 * |
An Efficient In-Memory Computing Framework for Convolutional Neural Networks Based on Second-Generation Racetrack Memory;刘必成;《中国优秀硕士学位论文全文数据库信息科技辑》;20190115(No. (2019) 01);I137-195 * |
Study of an Improved NPML in Three-Dimensional Elastic Wave Numerical Simulation;谢波 et al.;《地球物理学进展》;20161215;Vol. 31 (No. 06);2762-2766 * |
Research on Deep Neural Network Compression Methods for Embedded Applications;段秉环 et al.;《航空计算技术》;20180925;Vol. 48 (No. 05);50-53 * |
Also Published As
Publication number | Publication date |
---|---|
CN113408718A (en) | 2021-09-17 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109858613B (en) | Compression method and system of deep neural network and terminal equipment | |
US11748595B2 (en) | Convolution acceleration operation method and apparatus, storage medium and terminal device | |
JP6991983B2 (en) | Method and system for training a machine learning system | |
CN113485837B (en) | Tensor processing method and system based on parallel branches and tensor segmentation | |
KR102585470B1 (en) | Information processing apparatus, information processing method, non-transitory computer-readable storage medium | |
CN109766800B (en) | Construction method of mobile terminal flower recognition model | |
CN110222833B (en) | Data processing circuit for neural network | |
CN112671232B (en) | LLC resonant circuit control method and device and terminal equipment | |
CN110109646A (en) | Data processing method, device and adder and multiplier and storage medium | |
CN111709415B (en) | Target detection method, device, computer equipment and storage medium | |
CN113485836A (en) | Tensor processing method and tensor processing system based on tensor segmentation | |
CN111882038A (en) | Model conversion method and device | |
CN114819159A (en) | Inference method, device, equipment and storage medium of deep learning model | |
CN113408718B (en) | Device processor selection method, system, terminal device and storage medium | |
CN116976432A (en) | Chip simulation method and device supporting task parallel processing and chip simulator | |
CN111162792A (en) | Compression method and device for power load data | |
CN116009675A (en) | Power consumption determining method and device, storage medium and electronic equipment | |
CN115940202A (en) | Multi-inverter power distribution control method, device and equipment based on artificial intelligence | |
KR20150103644A (en) | Method of cryptographic processing of data on elliptic curves, corresponding electronic device and computer program product | |
CN115456009A (en) | Signal processing method, device operation monitoring device, device and medium | |
CN112037814B (en) | Audio fingerprint extraction method and device, electronic equipment and storage medium | |
CN111767204B (en) | Spill risk detection method, device and equipment | |
CN110048404B (en) | Online optimization method and device for low-frequency oscillation suppressor of power system and storage medium | |
CN116384452B (en) | Dynamic network model construction method, device, equipment and storage medium | |
CN112132275A (en) | Parallel computing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||