WO2019144701A1 - Neural network operation method, apparatus, and related device - Google Patents

Neural network operation method, apparatus, and related device

Info

Publication number
WO2019144701A1
WO2019144701A1 (application PCT/CN2018/118502; CN2018118502W)
Authority
WO
WIPO (PCT)
Prior art keywords
pooling
target
clock frequency
layer
time
Prior art date
Application number
PCT/CN2018/118502
Other languages
English (en)
French (fr)
Inventor
孟玉
王玉伟
张立鑫
于潇宇
高剑林
朱建平
Original Assignee
Tencent Technology (Shenzhen) Company Limited (腾讯科技(深圳)有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology (Shenzhen) Company Limited
Priority to EP18902036.5A, granted as patent EP3745308B1
Publication of WO2019144701A1
Priority to US16/885,669, granted as patent US11507812B2

Classifications

    • G06F1/08 Clock generators with changeable or programmable clock frequency
    • G06F1/3228 Monitoring task completion, e.g. by use of idle timers, stop commands or wait commands
    • G06F1/324 Power saving characterised by the action undertaken by lowering clock frequency
    • G06F7/523 Multiplying only
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU], to service a request
    • G06F9/5038 Allocation of resources to service a request, considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G06F9/5094 Allocation of resources where the allocation takes into account power or heat criteria
    • G06N3/045 Combinations of networks
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06F2209/501 Performance criteria
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The present application relates to the field of computer technology, and in particular to a computing resource adjustment method and apparatus, a neural network operation method and apparatus, and related devices.
  • A convolutional neural network is a type of artificial neural network that has become a research hotspot in fields such as speech analysis, image recognition, and object classification. Its weight-sharing network structure resembles a biological neural network, which reduces the complexity of the network model and the number of weights.
  • A convolutional neural network is mainly formed by stacking convolutional layers and pooling layers in various arrangements, and model depth has been extended from a dozen or so layers to hundreds of layers.
  • For efficient pipelined operation, the convolution computation time and the pooling computation time of the same layer are required to be equal or similar.
  • However, the convolution computation and the pooling computation in the algorithm are non-uniform: the ratio between the convolution workload and the pooling workload of a layer varies from layer to layer.
  • For example, the convolution workload of the second layer may be twice that of the first layer, while the pooling workload of the second layer may be 10 times, or only 0.1 times, that of the first layer.
  • An example of the present application provides a computing resource adjustment method, performed by a computing device, which includes: obtaining the expected pooling time of a target pooling layer and the amount of data to be processed by the target pooling layer, and obtaining the current clock frequency corresponding to the computing resource units used for pooling processing, where the difference between the expected pooling time of the target pooling layer and the expected convolution time of the convolutional layer associated with the target pooling layer is less than a time threshold; determining a target clock frequency according to the expected pooling time of the target pooling layer and the amount of data to be processed by the target pooling layer; and, when the convolutional layer associated with the target pooling layer completes convolution processing and the current clock frequency differs from the target clock frequency, switching the current clock frequency to the target clock frequency and performing pooling processing at the target pooling layer based on the computing resource units running at the target clock frequency.
  • An example of the present application also provides a neural network operation method, performed by a computing device, which includes: obtaining the expected pooling time of a target pooling layer and the amount of data to be processed by the target pooling layer, and obtaining the current clock frequency corresponding to the computing resource units used for pooling processing, where the difference between the expected pooling time of the target pooling layer and the expected convolution time of the convolutional layer associated with the target pooling layer is less than a time threshold; determining a target clock frequency according to the expected pooling time of the target pooling layer and the amount of data to be processed by the target pooling layer; and, when the convolutional layer associated with the target pooling layer completes convolution processing and the current clock frequency differs from the target clock frequency, switching the current clock frequency to the target clock frequency and performing pooling processing at the target pooling layer based on the computing resource units running at the target clock frequency.
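  • The three-step method summarized above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name, the assumption that pooling throughput scales linearly with clock frequency, and the rule of snapping to the nearest system-supported frequency are all hypothetical.

```python
def choose_target_frequency(expected_pool_time_us, data_kb, units,
                            kb_per_us_per_unit_at_avg, avg_freq_mhz,
                            supported_freqs_mhz):
    """Pick the supported clock frequency whose throughput best matches
    the expected pooling time (illustrative sketch)."""
    # Throughput needed so pooling finishes in roughly the expected time.
    required_kb_per_us = data_kb / expected_pool_time_us
    # Total throughput of all pooling units at the average clock frequency.
    avg_kb_per_us = units * kb_per_us_per_unit_at_avg
    # Assume throughput scales (approximately) linearly with clock frequency.
    ideal_freq = avg_freq_mhz * required_kb_per_us / avg_kb_per_us
    # The system only provides a discrete set of frequencies.
    return min(supported_freqs_mhz, key=lambda f: abs(f - ideal_freq))
```

Switching to the returned frequency would then happen only after the associated convolutional layer completes, and only if it differs from the current clock frequency, as the method requires.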
  • An example of the present application further provides a computing resource adjustment apparatus, including:
  • an acquiring module configured to acquire the expected pooling time of a target pooling layer and the amount of data to be processed by the target pooling layer, and to obtain the current clock frequency corresponding to the computing resource units used for pooling processing, where the difference between the expected pooling time of the target pooling layer and the expected convolution time of the convolutional layer associated with the target pooling layer is less than a time threshold;
  • a first determining module configured to determine a target clock frequency according to the expected pooling time of the target pooling layer and the amount of data to be processed by the target pooling layer; and
  • a switching module configured to, when the convolutional layer associated with the target pooling layer completes convolution processing and the current clock frequency differs from the target clock frequency, switch the current clock frequency to the target clock frequency and perform pooling processing at the target pooling layer based on the computing resource units running at the target clock frequency.
  • An example of the present application also provides a neural network computing device, including:
  • an acquiring module configured to acquire the expected pooling time of a target pooling layer and the amount of data to be processed by the target pooling layer, and to obtain the current clock frequency corresponding to the computing resource units used for pooling processing, where the difference between the expected pooling time of the target pooling layer and the expected convolution time of the convolutional layer associated with the target pooling layer is less than a time threshold;
  • a first determining module configured to determine a target clock frequency according to the expected pooling time of the target pooling layer and the amount of data to be processed by the target pooling layer; and
  • a switching module configured to, when the convolutional layer associated with the target pooling layer completes convolution processing and the current clock frequency differs from the target clock frequency, switch the current clock frequency to the target clock frequency and perform pooling processing at the target pooling layer based on the computing resource units running at the target clock frequency.
  • An example of the present application further provides a computing device, including a processor and a memory; the processor is coupled to the memory, the memory is configured to store program code, and the processor is configured to invoke the program code to perform the above method.
  • Examples of the present application also provide a computer storage medium storing a computer program, the computer program including program instructions that, when executed by a processor, perform the above method.
  • FIG. 1 is a schematic diagram of a scenario of a computing resource adjustment method provided by an example of the present application.
  • FIG. 2 is a schematic flowchart of a computing resource adjustment method provided by an example of the present application.
  • FIG. 3 is a schematic flowchart of a method for determining a target clock frequency provided by an example of the present application.
  • FIG. 4 is a schematic flowchart of another method for determining a target clock frequency provided by an example of the present application.
  • FIG. 5 is a schematic flowchart of another computing resource adjustment method provided by an example of the present application.
  • FIG. 6 is a schematic flowchart of determining the number of computing resource units provided by an example of the present application.
  • FIG. 7 is a schematic diagram of interaction for switching the target clock frequency provided by an example of the present application.
  • FIG. 8 is a schematic structural diagram of a computing resource adjustment apparatus according to an example of the present application.
  • FIG. 9 is a schematic structural diagram of a terminal device according to an example of the present application.
  • The example of the present application provides a computing resource adjustment method that can be applied to a terminal device or a server, and is particularly suitable for scenarios with heavy data-computation demands, for example large-scale detection of pornographic images, or scenarios in which a large number of images need to be classified.
  • The above method is also particularly applicable to scenarios in which terminal devices are relatively sensitive to power consumption, such as image detection and recognition on smart terminal devices and drones.
  • FIG. 1 is a schematic diagram of a scenario of a method for adjusting a computing resource provided by an example of the present application.
  • Suppose the convolutional neural network model has a picture input size of 256×256. It can then be predicted in advance that the convolution of the first convolutional layer is expected to take 30 microseconds and that of the second convolutional layer 50 microseconds, and that the data amount of the first pooling layer is 4 KB and that of the second pooling layer is 1 KB.
  • The expected convolution time of each layer should be close to the expected pooling time of the corresponding pooling layer.
  • Because of hardware factors, the expected pooling time and the expected convolution time are similar rather than equal; that is, the expected pooling time of the first pooling layer may be 33 microseconds, and that of the second pooling layer may be 46 microseconds.
  • Under conventional conditions (that is, when the temperature of the central processing unit is within 40 degrees Celsius), the average clock frequency of a single computing unit is 300 MHz, and the corresponding computing-power value is 1.
  • Given that the expected pooling time of the first pooling layer is 33 microseconds, the expected pooling time of the second pooling layer is 46 microseconds, the minimum data amount of the two pooling layers is 1 KB, and the computing-power value of a single computing unit is 1, the number of computing units used for pooling operations can be calculated in advance to be 5. Once the number of computing units is determined, it does not change during subsequent convolution and pooling operations; only the clock frequency of the computing units is adjusted.
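  • The patent does not state the exact sizing rule here, only that the unit count is computed in advance from the expected pooling times, the data amounts, and the per-unit computing power. One plausible reading, sketched below under the assumption that the count must cover the layer with the highest required throughput at a fixed hypothetical per-unit rate, is:

```python
import math

def units_needed(layers, kb_per_us_per_unit):
    """Smallest number of pooling units whose combined throughput meets
    every layer's expected pooling time (assumed rule, not the patent's
    stated formula). `layers` is a list of (data_kb, expected_time_us)."""
    # The layer demanding the most KB per microsecond sets the requirement.
    peak_kb_per_us = max(kb / t_us for kb, t_us in layers)
    return math.ceil(peak_kb_per_us / kb_per_us_per_unit)
```

With the scenario's layers (4 KB in 33 microseconds, 1 KB in 46 microseconds) and a hypothetical per-unit rate of 0.025 KB per microsecond, this yields the 5 units used above.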
  • From these values, the clock frequency controller 100g can calculate in advance that the target clock frequency is 450 MHz.
  • After the convolution operation generates the feature image 100b, the feature image 100b is stored in the convolution data storage unit 100f, and the clock frequency controller 100g adjusts the clock frequency of each of computing units 1 to 5 from 300 MHz to 450 MHz.
  • The feature image 100b stored in the convolution data storage unit 100f is then input to the first pooling layer, and the pooling operation is performed by the five computing units with a clock frequency of 450 MHz.
  • The actual pooling time is 28 microseconds; that is, the actual pooling time of the first pooling layer (28 microseconds) is close to the expected convolution time of the first convolutional layer (30 microseconds).
  • The resulting feature image 100c is stored in the pooled data storage unit 100h for the convolution operation of the second convolutional layer.
  • The feature image 100c is input to the second convolutional layer for the convolution operation. Within the 50 microseconds of the convolution operation in the second convolutional layer (that is, the expected convolution time of the second convolutional layer predicted in advance), the clock frequency controller 100g uses the expected pooling time of the second pooling layer (46 microseconds), the corresponding data amount (1 KB), and the current clock frequency of each computing unit. Because the clock frequency was adjusted from 300 MHz to 450 MHz in the previous pooling round, and the clock frequency during the second convolutional layer is the same as that of the first pooling layer, the current clock frequency of this round is 450 MHz. From these values, the clock frequency controller 100g can calculate the target clock frequency to be 100 MHz.
  • After the convolution operation generates the feature image 100d, the feature image 100d is stored in the convolution data storage unit 100f, and the clock frequency controller 100g adjusts the clock frequency of each of computing units 1 to 5 from 450 MHz to 100 MHz.
  • The feature image 100d stored in the convolution data storage unit 100f is then input to the second pooling layer, and the pooling operation is performed by the five computing units with a clock frequency of 100 MHz.
  • The actual pooling time is 52 microseconds; the actual pooling time of the second pooling layer (52 microseconds) is close to the expected convolution time of the second convolutional layer (50 microseconds).
  • After pooling, the feature image 100e is stored in the pooled data storage unit 100h.
  • In this way, the actual pooling time and the convolution time of the same layer are kept close to each other, which overcomes the non-uniformity between convolution time and pooling time, saves computing resources, and prevents computing resources from sitting idle, thereby increasing the utilization of computing resources.
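  • The frequency schedule of this walkthrough (300 MHz at start, 450 MHz for the first pooling layer, 100 MHz for the second) can be replayed with a small sketch; the controller logic here is an assumption that switches only when the precomputed target differs from the current clock, as described above.

```python
def replay_schedule(rounds, start_freq_mhz):
    """rounds: list of (pool_layer_name, target_freq_mhz) computed in
    advance. Returns the frequency each pooling layer actually runs at,
    switching only when the target differs from the current clock."""
    freq, trace = start_freq_mhz, []
    for name, target in rounds:
        if target != freq:      # switch only after convolution completes
            freq = target
        trace.append((name, freq))
    return trace
```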
  • For the specific processes of determining the number of computing units, determining the target clock frequency, and adjusting the clock frequency of the computing resource units, refer to the examples corresponding to FIG. 2 to FIG. 7 below.
  • FIG. 2 is a schematic flowchart of a method for adjusting a computing resource provided by an example of the present application. As shown in FIG. 2, the method may include:
  • Step S101: obtain the expected pooling time of the target pooling layer and the amount of data to be processed by the target pooling layer, and obtain the current clock frequency corresponding to the computing resource units used for pooling processing; the difference between the expected pooling time of the target pooling layer and the expected convolution time of the convolutional layer associated with the target pooling layer is less than a time threshold.
  • A convolutional neural network model is formed by stacking multiple convolutional layers and multiple pooling layers.
  • The processing order is: convolutional layer 1, pooling layer 1, convolutional layer 2, pooling layer 2, ..., convolutional layer N, pooling layer N.
  • Convolution processing is performed in the convolutional layers, and pooling processing is performed in the pooling layers.
  • Convolutional layer i and pooling layer i are in the same layer, i ∈ [1, N].
  • The pooling processing of a layer follows the convolution processing of the corresponding convolutional layer; that is, there is a sequential dependency between the convolution data generated by convolution processing and the pooled data generated by pooling processing: the pooled data is generated after the convolution data.
  • The unprocessed pooling layer adjacent to the pooling layer currently undergoing pooling processing is called the target pooling layer; equivalently, the pooling layer associated with the convolutional layer currently undergoing convolution processing is called the target pooling layer.
  • For example, suppose the convolutional neural network model has 4 convolutional layers and 4 pooling layers. If pooling layer 2 is being processed, pooling layer 3 (the unprocessed pooling layer adjacent to pooling layer 2) is the target pooling layer; if convolutional layer 2 is undergoing convolution processing, pooling layer 2 (the pooling layer associated with convolutional layer 2) is the target pooling layer.
  • Obtain the expected pooling time of the target pooling layer and its amount of data to be processed. The expected pooling time of the target pooling layer is close to the expected convolution time of the convolutional layer associated with it (because of hardware factors, the expected pooling time and convolution time of the same layer are similar, not equal); that is, the difference between the expected convolution time of the convolutional layer and the expected pooling time of the pooling layer in the same layer is within a time threshold, which is set in advance according to the performance of the model and the current hardware parameters.
  • The expected convolution time is related to the depth, complexity, and input data of the convolutional neural network model. Therefore, the expected convolution time of any convolutional layer in the model can be predicted in advance, and correspondingly the expected pooling time of any pooling layer can be predicted.
  • For example, the convolution of convolutional layer 1 is expected to take 20 microseconds, the convolution of convolutional layer 2 is expected to take 30 microseconds, and the time threshold is 3 microseconds. Correspondingly, the expected pooling time of pooling layer 1 (corresponding to convolutional layer 1) is 19 microseconds (the difference between the expected convolution time and the expected pooling time is 1 microsecond, less than the 3-microsecond threshold), and the expected pooling time of pooling layer 2 (corresponding to convolutional layer 2) is 32 microseconds (a difference of 2 microseconds, less than the 3-microsecond threshold).
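  • The balance condition in this example reduces to a single comparison. The helper below is only a restatement of it, assuming the threshold is expressed in the same time unit as the two expected times:

```python
def is_balanced(conv_expected_us, pool_expected_us, threshold_us=3):
    """True when a layer's expected convolution and pooling times differ
    by less than the time threshold, as the example requires."""
    return abs(conv_expected_us - pool_expected_us) < threshold_us
```

For instance, layer 1 (20 vs 19) and layer 2 (30 vs 32) both satisfy a threshold of 3.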
  • The clock frequency of the computing resource units used for pooling processing (here, the computing resource units may be computing units 1 to 5 in the example of FIG. 1 above) is referred to as the current clock frequency.
  • It is worth noting that these computing resource units are used for pooling processing, so their clock frequency remains unchanged during convolution processing. For example, if pooling layer 2 is being pooled at a clock frequency of 200 MHz, the current clock frequency is 200 MHz; if convolutional layer 3 is subsequently performing convolution processing, the current clock frequency is still 200 MHz.
  • A computing resource unit is the smallest unit component in the central processing unit used for numerical, instruction, and logic operations.
  • A computing resource unit can work at multiple clock frequencies: the higher the clock frequency, the stronger the computing power; the lower the clock frequency, the weaker the computing power. The computing resource units used for pooling processing perform the boundary processing, sliding-window processing, and maximum-pooling or average-pooling operations in the pooling process.
  • Step S102: determine a target clock frequency according to the expected pooling time of the target pooling layer and the amount of data to be processed by the target pooling layer.
  • In one implementation, step S102 includes: calculating the sum of the computing-power values of the computing resource units at the average clock frequency to obtain a total computing-power value (for example, the computing-power value of a computing resource unit at the average clock frequency may be 2); extracting the amount of data to be processed by the target pooling layer, and determining the ratio between that amount and the total computing-power value as the average pooling time of the target pooling layer; determining the ratio between the expected pooling time of the target pooling layer (whose difference from the expected convolution time of the associated convolutional layer is within the time threshold) and the average pooling time as the speedup ratio of the target pooling layer; and determining the target clock frequency according to the speedup ratio of the target pooling layer, the speedup ratio of the reference pooling layer, and the current clock frequency.
  • The reference pooling layer is the pooling layer adjacent to the target pooling layer that is being processed or has been processed; that is, the pooling layer currently undergoing pooling processing may be called the reference pooling layer, or the processed pooling layer adjacent to the convolutional layer currently undergoing convolution processing may also be called the reference pooling layer.
  • The speedup ratio of the reference pooling layer is determined according to the expected pooling time of the reference pooling layer and its amount of data to be processed, as follows: extract the amount of data to be processed by the reference pooling layer; determine the ratio between that amount and the total computing-power value as the average pooling time of the reference pooling layer; and determine the ratio between the expected pooling time of the reference pooling layer (whose difference from the expected convolution time of the associated convolutional layer is within the time threshold) and its average pooling time as the speedup ratio of the reference pooling layer.
  • Alternatively, step S102 may include: calculating the sum of the computing-power values of the computing resource units at the average clock frequency to obtain the total computing-power value; extracting the amount of data to be processed by the target pooling layer, and determining the ratio between that amount and the total computing-power value as the average pooling time of the target pooling layer; determining the ratio between the expected pooling time of the target pooling layer and its average pooling time as the speedup ratio of the target pooling layer; and determining the target clock frequency based on the speedup ratio of the target pooling layer and the average clock frequency.
  • The average clock frequency is the average of the multiple clock frequencies provided by the system.
  • In other words, the target clock frequency can be obtained from the current clock frequency of the resource units or from their average clock frequency; the current clock frequency is a dynamically changing variable, while the average clock frequency is a fixed constant.
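  • Both variants of step S102 can be written down directly from the definitions above. The direction of the speedup ratio (average pooling time over expected pooling time) and the multiplicative combination in each variant are assumptions consistent with, but not explicit in, the text:

```python
def speedup_ratio(expected_time, data_amount, total_power):
    """Average pooling time = data amount / total computing-power value;
    the speedup ratio relates it to the expected pooling time."""
    average_time = data_amount / total_power
    return average_time / expected_time

def target_freq_from_current(current_freq, s_target, s_reference):
    # Variant 1: scale the current clock frequency by the ratio of the
    # target layer's speedup to the reference layer's speedup.
    return current_freq * s_target / s_reference

def target_freq_from_average(avg_freq, s_target):
    # Variant 2: scale the fixed average clock frequency directly.
    return avg_freq * s_target
```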
  • Step S103: when the convolutional layer associated with the target pooling layer completes convolution processing and the current clock frequency differs from the target clock frequency, switch the current clock frequency to the target clock frequency, and perform pooling processing at the target pooling layer based on the computing resource units running at the target clock frequency.
  • To detect the completion of convolution processing, a boundary is added to the outer layer of the input image: generally, values of 0 are padded around the image to enlarge it, ensuring that the size of the feature image after convolution processing is the same as the size of the input image. Therefore, if the convolution data generated by the convolution processing is continuously detected to have the value 0, it is determined that the convolution data has reached the data boundary.
  • At that point, the current clock frequency and the target clock frequency are compared. If they differ, the current clock frequency is switched to the target clock frequency, and the computing resource units at the target clock frequency extract the convolution data generated by the convolution processing and perform pooling processing (boundary processing, sliding-window processing, and the maximum-pooling or average-pooling operation) on it at the target pooling layer. If the convolutional layer associated with the target pooling layer completes convolution processing and the current clock frequency is the same as the target clock frequency, the current clock frequency is kept unchanged, and pooling of the convolution data generated by the convolution processing continues at the target pooling layer based on the computing resource units at the current clock frequency.
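  • The zero-padding boundary test can be sketched as follows; the `pad_width` parameter and the treat-the-output-as-a-stream framing are assumptions, since the text only says that continuously detected 0 values mark the data boundary:

```python
def reached_boundary(conv_stream, pad_width):
    """True when the last pad_width values of the convolution output
    stream are all 0, i.e. the zero-padded border has been reached and
    this layer's convolution processing is complete."""
    if len(conv_stream) < pad_width:
        return False
    return all(v == 0 for v in conv_stream[-pad_width:])
```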
• The target clock frequency is calculated before the target pooling layer performs pooling: it may be calculated while the reference pooling layer performs its pooling processing, or while the convolutional layer associated with the target pooling layer performs its convolution processing. Once the target pooling layer has been pooled, it becomes the next reference pooling layer, and the unprocessed pooling layer adjacent to the target pooling layer becomes the next target pooling layer.
• The expected pooling time of the target pooling layer is an estimate made in advance to match the expected convolution time of the corresponding convolutional layer, whereas the actual pooling time of the target pooling layer is the actual time counted after the pooling processing of the target pooling layer is completed.
  • FIG. 3 is a schematic flowchart of a method for determining a target clock frequency provided by an example of the present application.
• The specific process of determining the target clock frequency includes the following steps S201-S203, and steps S201-S203 are a specific example of step S102 in the example corresponding to FIG. 2:
• Step S201: calculating the sum of the computing power values of all computing resource units having the average clock frequency to obtain a total computing power value, and determining the ratio between the amount of data to be processed of the target pooling layer and the total computing power value as the average pooling time of the target pooling layer;
• The sum of the computing power values of all computing resource units having the average clock frequency is calculated to obtain a total computing power value; the average clock frequency is the average of the multiple clock frequencies provided by the system. The ratio of the amount of data to be processed of the target pooling layer to the total computing power value is determined as the average pooling time of the target pooling layer.
  • Step S202 determining a ratio between a desired pooling time of the target pooling layer and an average pooling time of the target pooling layer as an acceleration ratio of the target pooling layer;
• The ratio between the expected pooling time of the target pooling layer and the average pooling time of the target pooling layer (the result of dividing the expected pooling time of the target pooling layer by its average pooling time) is determined as the acceleration ratio of the target pooling layer.
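Steps S201 and S202 reduce to two ratios, which can be sketched as follows. The function names and the units of the quantities (data amounts and per-unit computing power values) are assumptions for illustration only:

```python
def average_pooling_time(data_amount, unit_capabilities):
    """Step S201: sum the computing power values of all units at the
    average clock frequency, then divide the layer's amount of data to
    be processed by that total computing power value."""
    total_capability = sum(unit_capabilities)
    return data_amount / total_capability


def acceleration_ratio(expected_time, avg_time):
    """Step S202: expected pooling time divided by average pooling time."""
    return expected_time / avg_time
```

For example, a layer with 600 units of pending data and three units whose capability values sum to 60 has an average pooling time of 10; if its expected pooling time is 5, its acceleration ratio is 0.5.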
• Step S203: when the acceleration ratio of the reference pooling layer and the acceleration ratio of the target pooling layer are the same, the target clock frequency is set to the same clock frequency as the current clock frequency; that is, when the target pooling layer is pooled, the current clock frequency is kept unchanged.
• For example, if the target pooling layer has an acceleration ratio of 3, the reference pooling layer has an acceleration ratio of 3, and the current clock frequency is 350 MHz, the target clock frequency is set to 350 MHz, that is, the target clock frequency is equal to the current clock frequency.
• When the acceleration ratio of the target pooling layer is 1 (indicating that the expected pooling time of the target pooling layer is close or equal to its average pooling time), the target clock frequency is equal to the average clock frequency; however, the current clock frequency may or may not be equal to the average clock frequency. Therefore, if the target clock frequency is to be determined from the current clock frequency, the relationship among the acceleration ratio of the target pooling layer, the acceleration ratio of the reference pooling layer, and the current clock frequency is also required.
• Step S204: when the acceleration ratio of the reference pooling layer and the acceleration ratio of the target pooling layer are different, the ratio between the acceleration ratio of the reference pooling layer and the acceleration ratio of the target pooling layer is determined as an acceleration coefficient, and the product of the acceleration coefficient and the current clock frequency is determined as the target clock frequency.
• That is, the acceleration ratio of the reference pooling layer divided by the acceleration ratio of the target pooling layer gives the acceleration coefficient, and the target clock frequency is obtained by multiplying the current clock frequency by this acceleration coefficient. For example, if the target pooling layer has an acceleration ratio of 6 and the reference pooling layer has an acceleration ratio of 3, the acceleration coefficient is 3/6 = 0.5, and the target clock frequency is 0.5 times the current clock frequency.
• The acceleration ratio of the reference pooling layer is generated according to the expected pooling time of the reference pooling layer and the amount of data to be processed of the reference pooling layer; the reference pooling layer is a pooling layer adjacent to the target pooling layer that is being processed or has been processed.
  • the target clock frequency of the computing resource unit of the next pooling process is calculated according to the ratio between the acceleration ratios of the two pooling layers and the current clock frequency.
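The rule of Steps S203-S204 can be sketched in a few lines, using the numbers from the examples above (acceleration ratios of 3 and 3 leave a 350 MHz clock unchanged; reference ratio 3 with target ratio 6 halves it). The function name is hypothetical:

```python
def target_clock_from_reference(current_mhz, ref_ratio, target_ratio):
    """Steps S203-S204: keep the current clock frequency when the two
    acceleration ratios match; otherwise scale the current clock by the
    acceleration coefficient ref_ratio / target_ratio."""
    if ref_ratio == target_ratio:
        return current_mhz
    coefficient = ref_ratio / target_ratio  # acceleration coefficient
    return coefficient * current_mhz
```

A larger target acceleration ratio (expected pooling time well above the average) thus yields a lower target clock, which matches the intent of slowing the pooling units when more time is available.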
  • FIG. 4 is a schematic flowchart of another method for determining a target clock frequency provided by an example of the present application.
• The specific process of determining the target clock frequency includes the following steps S301-S303, and steps S301-S303 are a specific example of step S102 in the example corresponding to FIG. 2:
• Step S301: calculating the sum of the computing power values of all computing resource units having the average clock frequency to obtain a total computing power value, and determining the ratio between the amount of data to be processed of the target pooling layer and the total computing power value as the average pooling time of the target pooling layer;
• The sum of the computing power values of all computing resource units having the average clock frequency is calculated to obtain a total computing power value; the average clock frequency is the average of the multiple clock frequencies provided by the system. The ratio of the amount of data to be processed of the target pooling layer to the total computing power value is determined as the average pooling time of the target pooling layer.
  • Step S302 determining a ratio between a desired pooling time of the target pooling layer and an average pooling time of the target pooling layer as an acceleration ratio of the target pooling layer;
• The ratio between the expected pooling time of the target pooling layer and the average pooling time of the target pooling layer (the result of dividing the expected pooling time of the target pooling layer by its average pooling time) is determined as the acceleration ratio of the target pooling layer.
  • Step S303 determining a product of a reciprocal of the acceleration ratio of the target pooling layer and the average clock frequency as the target clock frequency.
• The target clock frequency is the result obtained by multiplying the reciprocal of the acceleration ratio of the target pooling layer by the average clock frequency. That is, the target clock frequency of the computing resource unit for the next pooling process is calculated according to the acceleration ratio of the target pooling layer and the average clock frequency (the average clock frequency remains unchanged).
• For example, the expected pooling time of the target pooling layer is 100 microseconds.
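A sketch of the Step S303 rule, using the 100-microsecond expected pooling time mentioned above together with an assumed 50-microsecond average pooling time and an assumed 300 MHz average clock frequency (both hypothetical figures, not from the text):

```python
def target_clock_from_average(avg_mhz, target_acceleration_ratio):
    """Step S303: target clock frequency is the reciprocal of the target
    pooling layer's acceleration ratio multiplied by the average clock."""
    return (1.0 / target_acceleration_ratio) * avg_mhz


# Hypothetical figures: expected 100 us, assumed average 50 us -> ratio 2,
# so an assumed 300 MHz average clock gives a 150 MHz target clock.
ratio = 100.0 / 50.0
target = target_clock_from_average(300.0, ratio)
```

With an acceleration ratio of exactly 1, the target clock equals the average clock, consistent with the discussion of FIG. 3 above.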
• The example of the present application obtains the expected pooling time of the target pooling layer and the amount of data to be processed of the target pooling layer, and obtains the current clock frequency corresponding to the computing resource unit used for the pooling process; determines a target clock frequency according to the expected pooling time of the target pooling layer and the amount of data to be processed of the target pooling layer; and, when the convolutional layer associated with the target pooling layer completes convolution processing and the current clock frequency and the target clock frequency are different, switches the current clock frequency to the target clock frequency and performs pooling processing at the target pooling layer based on the computing resource unit having the target clock frequency. Since the clock frequency of the computing resource unit can be dynamically adjusted according to the expected pooling time of a pooling layer and its amount of data to be processed, the actual pooling time of any pooling layer approaches its expected pooling time, which also makes the actual pooling time of any pooling layer match the expected convolution time of the corresponding convolutional layer. Therefore, by adjusting the clock frequency of the computing resource unit, the actual pooling time of any layer can be matched to the convolution time of that layer, so that idleness of computing resources can be avoided, thereby increasing the usage rate of computing resources.
  • FIG. 5 is a schematic flowchart of another method for adjusting a computing resource provided by an example of the present application.
  • the computing resource adjustment method may include:
• Step S401: predicting the expected convolution times of the plurality of convolutional layers, determining the expected pooling times of the plurality of pooling layers according to the expected convolution times of the plurality of convolutional layers, and determining a pooling expectation value according to the expected pooling times of the plurality of pooling layers;
• The expected convolution time of each convolutional layer in the convolutional neural network model is predicted, and the expected pooling times of the plurality of pooling layers are determined according to the expected convolution times of the plurality of convolutional layers. Due to hardware factors, the difference between the expected convolution time of a convolutional layer and the expected pooling time of the pooling layer located in the same layer is within the time threshold.
• The pooling expectation value is determined from the expected pooling times of the plurality of pooling layers: it is obtained by calculating the average of the expected pooling times of the plurality of pooling layers, and represents the average expected pooling time.
• For example, the expected convolution time of convolutional layer 1 is 20 microseconds, that of convolutional layer 2 is 50 microseconds, and that of convolutional layer 3 is 10 microseconds; the expected pooling time of pooling layer 1 corresponding to convolutional layer 1 is 18 microseconds, and the expected pooling times of the pooling layers corresponding to convolutional layers 2 and 3 are determined likewise.
• Step S402: predicting the amounts of data to be processed of the plurality of pooling layers, and determining the base pooled data amount according to the amounts of data to be processed of the plurality of pooling layers;
• The amount of data to be processed of each pooling layer in the convolutional neural network model is predicted, and the smallest amount of data to be processed among the plurality of pooling layers is extracted as the base pooled data amount.
  • the minimum amount of data to be processed is selected as the amount of the base pooled data in order to reduce the area occupied by the pooling process and maximize the computational efficiency of the computing resource unit.
  • the amount of data to be processed in pooling layer 1 is 2 KB
  • the amount of data to be processed in pooling layer 2 is 3 KB
• the amount of data to be processed in pooling layer 3 is 5 KB. Therefore, 2 KB is the base pooled data amount (2 KB < 3 KB < 5 KB).
  • Step S403 determining, according to the pooled expected value, the basic pooled data amount, and the computing power value of the computing resource unit having an average clock frequency, the number of computing resource units used for performing the pooling process;
• the average clock frequency is the average of the multiple clock frequencies provided by the system;
• The product obtained by multiplying the pooling expectation value by the computing power value of a computing resource unit having the average clock frequency is divided by the base pooled data amount, that is, (pooling expectation value * computing power value of a computing resource unit having the average clock frequency) / base pooled data amount, to obtain the number of computing resource units used for the pooling process.
  • the average clock frequency is the average of multiple clock frequencies provided by the system.
• The computing power value refers to the capability of a computing resource unit to process instructions. The higher the computing power value, the stronger the computing resource unit's ability to process instructions.
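The three quantities of Steps S401-S403 can be sketched together. The function names, the sample values, and the units of the per-unit computing power value are assumptions; the unit-count formula follows the text of Step S403 exactly as stated:

```python
def pooling_expectation(expected_pool_times):
    """Step S401: average of the per-layer expected pooling times."""
    return sum(expected_pool_times) / len(expected_pool_times)


def base_pooled_data_amount(pending_amounts):
    """Step S402: the smallest per-layer amount of data to be processed."""
    return min(pending_amounts)


def num_pooling_units(pool_expectation, unit_capability, base_amount):
    """Step S403, as stated in the text: (pooling expectation value *
    per-unit computing power value) / base pooled data amount."""
    return (pool_expectation * unit_capability) / base_amount
```

With the 2 KB / 3 KB / 5 KB example above, `base_pooled_data_amount([2, 3, 5])` selects 2 KB as the base pooled data amount.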
  • FIG. 6 is a schematic flowchart of determining the number of computing resource units provided by the example of the present application.
• The expected convolution times TC1-TCn corresponding to the n convolutional layers in the convolutional neural network model are predicted according to the depth of the convolutional neural network model, the algorithm complexity, and the size of the input data, where TCn represents the expected convolution time of the nth convolutional layer; and the expected pooling times TP1-TPn of the pooling layers corresponding to the convolutional layers are predicted according to the plurality of expected convolution times, where TPn represents the expected pooling time of the nth pooling layer.
• Step S404: acquiring the expected pooling time of the target pooling layer and the amount of data to be processed of the target pooling layer, and acquiring the current clock frequency corresponding to the computing resource unit used for the pooling process; the difference between the expected pooling time of the target pooling layer and the expected convolution time of the convolutional layer associated with the target pooling layer is less than a time threshold;
  • Step S405 determining a target clock frequency according to a desired pooling time of the target pooling layer and a data volume to be processed of the target pooling layer;
• For the specific implementation of step S404 to step S405, refer to the description of step S101 to step S102 in the example corresponding to FIG. 2; for the specific process of determining the target clock frequency, refer to steps S201-S203 in FIG. 3 above or steps S301-S303 in FIG. 4, which are not described herein again.
  • Step S406 deleting a clock frequency other than the target clock frequency among the plurality of clock frequencies provided by the system;
• The system provides multiple clock frequencies for the computing resource units. To reduce power consumption, after the target clock frequency is determined, the (unused) clock frequencies other than the target clock frequency are deleted.
• Step S407: when the convolutional layer associated with the target pooling layer completes convolution processing, and the current clock frequency and the target clock frequency are different, the current clock frequency is switched to the target clock frequency, and pooling processing is performed at the target pooling layer based on the computing resource unit having the target clock frequency.
• For the specific implementation of step S407, refer to the description of step S103 in the example corresponding to FIG. 2 above; details are not described herein.
  • FIG. 7 is a schematic diagram of interaction of a switching target clock frequency provided by an example of the present application.
• The clock frequency generator 200a in the system generates clocks of different frequencies to drive the computing resource units at different computing power values.
  • the convolution data generated in the convolution processing is stored in the convolutional data memory 200d.
  • the clock frequency selector 200b calculates the acceleration ratio of the target pooling layer, thereby determining the target clock frequency.
• When the data boundary detector 200c detects that the convolution data reaches the data boundary (i.e., the convolution processing is completed), the clock frequency selector 200b selects, among the plurality of clock frequencies generated by the clock frequency generator 200a, the target clock frequency for computing resource unit 1, computing resource unit 2, ..., computing resource unit n (during the previous convolution or pooling process, the clock frequency selector 200b has already determined the target clock frequency), and deletes (i.e., masks) the excess clock frequencies to reduce the power consumption of the system. The clock frequency of the computing resource units is switched to the target clock frequency, and the computing resource unit 1 having the target clock frequency, the computing resource unit 2 having the target clock frequency, ..., and the computing resource unit n having the target clock frequency are used to pool the convolution data. The data boundary detector 200c performs detection continuously during the convolution process.
  • the clock frequency generator 200a, the clock frequency selector 200b, and the data boundary detector 200c may be integrated into the clock frequency controller 100g shown in FIG. 1b above.
• The above method can also be applied to a server, for example, deploying a convolutional neural network model in an FPGA cloud server for cloud computing, that is, deploying, on the FPGA hardware in an FPGA cloud server, a computing resource pool for convolution calculation, a convolutional data storage unit, a computing resource pool for pooling calculation, a pooled data storage unit, and a clock frequency controller (which may be the clock frequency controller 100g shown in FIG. 1b above).
• The computing resource pool used for the pooling calculation may include the computing resource unit 1, the computing resource unit 2, ..., and the computing resource unit n in the example corresponding to FIG. 7. The clock frequency controller calculates the acceleration ratio of the target pooling layer according to the expected pooling time and the average pooling time of the target pooling layer, thereby determining the target clock frequency.
• The convolution data obtained by the convolution is stored in the convolutional data storage unit, and the clock frequency controller selects the predetermined target clock frequency for the computing resource pool used for the pooling calculation from the plurality of clock frequencies provided by the FPGA, shielding the excess clock frequencies to reduce the power consumption of the system. The clock frequency controller switches the current clock frequency to the target clock frequency; the computing resource pool for the pooling calculation, now having the target clock frequency, pools the convolution data in the convolutional data storage unit, and the pooled data is stored in the pooled data storage unit.
• By constructing the convolutional neural network model in the FPGA cloud server in this way, the convolutional neural network model can be run smoothly.
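The per-layer cycle described above (determine the next target clock while the current layer is processed, then switch and pool) can be sketched as a driver loop. All names, layer tuples, and frequencies here are hypothetical, and the FIG. 4 rule (Step S303: average clock divided by the acceleration ratio) is used to pick each target clock:

```python
def run_layers(layers, current_mhz, avg_mhz):
    """For each (expected_pooling_time, average_pooling_time) pair,
    compute the acceleration ratio, derive the target clock from the
    average clock (Step S303), switch if it differs from the current
    clock, and record the clock used for that layer's pooling."""
    clocks_used = []
    for expected_us, average_us in layers:
        ratio = expected_us / average_us   # acceleration ratio (S302)
        target_mhz = avg_mhz / ratio       # reciprocal * average clock
        if current_mhz != target_mhz:
            current_mhz = target_mhz       # switch once convolution ends
        clocks_used.append(current_mhz)
    return clocks_used
```

A layer whose expected pooling time is twice its average pooling time runs at half the average clock; a layer whose two times match runs at the average clock.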
• The example of the present application obtains the expected pooling time of the target pooling layer and the amount of data to be processed of the target pooling layer, and obtains the current clock frequency corresponding to the computing resource unit used for the pooling process; determines a target clock frequency according to the expected pooling time of the target pooling layer and the amount of data to be processed of the target pooling layer; and, when the convolutional layer associated with the target pooling layer completes convolution processing and the current clock frequency and the target clock frequency are different, switches the current clock frequency to the target clock frequency and performs pooling processing at the target pooling layer based on the computing resource unit having the target clock frequency. Since the clock frequency of the computing resource unit can be dynamically adjusted according to the expected pooling time of a pooling layer and its amount of data to be processed, the actual pooling time of any pooling layer approaches its expected pooling time, which also makes the actual pooling time of any pooling layer match the expected convolution time of the corresponding convolutional layer. Therefore, by adjusting the clock frequency of the computing resource unit, the actual pooling time of any layer can be matched to the convolution time of that layer, so that idleness of computing resources can be avoided, thereby increasing the usage rate of computing resources.
  • FIG. 8 is a schematic structural diagram of a computing resource adjustment apparatus provided by an example of the present application.
  • the computing resource adjustment apparatus 1 may include: an obtaining module 11, a first determining module 12, and a switching module 13;
• the obtaining module 11 is configured to acquire the expected pooling time of the target pooling layer and the amount of data to be processed of the target pooling layer, and obtain the current clock frequency corresponding to the computing resource unit used for the pooling process; the difference between the expected pooling time of the target pooling layer and the convolution time of the convolutional layer associated with the target pooling layer is less than a time threshold;
  • a first determining module 12 configured to determine a target clock frequency according to a desired pooling time of the target pooling layer and a data volume to be processed of the target pooling layer;
  • the switching module 13 is configured to switch the current clock frequency to the target when the convolution layer associated with the target pooling layer completes a convolution process, and the current clock frequency and the target clock frequency are different a clock frequency, and based on the computing resource unit having the target clock frequency, performing pooling processing at the target pooling layer.
• For the specific function implementations of the obtaining module 11, the first determining module 12, and the switching module 13, refer to step S101 to step S103 in the example corresponding to FIG. 2; details are not described herein.
• The computing resource adjustment apparatus 1 may include: an obtaining module 11, a first determining module 12, and a switching module 13; the computing resource adjustment apparatus 1 may further include: a first prediction module 14, a second prediction module 15, a second determining module 16, and a third determining module 17;
  • the first prediction module 14 is configured to predict a convolution estimation time of the multiple convolution layers, and determine a desired pooling time of the multiple pooling layers according to the convolution estimation time of the multiple convolution layers, and Determining a pooling expectation value according to a desired pooling time of the plurality of pooling layers;
  • the second prediction module 15 is configured to predict a quantity of data to be processed of the plurality of pooling layers, and determine a quantity of the basic pooled data according to the amount of data to be processed of the multiple pooling layers;
  • the second prediction module 15 is configured to extract, in the data volume to be processed of the plurality of pooling layers, a quantity of data to be processed with a minimum value as the amount of the base pooled data;
• the second determining module 16 is configured to determine, according to the pooling expectation value, the base pooled data amount, and the computing power value of a computing resource unit having the average clock frequency, the number of computing resource units used for the pooling process; the average clock frequency is the average of the multiple clock frequencies provided by the system.
• the third determining module 17 is configured to determine, when the convolution data generated in the convolutional layer associated with the target pooling layer reaches the data boundary, that the convolutional layer associated with the target pooling layer has completed convolution processing.
• For the specific function implementations of the first prediction module 14, the second prediction module 15, and the second determining module 16, refer to step S401 to step S403 in the example corresponding to FIG. 5; for the specific function implementation of the third determining module 17, refer to step S103 in the example corresponding to FIG. 2 above; no further details are provided herein.
  • the first prediction module 14 may include: a first determining unit 141, a second determining unit 142;
• the first determining unit 141 is configured to determine the expected pooling times of the plurality of pooling layers according to the expected convolution times of the plurality of convolutional layers; the difference between the expected convolution time of a convolutional layer and the expected pooling time of the pooling layer located in the same layer is less than the time threshold;
  • the second determining unit 142 is configured to calculate an average value of the expected pooling time of the plurality of pooling layers, and determine the average value as the pooling expected value.
• For the specific function implementations of the first determining unit 141 and the second determining unit 142, refer to step S401 in the example corresponding to FIG. 5 above; details are not described herein.
  • the first determining module 12 may include: a third determining unit 121, a fourth determining unit 122, and a fifth determining unit 123;
  • a third determining unit 121 configured to calculate a sum of the computing power values of all the computing resource units having the average clock frequency, obtain a total computing power value, and calculate the amount of data to be processed of the target pooling layer The ratio between the total computing power values is determined as the average pooling time of the target pooling layer;
• the fourth determining unit 122 is configured to determine the ratio between the expected pooling time of the target pooling layer and the average pooling time of the target pooling layer as the acceleration ratio of the target pooling layer;
• the fifth determining unit 123 is configured to determine the target clock frequency according to the acceleration ratio of the target pooling layer, the acceleration ratio of the reference pooling layer, and the current clock frequency; the acceleration ratio of the reference pooling layer is generated from the expected pooling time of the reference pooling layer and the amount of data to be processed of the reference pooling layer; the reference pooling layer is a pooling layer adjacent to the target pooling layer that is being processed or has been processed.
• For the specific function implementations of the third determining unit 121, the fourth determining unit 122, and the fifth determining unit 123, refer to step S201 to step S203 in the example corresponding to FIG. 3; details are not described herein.
• the fifth determining unit 123 may include: a first determining subunit 1231, a second determining subunit 1232, and a deletion subunit 1233;
  • a first determining sub-unit 1231 configured to set the target clock frequency to be the same clock frequency as the current clock frequency when an acceleration ratio of the reference pooling layer and an acceleration ratio of the target pooling layer are the same ;
  • a second determining sub-unit 1232 configured to: when the speedup ratio of the reference pooling layer is different from the speedup ratio of the target pooling layer, the speedup ratio of the reference pooling layer and the target pooling layer a ratio between the acceleration ratios, determined as an acceleration coefficient, and determining a product between the acceleration coefficient and the current clock frequency as the target clock frequency;
• For the specific function implementations of the first determining subunit 1231, the second determining subunit 1232, and the deletion subunit 1233, refer to step S203 in the example corresponding to FIG. 3; details are not described herein.
• In addition to the first determining subunit 1231 and the second determining subunit 1232, the fifth determining unit 123 may further include a deletion subunit 1233. The deletion subunit 1233 is configured to delete a clock frequency other than the target clock frequency among the plurality of clock frequencies provided by the system.
• For the specific function implementation of the deletion subunit 1233, refer to step S406 in the example corresponding to FIG. 5 above; details are not described herein.
  • the first determining module 12 may further include: a sixth determining unit 124, a seventh determining unit 125, and an eighth determining unit 126;
  • a sixth determining unit 124 configured to calculate a sum of computing power values of all the computing resource units having the average clock frequency, obtain a total computing power value, and obtain a data volume to be processed of the target pooling layer The ratio between the total computing power values is determined as the average pooling time of the target pooling layer;
• the seventh determining unit 125 is configured to determine the ratio between the expected pooling time of the target pooling layer and the average pooling time of the target pooling layer as the acceleration ratio of the target pooling layer;
  • the eighth determining unit 126 is configured to determine the target clock frequency according to the acceleration ratio of the target pooling layer and the average clock frequency;
  • the eighth determining unit 126 is specifically configured to determine, as the target clock frequency, a product of a reciprocal of the acceleration ratio of the target pooling layer and the average clock frequency.
• For the specific function implementations of the sixth determining unit 124, the seventh determining unit 125, and the eighth determining unit 126, refer to step S301 to step S303 in the example corresponding to FIG. 4; details are not described herein.
• The example of the present application obtains the expected pooling time of the target pooling layer and the amount of data to be processed of the target pooling layer, and obtains the current clock frequency corresponding to the computing resource unit used for the pooling process; determines a target clock frequency according to the expected pooling time of the target pooling layer and the amount of data to be processed of the target pooling layer; and, when the convolutional layer associated with the target pooling layer completes convolution processing and the current clock frequency and the target clock frequency are different, switches the current clock frequency to the target clock frequency and performs pooling processing at the target pooling layer based on the computing resource unit having the target clock frequency. Since the clock frequency of the computing resource unit can be dynamically adjusted according to the expected pooling time of a pooling layer and its amount of data to be processed, the actual pooling time of any pooling layer approaches its expected pooling time, which also makes the actual pooling time of any pooling layer match the expected convolution time of the corresponding convolutional layer. Therefore, by adjusting the clock frequency of the computing resource unit, the actual pooling time of any layer can be matched to the convolution time of that layer, so that idleness of computing resources can be avoided, thereby increasing the usage rate of computing resources.
  • FIG. 9 is a schematic structural diagram of a computing device provided by an example of the present application.
  • the computing device can be a terminal device or a server.
  • the computing resource adjustment apparatus in FIG. 8 above may be applied to the computing device 1000, and the computing device 1000 may include a processor 1001, a network interface 1004, and a memory 1005.
  • The computing device 1000 may further include a user interface 1003 and at least one communication bus 1002, where the communication bus 1002 implements connection and communication among these components.
  • the user interface 1003 can include a display and a keyboard.
  • the optional user interface 1003 can also include a standard wired interface and a wireless interface.
  • network interface 1004 can include a standard wired interface, a wireless interface (such as a WI-FI interface).
  • the memory 1005 may be a high speed RAM memory or a non-volatile memory such as at least one disk memory.
  • memory 1005 can also be at least one storage device located remotely from processor 1001 described above. As shown in FIG. 9, an operating system, a network communication module, a user interface module, and a device control application may be included in the memory 1005 as a computer storage medium.
  • In the computing device 1000, the network interface 1004 provides a network communication function, the user interface 1003 mainly provides an input interface for the user, and the processor 1001 may be configured to call the device control application stored in the memory 1005 to:
  • obtain the expected pooling time of the target pooling layer and the amount of data to be processed by the target pooling layer, and obtain the current clock frequency of the computing resource units used for pooling, where the difference between the expected pooling time of the target pooling layer and the estimated convolution time of the convolution layer associated with the target pooling layer is less than a time threshold;
  • determine a target clock frequency according to the expected pooling time of the target pooling layer and the amount of data to be processed by the target pooling layer; and
  • when the convolution layer associated with the target pooling layer completes convolution processing and the current clock frequency differs from the target clock frequency, switch the current clock frequency to the target clock frequency, and perform pooling at the target pooling layer based on the computing resource units having the target clock frequency.
  • the processor 1001 also performs the following steps:
  • predict the estimated convolution times of multiple convolution layers, determine the expected pooling times of multiple pooling layers according to the estimated convolution times of the multiple convolution layers, and determine a pooling expectation value according to the expected pooling times of the multiple pooling layers;
  • predict the amounts of data to be processed by the multiple pooling layers, and determine a base pooling data amount according to those amounts; and
  • determine the number of computing resource units used for pooling according to the pooling expectation value, the base pooling data amount, and the computing capability value of a computing resource unit having the average clock frequency, where the average clock frequency is the average of the multiple clock frequencies provided by the system.
  • When determining the expected pooling times of the multiple pooling layers according to the estimated convolution times of the multiple convolution layers and determining the pooling expectation value according to those expected pooling times, the processor 1001 specifically performs the following steps:
  • determine the expected pooling times of the multiple pooling layers according to the estimated convolution times of the multiple convolution layers, where the difference between the estimated convolution time of a convolution layer and the expected pooling time of the pooling layer located in the same layer is less than the time threshold; and calculate the average of the expected pooling times of the multiple pooling layers and determine the average as the pooling expectation value.
  • When determining the base pooling data amount according to the amounts of data to be processed by the multiple pooling layers, the processor 1001 specifically performs the following step:
  • extract, from the amounts of data to be processed by the multiple pooling layers, the smallest amount as the base pooling data amount.
  • When determining the target clock frequency according to the expected pooling time of the target pooling layer and the amount of data to be processed by the target pooling layer, the processor 1001 specifically performs the following steps:
  • calculate the sum of the computing capability values of all the computing resource units having the average clock frequency to obtain a total computing capability value, and determine the ratio of the amount of data to be processed by the target pooling layer to the total computing capability value as the average pooling time of the target pooling layer; determine the ratio of the expected pooling time of the target pooling layer to its average pooling time as the acceleration ratio of the target pooling layer; and determine the target clock frequency according to the acceleration ratio of the target pooling layer, the acceleration ratio of a reference pooling layer, and the current clock frequency, where the acceleration ratio of the reference pooling layer is generated from the expected pooling time of the reference pooling layer and the amount of data to be processed by the reference pooling layer, and the reference pooling layer is a pooling layer adjacent to the target pooling layer that is being processed or has been processed.
  • When determining the target clock frequency according to the acceleration ratio of the target pooling layer, the acceleration ratio of the reference pooling layer, and the current clock frequency, the processor 1001 specifically performs the following steps:
  • when the acceleration ratio of the reference pooling layer is the same as that of the target pooling layer, set the target clock frequency to the same clock frequency as the current clock frequency; and when the acceleration ratio of the reference pooling layer differs from that of the target pooling layer, determine the ratio of the acceleration ratio of the reference pooling layer to that of the target pooling layer as an acceleration coefficient, and determine the product of the acceleration coefficient and the current clock frequency as the target clock frequency.
  • the processor 1001 also performs the following steps:
  • delete, from the multiple clock frequencies provided by the system, the clock frequencies other than the target clock frequency.
  • When determining the target clock frequency according to the expected pooling time of the target pooling layer and the amount of data to be processed by the target pooling layer, the processor 1001 may alternatively perform the following steps:
  • calculate the sum of the computing capability values of all the computing resource units having the average clock frequency to obtain the total computing capability value, and determine the ratio of the amount of data to be processed by the target pooling layer to the total computing capability value as the average pooling time of the target pooling layer; determine the ratio of the expected pooling time of the target pooling layer to its average pooling time as the acceleration ratio of the target pooling layer; and determine the target clock frequency according to the acceleration ratio of the target pooling layer and the average clock frequency.
  • When determining the target clock frequency according to the acceleration ratio of the target pooling layer and the average clock frequency, the processor 1001 specifically performs the following step:
  • a product of the reciprocal of the acceleration ratio of the target pooling layer and the average clock frequency is determined as the target clock frequency.
  • the processor 1001 also performs the following step:
  • when detecting that the convolution data generated in the convolution layer associated with the target pooling layer reaches a data boundary, determine that the convolution layer associated with the target pooling layer has completed convolution processing.
  • In the example of the present application, the expected pooling time of the target pooling layer and the amount of data to be processed by the target pooling layer are obtained, together with the current clock frequency of the computing resource units used for pooling; a target clock frequency is determined according to the expected pooling time of the target pooling layer and the amount of data to be processed by the target pooling layer; and when the convolution layer associated with the target pooling layer completes convolution processing and the current clock frequency differs from the target clock frequency, the current clock frequency is switched to the target clock frequency and pooling is performed at the target pooling layer based on the computing resource units having the target clock frequency.
  • Because the clock frequency of the computing resource units can be dynamically adjusted according to the expected pooling time of each pooling layer and its amount of data to be processed, the actual pooling time of any pooling layer approaches its expected pooling time, and therefore also matches the estimated convolution time of the corresponding convolution layer. By adjusting the clock frequency of the computing resource units, the requirement that the actual pooling time and the estimated convolution time of any layer be close can be satisfied, so that idle computing resources are avoided and the usage rate of the computing resources is increased.
  • It should be understood that the computing device 1000 described in the example of the present application may perform the description of the computing resource adjustment method in the examples corresponding to FIG. 2 to FIG. 7, and may also perform the description of the computing resource adjustment apparatus in the example corresponding to FIG. 8; details are not repeated here. Likewise, the description of the beneficial effects of using the same method is not repeated.
  • The example of the present application further provides a computer storage medium storing the computer program executed by the aforementioned computing resource adjustment apparatus 1. The computer program includes program instructions which, when executed by the processor, can perform the description of the computing resource adjustment method in the examples corresponding to FIG. 2 to FIG. 7; details are therefore not repeated here, and neither is the description of the beneficial effects of using the same method. For technical details not disclosed in the computer storage medium examples of the present application, refer to the description of the method examples of the present application.
  • The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), or a random access memory (RAM).


Abstract

The examples of the present application disclose a computing resource adjustment method, apparatus, and related device. The method includes: obtaining the expected pooling time of a target pooling layer and the amount of data to be processed by the target pooling layer, and obtaining the current clock frequency of the computing resource units used for pooling; determining a target clock frequency according to the expected pooling time of the target pooling layer and the amount of data to be processed by the target pooling layer; and when the convolution layer associated with the target pooling layer completes convolution processing and the current clock frequency differs from the target clock frequency, switching the current clock frequency to the target clock frequency, and performing pooling at the target pooling layer based on the computing resource units having the target clock frequency.

Description

Neural Network Operation Method, Apparatus, and Related Device

This application claims priority to Chinese Patent Application No. 201810072678.9, entitled "Computing Resource Adjustment Method, Apparatus, and Related Device" and filed with the China National Intellectual Property Administration on January 25, 2018, which is incorporated herein by reference in its entirety.

Technical Field

This application relates to the field of computer technology, and in particular, to a computing resource adjustment method and apparatus, a neural network operation method and apparatus, and a related device.

Background

A convolutional neural network (CNN) is a type of artificial neural network and has become a research focus in fields such as speech analysis, image recognition, and object classification. Its weight-sharing network structure makes it more similar to a biological neural network, reducing the complexity of the network model and the number of weights. A convolutional neural network is mainly formed by stacking convolution layers and pooling layers in various arrangements, and model depth has grown from a dozen or so layers to hundreds of layers.

As model depth increases, so does the demand for computing power. To improve the parallel efficiency within a layer, the convolution computation time and the pooling computation time of the same layer are required to be equal or close. However, convolution and pooling workloads are unevenly distributed in the algorithm; that is, the convolution workload and the pooling workload of the same layer may be directly or inversely proportional. For example, the convolution workload of the second layer may be twice that of the first layer, while the pooling workload of the second layer may be 10 times, or 0.1 times, that of the first layer. Therefore, if the pooling computation time of each layer is to match its convolution computation time, resources can only be provisioned for the worst-case layer, which consumes a large amount of computing resources; during actual operation, computing resources sit idle, resulting in low resource utilization.
Summary

An example of the present application provides a computing resource adjustment method, performed by a computing device, including:

obtaining the expected pooling time of a target pooling layer and the amount of data to be processed by the target pooling layer, and obtaining the current clock frequency of the computing resource units used for pooling, where the difference between the expected pooling time of the target pooling layer and the estimated convolution time of the convolution layer associated with the target pooling layer is less than a time threshold;

determining a target clock frequency according to the expected pooling time of the target pooling layer and the amount of data to be processed by the target pooling layer; and

when the convolution layer associated with the target pooling layer completes convolution processing and the current clock frequency differs from the target clock frequency, switching the current clock frequency to the target clock frequency, and performing pooling at the target pooling layer based on the computing resource units having the target clock frequency.

An example of the present application further provides a neural network operation method, performed by a computing device, including:

obtaining the expected pooling time of a target pooling layer and the amount of data to be processed by the target pooling layer, and obtaining the current clock frequency of the computing resource units used for pooling, where the difference between the expected pooling time of the target pooling layer and the estimated convolution time of the convolution layer associated with the target pooling layer is less than a time threshold;

determining a target clock frequency according to the expected pooling time of the target pooling layer and the amount of data to be processed by the target pooling layer; and

when the convolution layer associated with the target pooling layer completes convolution processing and the current clock frequency differs from the target clock frequency, switching the current clock frequency to the target clock frequency, and performing pooling at the target pooling layer based on the computing resource units having the target clock frequency.

An example of the present application further provides a computing resource adjustment apparatus, including:

an obtaining module, configured to obtain the expected pooling time of a target pooling layer and the amount of data to be processed by the target pooling layer, and obtain the current clock frequency of the computing resource units used for pooling, where the difference between the expected pooling time of the target pooling layer and the estimated convolution time of the convolution layer associated with the target pooling layer is less than a time threshold;

a first determining module, configured to determine a target clock frequency according to the expected pooling time of the target pooling layer and the amount of data to be processed by the target pooling layer; and

a switching module, configured to: when the convolution layer associated with the target pooling layer completes convolution processing and the current clock frequency differs from the target clock frequency, switch the current clock frequency to the target clock frequency, and perform pooling at the target pooling layer based on the computing resource units having the target clock frequency.

An example of the present application further provides a neural network operation apparatus, including:

an obtaining module, configured to obtain the expected pooling time of a target pooling layer and the amount of data to be processed by the target pooling layer, and obtain the current clock frequency of the computing resource units used for pooling, where the difference between the expected pooling time of the target pooling layer and the estimated convolution time of the convolution layer associated with the target pooling layer is less than a time threshold;

a first determining module, configured to determine a target clock frequency according to the expected pooling time of the target pooling layer and the amount of data to be processed by the target pooling layer; and

a switching module, configured to: when the convolution layer associated with the target pooling layer completes convolution processing and the current clock frequency differs from the target clock frequency, switch the current clock frequency to the target clock frequency, and perform pooling at the target pooling layer based on the computing resource units having the target clock frequency.

An example of the present application further provides a computing device, including a processor and a memory;

the processor is connected to the memory, where the memory is configured to store program code and the processor is configured to call the program code to perform the foregoing method.

An example of the present application further provides a computer storage medium storing a computer program, the computer program including program instructions that, when executed by a processor, perform the foregoing method.
Brief Description of the Drawings

To describe the technical solutions in the examples of the present application or in the prior art more clearly, the accompanying drawings required for describing the examples or the prior art are briefly introduced below. Apparently, the accompanying drawings in the following description show merely some examples of the present application, and a person of ordinary skill in the art may further derive other drawings from these accompanying drawings without creative effort.

FIG. 1a and FIG. 1b are schematic diagrams of a scenario of a computing resource adjustment method provided by an example of the present application;

FIG. 2 is a schematic flowchart of a computing resource adjustment method provided by an example of the present application;

FIG. 3 is a schematic flowchart of a method for determining a target clock frequency provided by an example of the present application;

FIG. 4 is a schematic flowchart of another method for determining a target clock frequency provided by an example of the present application;

FIG. 5 is a schematic flowchart of another computing resource adjustment method provided by an example of the present application;

FIG. 6 is a schematic flowchart of determining the number of computing resource units provided by an example of the present application;

FIG. 7 is a schematic interaction diagram of switching to a target clock frequency provided by an example of the present application;

FIG. 8 is a schematic structural diagram of a computing resource adjustment apparatus provided by an example of the present application;

FIG. 9 is a schematic structural diagram of a terminal device provided by an example of the present application.
Detailed Description

The technical solutions in the examples of the present application are described below clearly and completely with reference to the accompanying drawings in the examples of the present application. Apparently, the described examples are merely some rather than all of the examples of the present application. All other examples obtained by a person of ordinary skill in the art based on the examples of the present application without creative effort shall fall within the protection scope of the present application.

An example of the present application provides a computing resource adjustment method that can be applied to a terminal device or a server, and is particularly suitable for scenarios with massive data computing demands, for example, large-scale detection of inappropriate images or classification of massive numbers of images. The method is also particularly applicable to scenarios in which the terminal device is sensitive to power consumption, such as image detection and recognition on smart terminal devices and drones.

In the following examples of the present application, the method is described as applied to a terminal device.

FIG. 1a and FIG. 1b are schematic diagrams of a scenario of a computing resource adjustment method provided by an example of the present application.

As shown in FIG. 1a, the description takes two convolution layers (a first convolution layer and a second convolution layer) and two pooling layers (a first pooling layer and a second pooling layer) in a convolutional neural network model as an example. The image input size of the model is 256×256, so it can be predicted in advance that the estimated convolution time of the first convolution layer is 30 microseconds and that of the second convolution layer is 50 microseconds; the data amount of the first pooling layer is 4 KB and that of the second pooling layer is 1 KB. For the convolution time and the pooling time of the same layer to match, the estimated convolution time of a layer and the expected pooling time of the corresponding pooling layer should be close. Here, owing to hardware factors, the expected pooling time and the estimated convolution time are close rather than equal; that is, the predicted expected pooling time of the first pooling layer may be 33 microseconds and that of the second pooling layer may be 46 microseconds.

Under normal conditions, the average clock frequency of a single computing unit is 300 MHz and the corresponding computing capability value is 1, where normal conditions mean that the temperature of the central processing unit is within 40 degrees Celsius. As shown in FIG. 1b, according to the foregoing conditions, namely that the expected pooling time of the first pooling layer is 33 microseconds, the expected pooling time of the second pooling layer is 46 microseconds, the smallest data amount of the two pooling layers is 1 KB, and the computing capability value of a single computing unit is 1, the number of computing units used for pooling can be calculated in advance as 5. Once the number of computing units is determined, it is no longer changed during subsequent convolution and pooling operations; only the clock frequency of the computing units is adjusted. After the number of computing units is determined, an image 100a of size 256×256 is input to the input layer of the convolutional neural network, and the convolution kernel slides as a window over all regions of the image 100a in turn; that is, the kernel performs convolution over the image 100a. Within the 30 microseconds of the convolution operation in the first convolution layer (the estimated convolution time of the first convolution layer predicted in advance), the clock frequency controller 100g can calculate in advance that the target clock frequency is 450 MHz, according to the expected pooling time of the first pooling layer of 33 microseconds, the corresponding data amount of 4 KB, and the current clock frequency of the computing units of 300 MHz.

After the convolution of image 100a in the first convolution layer is completed, a feature image 100b is generated and stored in the convolution data storage unit 100f. The clock frequency controller 100g adjusts the clock frequency of each of computing unit 1, computing unit 2, computing unit 3, computing unit 4, and computing unit 5 from 300 MHz to 450 MHz. After all the computing units have been adjusted, the feature image 100b stored in the convolution data storage unit 100f is input to the first pooling layer, and the pooling operation is performed by the five computing units at 450 MHz. The actual pooling time of the pooling operation is 28 microseconds; that is, the actual pooling time of the first pooling layer (28 microseconds) is close to the estimated convolution time of the first convolution layer (30 microseconds). The pooled feature image 100c is stored in the pooling data storage unit 100h for the convolution operation of the second convolution layer.

The feature image 100c is input to the second convolution layer for convolution. Within the 50 microseconds of the convolution operation in the second convolution layer (the estimated convolution time of the second convolution layer predicted in advance), the clock frequency controller 100g can calculate that the target clock frequency is 100 MHz, according to the expected pooling time of the second pooling layer of 46 microseconds, the corresponding data amount of 1 KB, and the current clock frequency of each computing unit of 450 MHz. Here, because the clock frequency was adjusted from 300 MHz to 450 MHz during the previous round of pooling, and the current clock frequency at the second convolution layer is the same as the current clock frequency of the first pooling layer, the current clock frequency in this round is 450 MHz.

After the convolution of feature image 100c in the second convolution layer is completed, a feature image 100d is generated and stored in the convolution data storage unit 100f. The clock frequency controller 100g adjusts the clock frequency of each of computing unit 1, computing unit 2, computing unit 3, computing unit 4, and computing unit 5 from 450 MHz to 100 MHz. After all the computing units have been adjusted, the feature image 100d stored in the convolution data storage unit 100f is input to the second pooling layer, and pooling is performed by the five computing units at 100 MHz. The actual pooling time of the pooling operation is 52 microseconds, so the actual pooling time of the second pooling layer (52 microseconds) is close to the estimated convolution time of the second convolution layer (50 microseconds). The pooled feature image 100e is stored in the pooling data storage unit 100h.

In the foregoing technical solution, by adjusting the clock frequency of the computing units so that the actual pooling time and the estimated convolution time of the same layer are close, the unevenness between convolution time and pooling time can be overcome, computing resources can be saved, and idle computing resources can be avoided, thereby improving the usage rate of the computing resources. For the specific processes of determining the number of computing units, determining the target clock frequency, and adjusting the clock frequency of the computing resource units, refer to the examples corresponding to FIG. 2 to FIG. 7 below.
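The scenario above follows a simple control loop: while a convolution layer runs, the target frequency for the next pooling layer is computed; once convolution completes, the clock is switched only when the target differs from the current frequency. A minimal sketch follows; the function and the shape of the per-layer schedule are our illustration, with the frequency values taken from the FIG. 1a/1b scenario:

```python
def run_network(layers, current_freq):
    """layers: list of (estimated convolution time in us, precomputed target
    pooling frequency in MHz). Returns the frequency each pooling layer ran at."""
    trace = []
    for conv_time_us, target_freq in layers:
        # ... convolution runs for about conv_time_us while the target
        # frequency for the upcoming pooling layer is computed ...
        if target_freq != current_freq:
            current_freq = target_freq  # switch before pooling starts
        trace.append(current_freq)      # pooling runs at this frequency
    return trace

# Two layers from the scenario: 300 MHz -> 450 MHz, then 450 MHz -> 100 MHz.
print(run_network([(30, 450), (50, 100)], current_freq=300))  # [450, 100]
```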
Further, FIG. 2 is a schematic flowchart of a computing resource adjustment method provided by an example of the present application. As shown in FIG. 2, the method may include the following steps.

Step S101: obtain the expected pooling time of a target pooling layer and the amount of data to be processed by the target pooling layer, and obtain the current clock frequency of the computing resource units used for pooling; the difference between the expected pooling time of the target pooling layer and the estimated convolution time of the convolution layer associated with the target pooling layer is less than a time threshold.

Specifically, a convolutional neural network model is formed by stacking multiple convolution layers and multiple pooling layers. In terms of flow, the processing order is: convolution layer 1, pooling layer 1, convolution layer 2, pooling layer 2, ..., convolution layer N, pooling layer N. Convolution processing is performed in the convolution layers and pooling processing in the pooling layers, where convolution layer i and pooling layer i are located in the same layer, i ∈ [1, N]. The pooling of a layer follows the convolution of the corresponding convolution layer; that is, there is a sequential dependency between the convolution data generated by convolution and the pooling data generated by pooling: the convolution data is generated first and the pooling data afterwards.

In a convolutional neural network model, the unprocessed pooling layer adjacent to the pooling layer currently performing pooling is called the target pooling layer; alternatively, the pooling layer associated with the convolution layer currently performing convolution is called the target pooling layer. For example, suppose the model has four convolution layers and four pooling layers. If pooling layer 2 is currently performing pooling, then pooling layer 3 (the unprocessed pooling layer adjacent to pooling layer 2) is the target pooling layer; if convolution layer 2 is currently performing convolution, then pooling layer 2 (the pooling layer associated with convolution layer 2) is the target pooling layer.

The expected pooling time of the target pooling layer and the amount of data to be processed by the target pooling layer are obtained, where the expected pooling time of the target pooling layer is close to the estimated convolution time of the convolution layer associated with the target pooling layer (owing to hardware factors, the expected pooling time and the estimated convolution time of the same layer are close rather than equal); that is, the difference between the estimated convolution time of a convolution layer and the expected pooling time of the pooling layer located in the same layer is within a time threshold, which is set in advance according to the performance of the model and the current hardware parameters.

The estimated convolution time of a convolution layer is related to the depth and complexity of the convolutional neural network model and to the input data, so the estimated convolution time of any convolution layer in the model can be predicted in advance, and correspondingly the expected pooling time of any pooling layer can be predicted. For example, suppose the estimated convolution time of convolution layer 1 is 20 microseconds, the estimated convolution time of convolution layer 2 is 30 microseconds, and the time threshold is 3 microseconds. Correspondingly, the expected pooling time of pooling layer 1 (corresponding to convolution layer 1) is 19 microseconds (the difference between the estimated convolution time and the expected pooling time is 1 microsecond, less than the 3-microsecond threshold), and the expected pooling time of pooling layer 2 (corresponding to convolution layer 2) is 32 microseconds (a difference of 2 microseconds, less than the 3-microsecond threshold).
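The per-layer constraint described above can be sketched as a small check; the function name is ours, and all times are taken in microseconds:

```python
def within_threshold(conv_time_us, pool_time_us, threshold_us):
    """True when a layer's expected pooling time is close enough to the
    estimated convolution time of the convolution layer in the same layer."""
    return abs(conv_time_us - pool_time_us) < threshold_us

# Layer 1: 20 us vs 19 us; layer 2: 30 us vs 32 us; threshold 3 us.
print(within_threshold(20, 19, 3), within_threshold(30, 32, 3))  # True True
```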
While pooling is in progress, the clock frequency of the computing resource units used for pooling (here, the computing resource units may be computing unit 1 to computing unit 5 in the example of FIG. 1b) is called the current clock frequency; likewise, while convolution is in progress, the clock frequency of these computing resource units may also be called the current clock frequency. Note that the computing resource units are used for pooling, so their clock frequency remains unchanged during convolution. For example, if pooling layer 2 is performing pooling at a clock frequency of 200 MHz, the current clock frequency is 200 MHz; if convolution layer 3 is then performing convolution, the current clock frequency is still 200 MHz, because during convolution in convolution layer 3, the current clock frequency of the computing resource units equals the clock frequency used in the adjacent, already processed pooling layer 2. That is, from the start of pooling in pooling layer 2 until convolution in convolution layer 3 completes, the current clock frequency of the computing resource units used for pooling remains 200 MHz. A computing resource unit is the smallest unit component in a central processing unit for numerical, instruction, and logical operations. A computing resource unit can work at multiple clock frequencies; the higher the clock frequency, the stronger the computing capability, and the lower the clock frequency, the weaker the computing capability. The computing resource units used for pooling can implement the boundary processing, sliding-window processing, max pooling, or average pooling in the pooling process.

Step S102: determine a target clock frequency according to the expected pooling time of the target pooling layer and the amount of data to be processed by the target pooling layer.

In some examples, step S102 includes: calculating the sum of the computing capability values of the computing resource units having the average clock frequency to obtain a total computing capability value (for example, if the computing capability value of a computing resource unit at the average clock frequency is 2 and the number of computing resource units used for pooling is 3, the total computing capability value is 3×2=6), where the average clock frequency is the average of the multiple clock frequencies provided by the system, and the computing capability value refers to a computing resource unit's capability to process instructions: the higher the clock frequency, the higher the capability value and the power consumption, and the lower the clock frequency, the lower the capability value and the power consumption. The system may be a system based on an FPGA (Field Programmable Gate Array) hardware platform, or a system based on an ASIC (application-specific integrated circuit) hardware platform. The amount of data to be processed by the target pooling layer is extracted, and the ratio of that amount to the total computing capability value is determined as the average pooling time of the target pooling layer. The ratio of the expected pooling time of the target pooling layer (whose difference from the estimated convolution time of the associated convolution layer is within the time threshold) to the average pooling time is determined as the acceleration ratio of the target pooling layer. The target clock frequency is then determined according to the acceleration ratio of the target pooling layer, the acceleration ratio of a reference pooling layer, and the current clock frequency.

The reference pooling layer is a pooling layer adjacent to the target pooling layer that is being processed or has been processed; that is, the pooling layer currently performing pooling may be called the reference pooling layer, or the already processed pooling layer adjacent to the convolution layer currently performing convolution may also be called the reference pooling layer. Similarly, the acceleration ratio of the reference pooling layer is determined according to the expected pooling time of the reference pooling layer and the amount of data to be processed by the reference pooling layer. The specific process of determining the acceleration ratio of the reference pooling layer is: extract the amount of data to be processed by the reference pooling layer, determine the ratio of that amount to the total computing capability value as the average pooling time of the reference pooling layer, and determine the ratio of the expected pooling time of the reference pooling layer (whose difference from the estimated convolution time of the associated convolution layer is within the time threshold) to the average pooling time of the reference pooling layer (the expected pooling time of the reference pooling layer divided by its average pooling time) as the acceleration ratio of the reference pooling layer.

In some examples, step S102 may also include: calculating the sum of the computing capability values of the computing resource units having the average frequency to obtain the total computing capability value; extracting the amount of data to be processed by the target pooling layer and determining the ratio of that amount to the total computing capability value as the average pooling time of the target pooling layer; determining the ratio of the expected pooling time of the target pooling layer (whose difference from the estimated convolution time of the associated convolution layer is within the time threshold) to the average pooling time as the acceleration ratio of the target pooling layer; and determining the target clock frequency according to the acceleration ratio of the target pooling layer and the average clock frequency, where the average clock frequency is the average of the multiple clock frequencies provided by the system. As can be seen, the target clock frequency may be obtained from the current clock frequency of the computing resource units, or from their average clock frequency, where the current clock frequency is a dynamically changing variable and the average clock frequency is a fixed constant.

Step S103: when the convolution layer associated with the target pooling layer completes convolution processing and the current clock frequency differs from the target clock frequency, switch the current clock frequency to the target clock frequency, and perform pooling at the target pooling layer based on the computing resource units having the target clock frequency.

Specifically, when it is detected that the convolution data generated in the convolution layer associated with the target pooling layer reaches a data boundary, it is determined that the convolution layer associated with the target pooling layer has completed convolution processing.

For example, when an image is input to the convolutional neural network model, in order to keep the size of the feature image generated after convolution consistent with that of the input image, a border is added around the input image; generally, zeros are added around the image to enlarge the input image so that the size of the convolved feature image matches that of the input image. Therefore, if the convolution data generated after convolution is continuously detected to be zero, it is determined that the convolution data has reached the data boundary.

When the convolution layer associated with the target pooling layer completes convolution, the current clock frequency and the target clock frequency are checked. If they differ, the current clock frequency is switched to the target clock frequency, and based on the computing resource units having the target clock frequency, the convolution data generated by convolution is extracted at the target pooling layer and pooled (boundary processing, sliding-window processing, max pooling, or average pooling). If the convolution layer associated with the target pooling layer completes convolution and the current clock frequency is the same as the target clock frequency, the current clock frequency is kept unchanged, and the convolution data generated by convolution continues to be pooled at the target pooling layer based on the computing resource units having the current clock frequency.

It can be seen that the target clock frequency is calculated before pooling is performed at the target pooling layer: it may be calculated while the reference pooling layer is performing pooling, or while the convolution layer associated with the target pooling layer is performing convolution. While the target pooling layer is performing pooling, it has already become the next reference pooling layer, and the unprocessed pooling layer adjacent to it is the next target pooling layer. By switching the clock frequency of the computing resource units, the actual pooling time of the target pooling layer is made close to both the expected pooling time of the target pooling layer and the estimated convolution time of the corresponding convolution layer; the actual pooling time of the target pooling layer is the actual duration measured after the pooling of the target pooling layer is completed.
Referring also to FIG. 3, which is a schematic flowchart of a method for determining a target clock frequency provided by an example of the present application. As shown in FIG. 3, the specific process of determining the target clock frequency includes the following steps S201 to S204, which are a specific example of step S102 in the example corresponding to FIG. 2.

Step S201: calculate the sum of the computing capability values of all the computing resource units having the average clock frequency to obtain a total computing capability value, and determine the ratio of the amount of data to be processed by the target pooling layer to the total computing capability value as the average pooling time of the target pooling layer.

Specifically, the sum of the computing capability values of all the computing resource units having the average clock frequency is calculated to obtain the total computing capability value, where the average clock frequency is the average of the multiple clock frequencies provided by the system. The ratio of the amount of data to be processed by the target pooling layer to the calculated total computing capability value (the amount of data to be processed by the target pooling layer divided by the total computing capability value) is determined as the average pooling time of the target pooling layer.

Step S202: determine the ratio of the expected pooling time of the target pooling layer to the average pooling time of the target pooling layer as the acceleration ratio of the target pooling layer.

Specifically, the ratio of the expected pooling time of the target pooling layer to its average pooling time (the expected pooling time of the target pooling layer divided by the average pooling time of the target pooling layer) is determined as the acceleration ratio of the target pooling layer.

Step S203: when the acceleration ratio of the reference pooling layer is the same as that of the target pooling layer, set the target clock frequency to the same clock frequency as the current clock frequency.

Specifically, the acceleration ratio of the reference pooling layer is extracted, and the acceleration ratios of the target pooling layer and the reference pooling layer are checked. When they are the same, the target clock frequency is set to the same clock frequency as the current clock frequency; that is, when the target pooling layer performs pooling, the current clock frequency is kept unchanged. For example, if the acceleration ratio of the target pooling layer is 3, the acceleration ratio of the reference pooling layer is also 3, and the current clock frequency is 350 MHz, the target clock frequency is set to 350 MHz; that is, the target clock frequency equals the current clock frequency.

Note that even if the acceleration ratio of the target pooling layer is 1 (indicating that the expected pooling time and the actual pooling time of the target pooling layer are close or equal), this only means that the target clock frequency equals the average clock frequency; it implies no linear relationship with the current clock frequency. This is because the current clock frequency is a continually adjusted variable, which may or may not equal the average clock frequency. Therefore, to determine the target clock frequency from the current clock frequency, the relationship among the three variables is required: the acceleration ratio of the target pooling layer, the acceleration ratio of the reference pooling layer, and the current clock frequency.

Step S204: when the acceleration ratio of the reference pooling layer differs from that of the target pooling layer, determine the ratio of the acceleration ratio of the reference pooling layer to that of the target pooling layer as an acceleration coefficient, and determine the product of the acceleration coefficient and the current clock frequency as the target clock frequency.

Specifically, when the acceleration ratios of the target pooling layer and the reference pooling layer differ, the ratio of the acceleration ratio of the reference pooling layer to that of the target pooling layer (the acceleration ratio of the reference pooling layer divided by the acceleration ratio of the target pooling layer) is multiplied by the current clock frequency to obtain the target clock frequency.

For example, if the acceleration ratio of the target pooling layer is 6, the acceleration ratio of the reference pooling layer is 3, and the current clock frequency is 220 MHz, the target clock frequency is (3/6)×220 MHz=110 MHz. The acceleration ratio of the reference pooling layer is generated from the expected pooling time of the reference pooling layer and the amount of data to be processed by the reference pooling layer; the reference pooling layer is a pooling layer adjacent to the target pooling layer that is being processed or has been processed.

In other words, the target clock frequency of the computing resource units for the next pooling operation is calculated from the ratio between the acceleration ratios of the two pooling layers and the current clock frequency. For example, suppose the reference pooling layer is performing pooling and the current clock frequency is 200 MHz; the expected pooling time of the target pooling layer is 40 microseconds and its average pooling time is 10 microseconds, so the acceleration ratio of the target pooling layer is 4; the expected pooling time of the reference pooling layer is 60 microseconds and its average pooling time is 30 microseconds, so the acceleration ratio of the reference pooling layer is 2. According to the proportional relationship among the acceleration ratio of the target pooling layer, the acceleration ratio of the reference pooling layer, and the current clock frequency, the target clock frequency is 100 MHz ((2/4)×200 MHz=100 MHz).
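Steps S201 to S204 can be sketched as follows; this is a minimal illustration assuming consistent units, and the function names are ours, not from the application:

```python
def average_pooling_time(data_amount, unit_capability, num_units):
    """Step S201: data amount divided by the total computing capability value."""
    return data_amount / (unit_capability * num_units)

def acceleration_ratio(expected_time, avg_time):
    """Step S202: expected pooling time divided by average pooling time."""
    return expected_time / avg_time

def target_clock_frequency(ref_ratio, target_ratio, current_freq):
    """Steps S203-S204: keep the current frequency when the ratios match,
    otherwise scale it by ref_ratio / target_ratio."""
    if ref_ratio == target_ratio:
        return current_freq
    return (ref_ratio / target_ratio) * current_freq

# Examples from the text:
print(target_clock_frequency(3, 6, 220))  # 110.0 MHz
print(target_clock_frequency(2, 4, 200))  # 100.0 MHz
print(acceleration_ratio(40, 10))         # 4.0
```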
Referring also to FIG. 4, which is a schematic flowchart of another method for determining a target clock frequency provided by an example of the present application. As shown in FIG. 4, the specific process of determining the target clock frequency includes the following steps S301 to S303, which are a specific example of step S102 in the example corresponding to FIG. 2.

Step S301: calculate the sum of the computing capability values of all the computing resource units having the average clock frequency to obtain a total computing capability value, and determine the ratio of the amount of data to be processed by the target pooling layer to the total computing capability value as the average pooling time of the target pooling layer.

Specifically, the sum of the computing capability values of all the computing resource units having the average clock frequency is calculated to obtain the total computing capability value, where the average clock frequency is the average of the multiple clock frequencies provided by the system. The ratio of the amount of data to be processed by the target pooling layer to the calculated total computing capability value (the amount of data to be processed by the target pooling layer divided by the total computing capability value) is determined as the average pooling time of the target pooling layer.

Step S302: determine the ratio of the expected pooling time of the target pooling layer to the average pooling time of the target pooling layer as the acceleration ratio of the target pooling layer.

Specifically, the ratio of the expected pooling time of the target pooling layer to its average pooling time (the expected pooling time of the target pooling layer divided by the average pooling time of the target pooling layer) is determined as the acceleration ratio of the target pooling layer.

Step S303: determine the product of the reciprocal of the acceleration ratio of the target pooling layer and the average clock frequency as the target clock frequency.

Specifically, the result of multiplying the reciprocal of the acceleration ratio of the target pooling layer by the average clock frequency is determined as the target clock frequency. In other words, the target clock frequency of the computing resource units for the next pooling operation is calculated from the acceleration ratio of the target pooling layer and the average clock frequency (which remains unchanged). For example, if the expected pooling time of the target pooling layer is 100 microseconds and its average pooling time is 20 microseconds, the acceleration ratio of the target pooling layer is 5; with an average clock frequency of 500 MHz, the target clock frequency is 100 MHz ((1/5)×500 MHz=100 MHz).
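Step S303 can be sketched as follows (a minimal illustration; the function name is ours):

```python
def target_frequency_from_average(target_ratio, avg_freq):
    """Step S303: (1 / acceleration ratio of the target pooling layer)
    multiplied by the (fixed) average clock frequency."""
    return (1.0 / target_ratio) * avg_freq

# Example from the text: expected 100 us / average 20 us -> ratio 5;
# with a 500 MHz average clock frequency the target is 100 MHz.
print(target_frequency_from_average(100 / 20, 500))  # 100.0
```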
In the example of the present application, the expected pooling time of the target pooling layer and the amount of data to be processed by the target pooling layer are obtained, together with the current clock frequency of the computing resource units used for pooling; a target clock frequency is determined according to the expected pooling time of the target pooling layer and the amount of data to be processed by the target pooling layer; and when the convolution layer associated with the target pooling layer completes convolution processing and the current clock frequency differs from the target clock frequency, the current clock frequency is switched to the target clock frequency and pooling is performed at the target pooling layer based on the computing resource units having the target clock frequency. Because the clock frequency of the computing resource units can be dynamically adjusted according to the expected pooling time of each pooling layer and its amount of data to be processed, the actual pooling time of any pooling layer approaches its expected pooling time, and therefore also matches the estimated convolution time of the corresponding convolution layer. Thus, by adjusting the clock frequency of the computing resource units, the requirement that the actual pooling time and the estimated convolution time of any layer be close can be satisfied, so that idle computing resources are avoided and the usage rate of the computing resources is increased.
Further, FIG. 5 is a schematic flowchart of another computing resource adjustment method provided by an example of the present application. As shown in FIG. 5, the computing resource adjustment method may include the following steps.

Step S401: predict the estimated convolution times of multiple convolution layers, determine the expected pooling times of multiple pooling layers according to the estimated convolution times of the multiple convolution layers, and determine a pooling expectation value according to the expected pooling times of the multiple pooling layers.

Specifically, the estimated convolution time of each convolution layer in the convolutional neural network model is predicted from the depth and complexity of the model and the size of the input data, and the expected pooling times of the multiple pooling layers are determined from the predicted estimated convolution times of the multiple convolution layers. Owing to hardware factors, the difference between the estimated convolution time of a convolution layer and the expected pooling time of the pooling layer located in the same layer is within the time threshold.

The pooling expectation value is determined from the expected pooling times of the multiple pooling layers in the convolutional neural network model; it is obtained by calculating the average of the expected pooling times of the multiple pooling layers and represents the mean expected pooling time.

For example, suppose the convolutional neural network model has three convolution layers and three pooling layers: the estimated convolution time of convolution layer 1 is 20 microseconds, that of convolution layer 2 is 50 microseconds, and that of convolution layer 3 is 10 microseconds; the expected pooling time of pooling layer 1 (corresponding to convolution layer 1) is 18 microseconds, that of pooling layer 2 (corresponding to convolution layer 2) is 53 microseconds, and that of pooling layer 3 (corresponding to convolution layer 3) is 12 microseconds. Averaging the expected pooling times of the three pooling layers gives a pooling expectation value of about 28 microseconds ((18 microseconds + 53 microseconds + 12 microseconds)/3 ≈ 28 microseconds).

Step S402: predict the amounts of data to be processed by the multiple pooling layers, and determine a base pooling data amount according to the amounts of data to be processed by the multiple pooling layers.

Specifically, the amount of data to be processed by each pooling layer in the convolutional neural network model is predicted from the depth and complexity of the model and the size of the input data, and the smallest amount among the amounts of data to be processed by the multiple pooling layers is extracted as the base pooling data amount. The smallest amount is chosen as the base pooling data amount in order to reduce the chip area occupied by pooling and to maximize the computing efficiency of the computing resource units.

For example, if the convolutional neural network model has three pooling layers whose amounts of data to be processed are 2 KB, 3 KB, and 5 KB respectively, then 2 KB is the base pooling data amount (2 KB < 3 KB < 5 KB).

Step S403: determine the number of computing resource units used for pooling according to the pooling expectation value, the base pooling data amount, and the computing capability value of a computing resource unit having the average clock frequency; the average clock frequency is the average of the multiple clock frequencies provided by the system.

Specifically, the product of the pooling expectation value and the computing capability value of a computing resource unit having the average clock frequency is divided by the base pooling data amount, that is, (pooling expectation value × computing capability value of a computing resource unit having the average clock frequency) / base pooling data amount, to obtain the number of computing resource units used for pooling. Once this number is determined, it is no longer adjusted during subsequent convolution and pooling; only the frequency of the computing resource units is adjusted. The average clock frequency is the average of the multiple clock frequencies provided by the system, and the computing capability value refers to a computing resource unit's capability to process instructions; the higher the value, the stronger the unit's capability to process instructions.

Referring also to FIG. 6, which is a schematic flowchart of determining the number of computing resource units provided by an example of the present application. As shown in FIG. 6, the estimated convolution times TC1 to TCn of the n convolution layers in the convolutional neural network model are predicted from the depth, algorithm complexity, and input data size of the model, where TCn denotes the estimated convolution time of the n-th convolution layer. From these estimated convolution times, the expected pooling times TP1 to TPn of the pooling layers corresponding to the convolution layers are predicted, where TPn denotes the expected pooling time of the n-th pooling layer. Averaging the expected pooling times gives the pooling expectation value Avg_TP, that is, Avg_TP=(TP1+TP2+...+TPn)/n. The amounts of data to be processed VP1 to VPn of the n pooling layers are predicted, where VPn denotes the amount of data to be processed by the n-th pooling layer, and the smallest of these amounts is extracted as the base pooling data amount Min_VP. From the pooling expectation value Avg_TP, the base pooling data amount Min_VP, and the computing capability value V of a computing resource unit having the average clock frequency, the number of computing resource units Num can be calculated in advance: Num=(Avg_TP*V)/Min_VP.
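The FIG. 6 pipeline (Avg_TP, Min_VP, Num) can be sketched as follows; rounding the unit count up is our assumption, since the application does not state how a fractional Num is handled:

```python
import math

def pooling_expectation(expected_times):
    """Avg_TP: mean of the expected pooling times TP1..TPn (step S401)."""
    return sum(expected_times) / len(expected_times)

def base_pooling_data(data_amounts):
    """Min_VP: smallest per-layer amount of data to be processed (step S402)."""
    return min(data_amounts)

def num_compute_units(avg_tp, capability_v, min_vp):
    """Num = (Avg_TP * V) / Min_VP (step S403); rounded up here (assumption)."""
    return math.ceil(avg_tp * capability_v / min_vp)

avg_tp = pooling_expectation([18, 53, 12])   # ~27.67, which the text rounds to 28
min_vp = base_pooling_data([2, 3, 5])        # 2
print(num_compute_units(avg_tp, 1, min_vp))  # 14
```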
Step S404: obtain the expected pooling time of the target pooling layer and the amount of data to be processed by the target pooling layer, and obtain the current clock frequency of the computing resource units used for pooling; the difference between the expected pooling time of the target pooling layer and the estimated convolution time of the convolution layer associated with the target pooling layer is less than the time threshold.

Step S405: determine the target clock frequency according to the expected pooling time of the target pooling layer and the amount of data to be processed by the target pooling layer.

For the specific implementation of steps S404 and S405, refer to the description of steps S101 and S102 in the example corresponding to FIG. 2; for the specific process of determining the target clock frequency, refer to steps S201 to S204 in FIG. 3 and steps S301 to S303 in FIG. 4. Details are not repeated here.

Step S406: delete, from the multiple clock frequencies provided by the system, the clock frequencies other than the target clock frequency.

Specifically, the system provides multiple clock frequencies for the computing resource units. To reduce power consumption, after the target clock frequency is determined, the (unused) clock frequencies other than the target clock frequency are all deleted.

Step S407: when the convolution layer associated with the target pooling layer completes convolution processing and the current clock frequency differs from the target clock frequency, switch the current clock frequency to the target clock frequency, and perform pooling at the target pooling layer based on the computing resource units having the target clock frequency.

For the specific implementation of step S407, refer to the description of step S103 in the example corresponding to FIG. 2; details are not repeated here.

Referring also to FIG. 7, which is a schematic interaction diagram of switching to a target clock frequency provided by an example of the present application. As shown in FIG. 7, the clock frequency generator 200a in the system generates clocks of different frequencies to drive different computing capability values of the computing resource units. The convolution data memory 200d stores the convolution data generated after convolution. While convolution or pooling is in progress, the clock frequency selector 200b calculates the acceleration ratio of the target pooling layer and thereby determines the target clock frequency. When the data boundary detector 200c detects that the convolution data reaches the data boundary (that is, convolution is complete), the clock frequency selector 200b selects the target clock frequency for computing resource unit 1, computing resource unit 2, ..., computing resource unit n from the multiple clock frequencies generated by the clock frequency generator 200a (the target clock frequency having been determined by the selector 200b during the preceding convolution or pooling process), and deletes (that is, masks) the redundant clock frequencies to reduce the power consumption of the system. The clock frequency of the computing resource units is switched to the target clock frequency, and the convolution data is pooled by computing resource unit 1 having the target clock frequency, computing resource unit 2 having the target clock frequency, ..., and computing resource unit n having the target clock frequency. The data boundary detector 200c keeps detecting continuously throughout the convolution process. The clock frequency generator 200a, the clock frequency selector 200b, and the data boundary detector 200c may be integrated as the clock frequency controller 100g shown in FIG. 1b.
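The interaction among the generator, selector, and boundary detector can be modeled as a small controller; the class, names, and structure below are our illustration, not the application's hardware description:

```python
class ClockFrequencyController:
    """Toy model of the controller in FIG. 7: the generator's frequencies are
    held in `available`; the selector picks the target and masks the rest."""

    def __init__(self, generated_freqs):
        self.available = set(generated_freqs)  # from clock frequency generator 200a
        self.current = None

    def on_data_boundary(self, target_freq):
        """Called when the boundary detector 200c signals that convolution is
        done: select the precomputed target and mask the unused frequencies
        (step S406) before pooling starts."""
        if target_freq not in self.available:
            raise ValueError("target frequency not provided by the system")
        self.available = {target_freq}         # delete/mask the others
        self.current = target_freq
        return self.current

ctrl = ClockFrequencyController([100, 200, 300, 450])
print(ctrl.on_data_boundary(450))  # 450
```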
In some examples, the foregoing method may also be applied to a server. For example, the convolutional neural network model is deployed on an FPGA cloud server used for cloud computing; that is, a computing resource pool for convolution computation, a convolution data storage unit, a computing resource pool for pooling computation, a pooling data storage unit, and a clock frequency controller (which may be the clock frequency controller 100g shown in FIG. 1b) are deployed on the FPGA hardware in the FPGA cloud server, where the computing resource pool for pooling computation may include computing resource unit 1, computing resource unit 2, ..., computing resource unit n in the example corresponding to FIG. 7. While convolution or pooling is in progress, the clock frequency controller calculates the acceleration ratio of the target pooling layer from its expected pooling time and average pooling time, and thereby determines the target clock frequency. After convolution using the computing resource pool for convolution computation is completed, the resulting convolution data is stored in the convolution data storage unit; the clock frequency controller selects the predetermined target clock frequency for the computing resource pool for pooling computation from the multiple clock frequencies provided by the FPGA, masks the redundant clock frequencies to reduce system power consumption, and switches the current clock frequency to the target clock frequency. The convolution data in the convolution data storage unit is then pooled by the computing resource pool for pooling computation at the target clock frequency, and the pooled data is stored in the pooling data storage unit. Because an FPGA can provide strong computing power, a convolutional neural network model architected on an FPGA cloud server can run smoothly.

In the example of the present application, the expected pooling time of the target pooling layer and the amount of data to be processed by the target pooling layer are obtained, together with the current clock frequency of the computing resource units used for pooling; a target clock frequency is determined according to the expected pooling time of the target pooling layer and the amount of data to be processed by the target pooling layer; and when the convolution layer associated with the target pooling layer completes convolution processing and the current clock frequency differs from the target clock frequency, the current clock frequency is switched to the target clock frequency and pooling is performed at the target pooling layer based on the computing resource units having the target clock frequency. Because the clock frequency of the computing resource units can be dynamically adjusted according to the expected pooling time of each pooling layer and its amount of data to be processed, the actual pooling time of any pooling layer approaches its expected pooling time, and therefore also matches the estimated convolution time of the corresponding convolution layer. Thus, by adjusting the clock frequency of the computing resource units, the requirement that the actual pooling time and the estimated convolution time of any layer be close can be satisfied, so that idle computing resources are avoided and the usage rate of the computing resources is increased.
Further, FIG. 8 is a schematic structural diagram of a computing resource adjustment apparatus provided by an example of the present application. As shown in FIG. 8, the computing resource adjustment apparatus 1 may include an obtaining module 11, a first determining module 12, and a switching module 13.

The obtaining module 11 is configured to obtain the expected pooling time of a target pooling layer and the amount of data to be processed by the target pooling layer, and obtain the current clock frequency of the computing resource units used for pooling; the difference between the expected pooling time of the target pooling layer and the estimated convolution time of the convolution layer associated with the target pooling layer is less than a time threshold.

The first determining module 12 is configured to determine a target clock frequency according to the expected pooling time of the target pooling layer and the amount of data to be processed by the target pooling layer.

The switching module 13 is configured to: when the convolution layer associated with the target pooling layer completes convolution processing and the current clock frequency differs from the target clock frequency, switch the current clock frequency to the target clock frequency, and perform pooling at the target pooling layer based on the computing resource units having the target clock frequency.

For the specific function implementations of the obtaining module 11, the first determining module 12, and the switching module 13, refer to steps S101 to S103 in the example corresponding to FIG. 2; details are not repeated here.

Referring also to FIG. 8, the computing resource adjustment apparatus 1 may include the obtaining module 11, the first determining module 12, and the switching module 13, and may further include a first prediction module 14, a second prediction module 15, a second determining module 16, and a third determining module 17.

The first prediction module 14 is configured to predict the estimated convolution times of multiple convolution layers, determine the expected pooling times of multiple pooling layers according to the estimated convolution times of the multiple convolution layers, and determine a pooling expectation value according to the expected pooling times of the multiple pooling layers.

The second prediction module 15 is configured to predict the amounts of data to be processed by the multiple pooling layers, and determine a base pooling data amount according to the amounts of data to be processed by the multiple pooling layers.

The second prediction module 15 is specifically configured to extract, from the amounts of data to be processed by the multiple pooling layers, the smallest amount as the base pooling data amount.

The second determining module 16 is configured to determine the number of computing resource units used for pooling according to the pooling expectation value, the base pooling data amount, and the computing capability value of a computing resource unit having the average clock frequency; the average clock frequency is the average of the multiple clock frequencies provided by the system.

The third determining module 17 is configured to: when it is detected that the convolution data generated in the convolution layer associated with the target pooling layer reaches a data boundary, determine that the convolution layer associated with the target pooling layer has completed convolution processing.

For the specific function implementations of the first prediction module 14, the second prediction module 15, and the second determining module 16, refer to steps S401 to S403 in the example corresponding to FIG. 5; for the specific function implementation of the third determining module 17, refer to step S103 in the example corresponding to FIG. 2. Details are not repeated here.

Referring also to FIG. 8, the first prediction module 14 may include a first determining unit 141 and a second determining unit 142.

The first determining unit 141 is configured to determine the expected pooling times of the multiple pooling layers according to the estimated convolution times of the multiple convolution layers; the difference between the estimated convolution time of a convolution layer and the expected pooling time of the pooling layer located in the same layer is less than the time threshold.

The second determining unit 142 is configured to calculate the average of the expected pooling times of the multiple pooling layers, and determine the average as the pooling expectation value.

For the specific function implementations of the first determining unit 141 and the second determining unit 142, refer to step S401 in the example corresponding to FIG. 5; details are not repeated here.

Referring to FIG. 8, the first determining module 12 may include a third determining unit 121, a fourth determining unit 122, and a fifth determining unit 123.

The third determining unit 121 is configured to calculate the sum of the computing capability values of all the computing resource units having the average clock frequency to obtain a total computing capability value, and determine the ratio of the amount of data to be processed by the target pooling layer to the total computing capability value as the average pooling time of the target pooling layer.

The fourth determining unit 122 is configured to determine the ratio of the expected pooling time of the target pooling layer to the average pooling time of the target pooling layer as the acceleration ratio of the target pooling layer.

The fifth determining unit 123 is configured to determine the target clock frequency according to the acceleration ratio of the target pooling layer, the acceleration ratio of a reference pooling layer, and the current clock frequency; the acceleration ratio of the reference pooling layer is generated from the expected pooling time of the reference pooling layer and the amount of data to be processed by the reference pooling layer; the reference pooling layer is a pooling layer adjacent to the target pooling layer that is being processed or has been processed.

For the specific function implementations of the third determining unit 121, the fourth determining unit 122, and the fifth determining unit 123, refer to steps S201 to S204 in the example corresponding to FIG. 3; details are not repeated here.

Further, referring to FIG. 8, the fifth determining unit 123 may include a first determining subunit 1231 and a second determining subunit 1232.

The first determining subunit 1231 is configured to: when the acceleration ratio of the reference pooling layer is the same as that of the target pooling layer, set the target clock frequency to the same clock frequency as the current clock frequency.

The second determining subunit 1232 is configured to: when the acceleration ratio of the reference pooling layer differs from that of the target pooling layer, determine the ratio of the acceleration ratio of the reference pooling layer to that of the target pooling layer as an acceleration coefficient, and determine the product of the acceleration coefficient and the current clock frequency as the target clock frequency.

For the specific function implementations of the first determining subunit 1231 and the second determining subunit 1232, refer to steps S203 and S204 in the example corresponding to FIG. 3; details are not repeated here.

Further, referring to FIG. 8, the fifth determining unit 123 may further include a deletion subunit 1233.

The deletion subunit 1233 is configured to delete, from the multiple clock frequencies provided by the system, the clock frequencies other than the target clock frequency.

For the specific function implementation of the deletion subunit 1233, refer to step S406 in the example corresponding to FIG. 5; details are not repeated here.

Referring to FIG. 8, the first determining module 12 may alternatively include a sixth determining unit 124, a seventh determining unit 125, and an eighth determining unit 126.

The sixth determining unit 124 is configured to calculate the sum of the computing capability values of all the computing resource units having the average clock frequency to obtain a total computing capability value, and determine the ratio of the amount of data to be processed by the target pooling layer to the total computing capability value as the average pooling time of the target pooling layer.

The seventh determining unit 125 is configured to determine the ratio of the expected pooling time of the target pooling layer to the average pooling time of the target pooling layer as the acceleration ratio of the target pooling layer.

The eighth determining unit 126 is configured to determine the target clock frequency according to the acceleration ratio of the target pooling layer and the average clock frequency.

The eighth determining unit 126 is specifically configured to determine the product of the reciprocal of the acceleration ratio of the target pooling layer and the average clock frequency as the target clock frequency.

For the specific function implementations of the sixth determining unit 124, the seventh determining unit 125, and the eighth determining unit 126, refer to steps S301 to S303 in the example corresponding to FIG. 4; details are not repeated here.

In the example of the present application, the expected pooling time of the target pooling layer and the amount of data to be processed by the target pooling layer are obtained, together with the current clock frequency of the computing resource units used for pooling; a target clock frequency is determined according to the expected pooling time of the target pooling layer and the amount of data to be processed by the target pooling layer; and when the convolution layer associated with the target pooling layer completes convolution processing and the current clock frequency differs from the target clock frequency, the current clock frequency is switched to the target clock frequency and pooling is performed at the target pooling layer based on the computing resource units having the target clock frequency. Because the clock frequency of the computing resource units can be dynamically adjusted according to the expected pooling time of each pooling layer and its amount of data to be processed, the actual pooling time of any pooling layer approaches its expected pooling time, and therefore also matches the estimated convolution time of the corresponding convolution layer. Thus, by adjusting the clock frequency of the computing resource units, the requirement that the actual pooling time and the estimated convolution time of any layer be close can be satisfied, so that idle computing resources are avoided and the usage rate of the computing resources is increased.
Further, FIG. 9 is a schematic structural diagram of a computing device provided by an example of the present application. Here, the computing device may be a terminal device or a server. As shown in FIG. 9, the computing resource adjustment apparatus of FIG. 8 above may be applied to the computing device 1000, and the computing device 1000 may include a processor 1001, a network interface 1004, and a memory 1005. In addition, the computing device 1000 may further include a user interface 1003 and at least one communication bus 1002, where the communication bus 1002 implements connection and communication among these components. The user interface 1003 may include a display and a keyboard, and optionally may further include standard wired and wireless interfaces. In some examples, the network interface 1004 may include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory, or a non-volatile memory, for example, at least one disk memory. In some examples, the memory 1005 may also be at least one storage device located remotely from the foregoing processor 1001. As shown in FIG. 9, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a device control application.

In the computing device 1000 shown in FIG. 9, the network interface 1004 provides a network communication function, the user interface 1003 mainly provides an input interface for the user, and the processor 1001 may be configured to call the device control application stored in the memory 1005 to:

obtain the expected pooling time of a target pooling layer and the amount of data to be processed by the target pooling layer, and obtain the current clock frequency of the computing resource units used for pooling, where the difference between the expected pooling time of the target pooling layer and the estimated convolution time of the convolution layer associated with the target pooling layer is less than a time threshold;

determine a target clock frequency according to the expected pooling time of the target pooling layer and the amount of data to be processed by the target pooling layer; and

when the convolution layer associated with the target pooling layer completes convolution processing and the current clock frequency differs from the target clock frequency, switch the current clock frequency to the target clock frequency, and perform pooling at the target pooling layer based on the computing resource units having the target clock frequency.

In an example, the processor 1001 further performs the following steps:

predict the estimated convolution times of multiple convolution layers, determine the expected pooling times of multiple pooling layers according to the estimated convolution times of the multiple convolution layers, and determine a pooling expectation value according to the expected pooling times of the multiple pooling layers;

predict the amounts of data to be processed by the multiple pooling layers, and determine a base pooling data amount according to the amounts of data to be processed by the multiple pooling layers; and

determine the number of computing resource units used for pooling according to the pooling expectation value, the base pooling data amount, and the computing capability value of a computing resource unit having the average clock frequency; the average clock frequency is the average of the multiple clock frequencies provided by the system.

In an example, when determining the expected pooling times of the multiple pooling layers according to the estimated convolution times of the multiple convolution layers and determining the pooling expectation value according to the expected pooling times of the multiple pooling layers, the processor 1001 specifically performs the following steps:

determine the expected pooling times of the multiple pooling layers according to the estimated convolution times of the multiple convolution layers, where the difference between the estimated convolution time of a convolution layer and the expected pooling time of the pooling layer located in the same layer is less than the time threshold; and

calculate the average of the expected pooling times of the multiple pooling layers, and determine the average as the pooling expectation value.

In an example, when determining the base pooling data amount according to the amounts of data to be processed by the multiple pooling layers, the processor 1001 specifically performs the following step:

extract, from the amounts of data to be processed by the multiple pooling layers, the smallest amount as the base pooling data amount.

In an example, when determining the target clock frequency according to the expected pooling time of the target pooling layer and the amount of data to be processed by the target pooling layer, the processor 1001 specifically performs the following steps:

calculate the sum of the computing capability values of all the computing resource units having the average clock frequency to obtain a total computing capability value, and determine the ratio of the amount of data to be processed by the target pooling layer to the total computing capability value as the average pooling time of the target pooling layer;

determine the ratio of the expected pooling time of the target pooling layer to the average pooling time of the target pooling layer as the acceleration ratio of the target pooling layer; and

determine the target clock frequency according to the acceleration ratio of the target pooling layer, the acceleration ratio of a reference pooling layer, and the current clock frequency, where the acceleration ratio of the reference pooling layer is generated from the expected pooling time of the reference pooling layer and the amount of data to be processed by the reference pooling layer, and the reference pooling layer is a pooling layer adjacent to the target pooling layer that is being processed or has been processed.

In an example, when determining the target clock frequency according to the acceleration ratio of the target pooling layer, the acceleration ratio of the reference pooling layer, and the current clock frequency, the processor 1001 specifically performs the following steps:

when the acceleration ratio of the reference pooling layer is the same as that of the target pooling layer, set the target clock frequency to the same clock frequency as the current clock frequency; and

when the acceleration ratio of the reference pooling layer differs from that of the target pooling layer, determine the ratio of the acceleration ratio of the reference pooling layer to that of the target pooling layer as an acceleration coefficient, and determine the product of the acceleration coefficient and the current clock frequency as the target clock frequency.

In an example, the processor 1001 further performs the following step:

delete, from the multiple clock frequencies provided by the system, the clock frequencies other than the target clock frequency.

In an example, when determining the target clock frequency according to the expected pooling time of the target pooling layer and the amount of data to be processed by the target pooling layer, the processor 1001 specifically performs the following steps:

calculate the sum of the computing capability values of all the computing resource units having the average clock frequency to obtain the total computing capability value, and determine the ratio of the amount of data to be processed by the target pooling layer to the total computing capability value as the average pooling time of the target pooling layer;

determine the ratio of the expected pooling time of the target pooling layer to the average pooling time of the target pooling layer as the acceleration ratio of the target pooling layer; and

determine the target clock frequency according to the acceleration ratio of the target pooling layer and the average clock frequency.

In an example, when determining the target clock frequency according to the acceleration ratio of the target pooling layer and the average clock frequency, the processor 1001 specifically performs the following step:

determine the product of the reciprocal of the acceleration ratio of the target pooling layer and the average clock frequency as the target clock frequency.

In an example, the processor 1001 further performs the following step:

when detecting that the convolution data generated in the convolution layer associated with the target pooling layer reaches a data boundary, determine that the convolution layer associated with the target pooling layer has completed convolution processing.

In the example of the present application, the expected pooling time of the target pooling layer and the amount of data to be processed by the target pooling layer are obtained, together with the current clock frequency of the computing resource units used for pooling; a target clock frequency is determined according to the expected pooling time of the target pooling layer and the amount of data to be processed by the target pooling layer; and when the convolution layer associated with the target pooling layer completes convolution processing and the current clock frequency differs from the target clock frequency, the current clock frequency is switched to the target clock frequency and pooling is performed at the target pooling layer based on the computing resource units having the target clock frequency. Because the clock frequency of the computing resource units can be dynamically adjusted according to the expected pooling time of each pooling layer and its amount of data to be processed, the actual pooling time of any pooling layer approaches its expected pooling time, and therefore also matches the estimated convolution time of the corresponding convolution layer. Thus, by adjusting the clock frequency of the computing resource units, the requirement that the actual pooling time and the estimated convolution time of any layer be close can be satisfied, so that idle computing resources are avoided and the usage rate of the computing resources is increased.

It should be understood that the computing device 1000 described in the example of the present application may perform the description of the computing resource adjustment method in the examples corresponding to FIG. 2 to FIG. 7, and may also perform the description of the computing resource adjustment apparatus in the example corresponding to FIG. 8; details are not repeated here. Likewise, the description of the beneficial effects of using the same method is not repeated.

In addition, it should be noted that an example of the present application further provides a computer storage medium storing the computer program executed by the aforementioned computing resource adjustment apparatus 1. The computer program includes program instructions which, when executed by the processor, can perform the description of the computing resource adjustment method in the examples corresponding to FIG. 2 to FIG. 7; details are therefore not repeated here, and neither is the description of the beneficial effects of using the same method. For technical details not disclosed in the computer storage medium examples of the present application, refer to the description of the method examples of the present application.

A person of ordinary skill in the art can understand that all or part of the processes of the foregoing example methods may be implemented by a computer program instructing related hardware. The program may be stored in a computer-readable storage medium, and when executed, may include the processes of the examples of the foregoing methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.

The foregoing disclosure is merely preferred examples of the present application and certainly cannot be used to limit the scope of the claims of the present application. Therefore, equivalent variations made according to the claims of the present application still fall within the scope covered by the present application.

Claims (17)

  1. 一种计算资源调整方法,由计算设备执行,包括:
    获取目标池化层的期望池化耗时和目标池化层的待处理数据量,并获取用于进行池化处理的计算资源单位对应的当前时钟频率;所述目标池化层的期望池化耗时与所述目标池化层相关联的卷积层的卷积预计耗时之间的差值小于时间阈值;
    根据所述目标池化层的期望池化耗时和所述目标池化层的待处理数据量,确定目标时钟频率;
    当所述目标池化层相关联的卷积层完成卷积处理,且所述当前时钟频率和所述目标时钟频率不同时,将所述当前时钟频率切换为所述目标时钟频率,并基于具有所述目标时钟频率的所述计算资源单位,在所述目标池化层进行池化处理。
  2. 根据权利要求1所述的方法,其中,还包括:
    预测多个卷积层的卷积预计耗时,并根据所述多个卷积层的卷积预计耗时确定多个池化层的期望池化耗时,并根据所述多个池化层的期望池化耗时确定池化期望值;
    预测多个池化层的待处理数据量,根据所述多个池化层的待处理数据量确定基础池化数据量;
    根据所述池化期望值、所述基础池化数据量、具有平均时钟频率的所述计算资源单位的计算能力值,确定用于进行池化处理的计算资源单位的数量;所述平均时钟频率是系统提供的多个时钟频率的平均值。
  3. 根据权利要求2所述的方法,其中,所述根据所述多个卷积层的卷积预计耗时确定多个池化层的期望池化耗时,并根据所述多个池化层的期望池化耗时确定池化期望值,包括:
    根据所述多个卷积层的卷积预计耗时确定所述多个池化层的期望池化耗时;位于相同层中的卷积层的卷积预计耗时和池化层的期望池化耗时之间的差值小于所述时间阈值;
    计算所述多个池化层的期望池化耗时的平均值,并将所述平均值确定为所述池 化期望值。
  4. 根据权利要求3所述的方法,其中,所述根据所述多个池化层的待处理数据量确定基础池化数据量,具体包括:
    在所述多个池化层的待处理数据量中提取数值最小的待处理数据量,作为所述基础池化数据量。
  5. 根据权利要求2所述的方法,其中,所述根据所述目标池化层的期望池化耗时和所述目标池化层的待处理数据量,确定目标时钟频率,包括:
    计算所有具有所述平均时钟频率的所述计算资源单位的计算能力值的总和,得到总计算能力值,并将所述目标池化层的待处理数据量与所述总计算能力值之间的比值,确定为所述目标池化层的平均池化耗时;
    将所述目标池化层的期望池化耗时与所述目标池化层的平均池化耗时之间的比值,确定为所述目标池化层的加速比;
    根据所述目标池化层的加速比、参照池化层的加速比、所述当前时钟频率,确定目标时钟频率;所述参照池化层的加速比是根据所述参照池化层的期望池化耗时和所述参照池化层的待处理数据量生成的;所述参照池化层是与所述目标池化层相邻且正在处理或者已处理的池化层。
  6. The method of claim 5, wherein determining the target clock frequency according to the acceleration ratio of the target pooling layer, the acceleration ratio of the reference pooling layer, and the current clock frequency comprises:
    when the acceleration ratio of the reference pooling layer and the acceleration ratio of the target pooling layer are the same, setting the target clock frequency to the same clock frequency as the current clock frequency; and
    when the acceleration ratio of the reference pooling layer and the acceleration ratio of the target pooling layer differ, determining a ratio of the acceleration ratio of the reference pooling layer to the acceleration ratio of the target pooling layer as an acceleration coefficient, and determining a product of the acceleration coefficient and the current clock frequency as the target clock frequency.
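The branch structure of claim 6 can be sketched as follows; the names are illustrative, and the acceleration ratios are those of claim 5 (expected pooling time divided by average pooling time):

```python
def target_freq_from_reference(ref_ratio, target_ratio, current_freq):
    # Sketch of the claim-6 rule; names are assumptions for illustration.
    if ref_ratio == target_ratio:
        # Equal acceleration ratios: keep the current clock frequency.
        return current_freq
    # Acceleration coefficient = reference ratio / target ratio; a target
    # layer needing to run faster (smaller ratio) yields a coefficient > 1.
    return (ref_ratio / target_ratio) * current_freq
```

For example, if the target layer's acceleration ratio is half the reference layer's, the coefficient is 2 and the clock frequency doubles.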
  7. The method of claim 6, further comprising, after determining the product of the acceleration coefficient and the current clock frequency as the target clock frequency:
    deleting, from the plurality of clock frequencies provided by the system, the clock frequencies other than the target clock frequency.
  8. The method of claim 2, wherein determining the target clock frequency according to the expected pooling time of the target pooling layer and the volume of data to be processed by the target pooling layer comprises:
    calculating a sum of the computing capability values of all computing resource units having the average clock frequency to obtain a total computing capability value, and determining a ratio of the volume of data to be processed by the target pooling layer to the total computing capability value as an average pooling time of the target pooling layer;
    determining a ratio of the expected pooling time of the target pooling layer to the average pooling time of the target pooling layer as an acceleration ratio of the target pooling layer; and
    determining the target clock frequency according to the acceleration ratio of the target pooling layer and the average clock frequency.
  9. The method of claim 8, wherein determining the target clock frequency according to the acceleration ratio of the target pooling layer and the average clock frequency comprises:
    determining a product of the reciprocal of the acceleration ratio of the target pooling layer and the average clock frequency as the target clock frequency.
  10. The method of claim 1, further comprising:
    determining that the convolutional layer associated with the target pooling layer has completed convolution when it is detected that convolution data generated in the convolutional layer associated with the target pooling layer reaches a data boundary.
  11. A neural network operation method, performed by a computing device, comprising:
    obtaining an expected pooling time of a target pooling layer and a volume of data to be processed by the target pooling layer, and obtaining a current clock frequency of computing resource units used for pooling, wherein a difference between the expected pooling time of the target pooling layer and an estimated convolution time of a convolutional layer associated with the target pooling layer is less than a time threshold;
    determining a target clock frequency according to the expected pooling time of the target pooling layer and the volume of data to be processed by the target pooling layer; and
    when the convolutional layer associated with the target pooling layer has completed convolution and the current clock frequency differs from the target clock frequency, switching the current clock frequency to the target clock frequency, and performing pooling at the target pooling layer based on the computing resource units having the target clock frequency.
  12. A computing resource adjustment apparatus, comprising:
    an obtaining module, configured to obtain an expected pooling time of a target pooling layer and a volume of data to be processed by the target pooling layer, and to obtain a current clock frequency of computing resource units used for pooling, wherein a difference between the expected pooling time of the target pooling layer and an estimated convolution time of a convolutional layer associated with the target pooling layer is less than a time threshold;
    a first determining module, configured to determine a target clock frequency according to the expected pooling time of the target pooling layer and the volume of data to be processed by the target pooling layer; and
    a switching module, configured to, when the convolutional layer associated with the target pooling layer has completed convolution and the current clock frequency differs from the target clock frequency, switch the current clock frequency to the target clock frequency and perform pooling at the target pooling layer based on the computing resource units having the target clock frequency.
  13. The apparatus of claim 12, further comprising:
    a first prediction module, configured to predict estimated convolution times of a plurality of convolutional layers, determine expected pooling times of a plurality of pooling layers according to the estimated convolution times of the plurality of convolutional layers, and determine a pooling expectation value according to the expected pooling times of the plurality of pooling layers;
    a second prediction module, configured to predict volumes of data to be processed by the plurality of pooling layers and determine a base pooling data volume according to the volumes of data to be processed by the plurality of pooling layers; and
    a second determining module, configured to determine the number of computing resource units used for pooling according to the pooling expectation value, the base pooling data volume, and a computing capability value of a computing resource unit having an average clock frequency, wherein the average clock frequency is the average of a plurality of clock frequencies provided by the system.
  14. The apparatus of claim 13, wherein the first prediction module comprises:
    a first determining unit, configured to determine the expected pooling times of the plurality of pooling layers according to the estimated convolution times of the plurality of convolutional layers, wherein a difference between the estimated convolution time of a convolutional layer and the expected pooling time of a pooling layer located in the same layer is less than the time threshold; and
    a second determining unit, configured to calculate an average of the expected pooling times of the plurality of pooling layers and determine the average as the pooling expectation value.
  15. A neural network operation apparatus, comprising:
    an obtaining module, configured to obtain an expected pooling time of a target pooling layer and a volume of data to be processed by the target pooling layer, and to obtain a current clock frequency of computing resource units used for pooling, wherein a difference between the expected pooling time of the target pooling layer and an estimated convolution time of a convolutional layer associated with the target pooling layer is less than a time threshold;
    a first determining module, configured to determine a target clock frequency according to the expected pooling time of the target pooling layer and the volume of data to be processed by the target pooling layer; and
    a switching module, configured to, when the convolutional layer associated with the target pooling layer has completed convolution and the current clock frequency differs from the target clock frequency, switch the current clock frequency to the target clock frequency and perform pooling at the target pooling layer based on the computing resource units having the target clock frequency.
  16. A computing device, comprising a processor and a memory;
    the processor being connected to the memory, wherein the memory is configured to store program code and the processor is configured to invoke the program code to perform the method of any one of claims 1 to 11.
  17. A computer storage medium storing a computer program, the computer program comprising program instructions which, when executed by a processor, perform the method of any one of claims 1 to 11.
PCT/CN2018/118502 2018-01-25 2018-11-30 Neural network operation method, apparatus, and related device WO2019144701A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP18902036.5A EP3745308B1 (en) 2018-01-25 2018-11-30 Neural network computing method and apparatus, and related device
US16/885,669 US11507812B2 (en) 2018-01-25 2020-05-28 Neural network operational method and apparatus, and related device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810072678.9 2018-01-25
CN201810072678.9A CN110083448B (zh) 2018-01-25 2018-01-25 Computing resource adjustment method, apparatus, and related device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/885,669 Continuation US11507812B2 (en) 2018-01-25 2020-05-28 Neural network operational method and apparatus, and related device

Publications (1)

Publication Number Publication Date
WO2019144701A1 true WO2019144701A1 (zh) 2019-08-01

Family

ID=67395145

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/118502 WO2019144701A1 (zh) 2018-01-25 2018-11-30 Neural network operation method, apparatus, and related device

Country Status (4)

Country Link
US (1) US11507812B2 (zh)
EP (1) EP3745308B1 (zh)
CN (1) CN110083448B (zh)
WO (1) WO2019144701A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105813208A * 2015-12-01 2016-07-27 Yangtze University Method and system for dynamic resource allocation in a multi-user orthogonal frequency-division multiplexing system
CN105981055A * 2014-03-03 2016-09-28 Qualcomm Incorporated Neural network adaptation to current computational resources
CN106779060A * 2017-02-09 2017-05-31 Wuhan Meitong Technology Co., Ltd. Computation method for a deep convolutional neural network suited to hardware design implementation
US20170330586A1 (en) * 2016-05-10 2017-11-16 Google Inc. Frequency based audio analysis using neural networks

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10157309B2 (en) * 2016-01-14 2018-12-18 Nvidia Corporation Online detection and classification of dynamic gestures with recurrent convolutional neural networks
CN110023963B (zh) * 2016-10-26 2023-05-30 DeepMind Technologies Ltd. Processing text sequences using neural networks
CN106682574A (zh) * 2016-11-18 2017-05-17 Harbin Engineering University Underwater multi-target recognition method based on a one-dimensional deep convolutional network
US10732694B2 (en) * 2017-09-22 2020-08-04 Qualcomm Incorporated Power state control of a mobile device
WO2020194594A1 (ja) * 2019-03-27 2020-10-01 TDK Corporation Neural network arithmetic processing device and neural network arithmetic processing method
CN112200300B (zh) * 2020-09-15 2024-03-01 SigmaStar Technology Ltd. Convolutional neural network operation method and apparatus


Also Published As

Publication number Publication date
US20200293869A1 (en) 2020-09-17
EP3745308A4 (en) 2021-04-07
CN110083448B (zh) 2023-08-18
EP3745308B1 (en) 2024-01-17
CN110083448A (zh) 2019-08-02
US11507812B2 (en) 2022-11-22
EP3745308A1 (en) 2020-12-02


Legal Events

Code Title / Description
121  EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 18902036; Country of ref document: EP; Kind code of ref document: A1.
NENP Non-entry into the national phase. Ref country code: DE.
ENP  Entry into the national phase. Ref document number: 2018902036; Country of ref document: EP; Effective date: 20200825.