CN109937410B - Core scheduling method and terminal - Google Patents


Publication number
CN109937410B
Authority
CN
China
Prior art keywords
core
cores
neural network
convolutional neural
parameter
Prior art date
Legal status
Active
Application number
CN201780064697.0A
Other languages
Chinese (zh)
Other versions
CN109937410A (en)
Inventor
曹海恒
谭利文
杜明亮
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of CN109937410A publication Critical patent/CN109937410A/en
Application granted granted Critical
Publication of CN109937410B publication Critical patent/CN109937410B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers

Abstract

Embodiments of the present invention provide a core scheduling method and related device, where the method includes: obtaining a target model parameter, where the target model parameter is used to represent the computational density of a convolutional neural network model; determining core weight values of at least two cores from a preset first correspondence according to the target model parameter, where the core weight values of the at least two cores correspond to the target model parameter, the at least two cores are heterogeneous cores on a terminal, the first correspondence includes a correspondence between the target model parameter and the core weight values of the at least two cores, and a core weight value represents the priority with which a core is selected to run the convolutional neural network model; and determining, from the at least two cores according to their core weight values, a core on which to run the convolutional neural network model. Through the core weight values of the different cores, an adapted core can be determined to run a convolutional neural network model of a specific computational density.

Description

Core scheduling method and terminal
Technical Field
Embodiments of the present invention relate to the field of chip systems, and in particular to a core scheduling method and a terminal.
Background
A convolutional neural network (CNN) is a feed-forward neural network whose artificial neurons respond to a portion of the surrounding units within their coverage range, and which performs well on large-scale image processing. Convolutional neural networks are used in more and more applications on terminals, such as image classification, feature extraction, and face clustering.
To improve the computing power of a terminal, the system chip on the terminal often includes multiple heterogeneous cores so that different services can be executed on different cores. In existing schemes, there is no effective core scheduling mechanism for services that run a convolutional neural network: a core is simply chosen from the chip system on the terminal, and the obtained convolutional neural network model is then run on that core to perform the service processing.
Because different convolutional neural network models often have different characteristics, and different cores likewise have different characteristics, existing schemes do not exploit the characteristics of the cores to execute a specific convolutional neural network model on a well-adapted core. As a result, the model runs inefficiently and computing resources on the terminal are wasted.
Disclosure of Invention
Embodiments of the present invention provide a core scheduling method and a terminal, which are used to provide an adapted core for a convolutional neural network model.
A first aspect of an embodiment of the present invention provides a core scheduling method, including: obtaining a target model parameter, where the target model parameter is used to represent the computational density of a convolutional neural network model; convolutional neural network models of different computational densities are adapted to run on different cores. Then, core weight values of at least two cores are determined from a preset first correspondence according to the target model parameter, where the core weight values of the at least two cores correspond to the target model parameter and the at least two cores are heterogeneous cores on the terminal; because the hardware characteristics of the heterogeneous cores differ, different cores are suited to running convolutional neural network models of different computational densities. The first correspondence includes a correspondence between the target model parameter and the core weight values of the at least two cores, and a core weight value represents the priority with which a core is selected to run the convolutional neural network model, so that a core suited to running the model can be identified from the core weight values. The core that runs the convolutional neural network model is then determined from the at least two cores according to their core weight values.
When determining the core that runs the convolutional neural network model, the core weight values may be used directly, i.e., the core with the largest core weight value is chosen to run the model; or they may be used indirectly, e.g., the core weight values are first modified by other parameters to obtain modified weight values, and the modified weight values are then used to determine the core that runs the model.
In this way, an adapted core can be determined to run a convolutional neural network model of a specific computational density through the core weight values of the different cores. If a larger core weight value means higher efficiency in running the model, then the core determined from the core weight values can run the convolutional neural network model efficiently.
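The direct use of the core weight values described above can be sketched as a table lookup followed by an argmax. A minimal sketch follows; the table contents, the core names, and the choice of weight-parameter count as the target model parameter are illustrative assumptions, not values taken from the patent:

```python
# Hypothetical first correspondence: parameter-count bands mapped to
# per-core weight values. All numbers are made up for illustration.
FIRST_CORRESPONDENCE = [
    # (param_count_low, param_count_high, {core: core weight value})
    (0,          1_000_000,    {"CPU": 0.8, "GPU": 0.4, "NPU": 0.3}),
    (1_000_000,  10_000_000,   {"CPU": 0.4, "GPU": 0.7, "NPU": 0.6}),
    (10_000_000, float("inf"), {"CPU": 0.1, "GPU": 0.5, "NPU": 0.9}),
]

def core_weights(target_model_param: int) -> dict:
    """Look up the core weight values corresponding to the target model parameter."""
    for low, high, weights in FIRST_CORRESPONDENCE:
        if low <= target_model_param < high:
            return weights
    raise ValueError("target model parameter out of range")

def schedule_core(target_model_param: int) -> str:
    """Direct use of the core weight values: pick the core with the largest one."""
    weights = core_weights(target_model_param)
    return max(weights, key=weights.get)
```

Under these assumed values, a lightweight model lands on the CPU while a very dense model lands on the NPU, which matches the intent that models of different computational densities are adapted to different cores.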
With reference to the first aspect of the embodiment of the present invention, in a first implementation of the first aspect, the method further includes: obtaining a current state parameter of the terminal, where the state parameter changes dynamically. The obtained current state parameter therefore reflects the operating environment of the cores on the terminal, and different operating environments affect how the different cores run convolutional neural network models. Accordingly, parameter weight values of the at least two cores are determined from a preset second correspondence according to the state parameter, where the parameter weight values of the at least two cores correspond to the state parameter, the second correspondence includes a correspondence between the state parameter and the parameter weight values of the at least two cores, and a parameter weight value indicates the priority with which a core is selected to run the convolutional neural network model under that state parameter. The parameter weight value thus reflects the influence of the terminal's dynamic environment on each core's running of the convolutional neural network model.
Determining, from the at least two cores according to their core weight values, the core that runs the convolutional neural network model then includes: for each core, modifying the core weight value with the parameter weight value to obtain a first modified weight value, which indicates the priority with which the core is selected to run the convolutional neural network model. The first modified weight value thus carries the influence of both the core weight value and the parameter weight value. The core that runs the convolutional neural network model is then determined from the at least two cores according to their first modified weight values, yielding a core that is better suited to running the model.
In this way, the obtained current state parameter reflects the terminal's current dynamic operating environment, and the parameter weight value of each core can be determined from the state parameter and the second correspondence. The parameter weight value reflects the influence of the state parameter on each core's running of the convolutional neural network model: a core with a larger parameter weight value is better suited to running the model under that state parameter and so receives a higher scheduling priority. The first modified weight value, obtained by modifying the core weight value with the parameter weight value, takes more factors into account and better reflects how suitable a core is for running the model, so the core determined from the first modified weight values runs the convolutional neural network model with better effect.
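As a sketch of the first modified weight value, one plausible combination is to multiply each core's static core weight by its dynamic parameter weight. The multiplicative form and all numbers below are assumptions for illustration; the patent does not fix the exact modification formula:

```python
def first_modified_weights(core_weights: dict, param_weights: dict) -> dict:
    """Modify each core's weight value with its parameter weight value.
    Multiplicative combination is an assumption, not the patent's formula."""
    return {core: core_weights[core] * param_weights[core] for core in core_weights}

# Hypothetical example: the GPU has the best static core weight for this
# model, but the current state parameter (say, a busy GPU) lowers its priority.
core_w = {"CPU": 0.5, "GPU": 0.9, "NPU": 0.7}
param_w = {"CPU": 0.9, "GPU": 0.3, "NPU": 0.8}  # dynamic, from the second correspondence
modified = first_modified_weights(core_w, param_w)
chosen = max(modified, key=modified.get)
```

With these values the NPU (0.56) overtakes both the GPU (0.27) and the CPU (0.45), showing how a dynamic state parameter can override a static preference.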
With reference to the first implementation of the first aspect, in a second implementation of the first aspect, the current state parameter of the terminal is the current core usage rate of each core. Determining the parameter weight values of the at least two cores from the preset second correspondence according to the state parameter then includes: for each core, determining a performance weight value from the preset second correspondence according to its core usage rate, where the performance weight value of each core corresponds to its core usage rate, and the second correspondence includes a correspondence between the core usage rate of each core and its performance weight value. The determined performance weight value reflects the degree to which a core's current usage rate affects its running of the convolutional neural network.
In this way, the obtained state parameter is the core usage rate, so the performance weight value determined from the core usage rate and the second correspondence makes the core usage rate one of the factors considered when scheduling a core, and the first modified weight value, obtained by modifying the core weight value with the performance weight value, takes into account the influence of core usage on running the convolutional neural network model.
With reference to the second implementation of the first aspect, in a third implementation of the first aspect, the method further includes: obtaining the current remaining battery level of the terminal, and then determining power consumption weight values of the at least two cores from a preset third correspondence according to the remaining battery level, where the power consumption weight values of the at least two cores correspond to the remaining battery level. The third correspondence includes a correspondence between the remaining battery level and the power consumption weight values of the at least two cores, and a power consumption weight value represents the priority with which a core is selected to run the convolutional neural network model under that remaining battery level. The determined power consumption weight value reflects the degree to which the terminal's remaining battery affects the running of the convolutional neural network model.
Determining, from the at least two cores according to their first modified weight values, the core that runs the convolutional neural network model then includes: for each core, modifying the first modified weight value with the power consumption weight value to obtain a second modified weight value, which represents the priority with which the core is selected to run the convolutional neural network model. The core that runs the convolutional neural network model is then determined from the at least two cores according to their second modified weight values.
Thus the power consumption weight value of each core is determined from the terminal's current remaining battery level and the third correspondence, and the second modified weight value, obtained by modifying the first modified weight value with the power consumption weight value, further takes into account the influence of the remaining battery on how different cores run convolutional neural network models. The core determined from the second modified weight values therefore runs the convolutional neural network model with better effect.
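The chain of modifications in this implementation can be sketched end to end: core weight, then the performance modification, then the power consumption modification. The multiplicative form and all numbers are assumptions for illustration only:

```python
def second_modified_weights(core_w: dict, perf_w: dict, power_w: dict) -> dict:
    """Core weight -> first modified weight (performance weight from core
    usage) -> second modified weight (power consumption weight from the
    remaining battery level). Multiplicative combination is assumed."""
    first = {c: core_w[c] * perf_w[c] for c in core_w}
    return {c: first[c] * power_w[c] for c in first}

core_w  = {"CPU": 0.4, "GPU": 0.8, "NPU": 0.9}  # from the first correspondence
perf_w  = {"CPU": 0.9, "GPU": 0.8, "NPU": 0.9}  # from current core usage rates
power_w = {"CPU": 0.9, "GPU": 0.4, "NPU": 0.8}  # low battery penalizes the GPU
second = second_modified_weights(core_w, perf_w, power_w)
target = max(second, key=second.get)
```

With these assumed values the NPU wins (0.648 against the GPU's 0.256), illustrating how a low battery can shift the schedule away from a power-hungry core.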
With reference to the first aspect of the embodiment of the present invention, in a fourth implementation of the first aspect, the method further includes: obtaining the current core usage rate of each core, and then, for each core, determining a performance parameter from the second correspondence according to its core usage rate, where the performance parameter of each core corresponds to its core usage rate, and the second correspondence includes a correspondence between the performance parameter of each core and its core usage rate. A core has different operating requirements at different usage rates, and by presetting the second correspondence, the operating mode of cores at different usage rates can be controlled through different performance parameters.
After the core that runs the convolutional neural network model is determined from the at least two cores according to their core weight values, the method of this implementation further includes: running the convolutional neural network model on the target core using the performance parameter of the target core, where the target core is the core determined to run the model.
In this way, the specific operating mode of a core can be controlled through specific performance parameters, so that the core running the convolutional neural network model operates in the manner set by the user, meeting the user's control requirements for the core.
With reference to the fourth implementation of the first aspect, in a fifth implementation of the first aspect, the performance parameter includes one or more of thread priority information, sleep time information, and thread count information. The thread priority information is the priority of the sub-threads used when the core runs the convolutional neural network model; the sleep time information is the time interval between two consecutive runs of the convolutional neural network model on the core; the thread count information is the number of threads used when the core runs the convolutional neural network model.
In this way, the threads, run timing, and sub-threads of the core that runs the convolutional neural network model can all be controlled.
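A performance-parameter table keyed by core-utilization bands might look like the following sketch. The band boundaries, field names, and values are all hypothetical; the patent only specifies that the second correspondence maps each core's usage rate to a performance parameter:

```python
# Hypothetical second correspondence for performance parameters:
# (utilization_low, utilization_high, performance parameters)
PERF_PARAMS = [
    (0.0, 0.5,  {"thread_priority": "high",   "sleep_ms": 0,  "num_threads": 4}),
    (0.5, 0.8,  {"thread_priority": "normal", "sleep_ms": 10, "num_threads": 2}),
    (0.8, 1.01, {"thread_priority": "low",    "sleep_ms": 50, "num_threads": 1}),
]

def perf_params_for(utilization: float) -> dict:
    """Pick the performance parameters matching the target core's usage rate."""
    for low, high, params in PERF_PARAMS:
        if low <= utilization < high:
            return params
    raise ValueError(f"utilization out of range: {utilization}")
```

The shape of the table captures the stated intent: a lightly loaded core runs the model aggressively (more threads, no sleep), while a busy core backs off.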
With reference to the first implementation of the first aspect, in a sixth implementation of the first aspect, determining the parameter weight values of the at least two cores from the preset second correspondence according to the state parameter includes: determining power consumption weight values of the at least two cores from the preset second correspondence according to the remaining battery level, where the power consumption weight values of the at least two cores correspond to the remaining battery level, a power consumption weight value represents the priority with which a core is selected to run the convolutional neural network model under that remaining battery level, and the second correspondence includes a correspondence between the remaining battery level and the power consumption weight values of the at least two cores. The determined power consumption weight value reflects the degree to which the terminal's current remaining battery level affects how different cores run the convolutional neural network model.
In this way, the obtained state parameter is the terminal's remaining battery level, so the power consumption weight value determined from the remaining battery level and the second correspondence makes the remaining battery one of the factors considered when scheduling a core, and the first modified weight value, obtained by modifying the core weight value with the power consumption weight value, takes into account the influence of the remaining battery on running the convolutional neural network model.
With reference to the first aspect of the embodiment of the present invention and any one of the first to sixth implementations of the first aspect, in a seventh implementation of the first aspect, the target model parameter is the number of weight parameters of the convolutional neural network model. The number of weight parameters accurately reflects the computational density of the convolutional neural network model.
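For reference, the weight-parameter count of a standard convolutional or fully-connected layer follows directly from its shape, so a model's total count (the proxy for computational density in this implementation) can be tallied layer by layer:

```python
def conv_layer_weight_count(in_channels: int, out_channels: int,
                            kernel_h: int, kernel_w: int, bias: bool = True) -> int:
    """Weight parameters of one standard convolutional layer."""
    return in_channels * out_channels * kernel_h * kernel_w + (out_channels if bias else 0)

def fc_layer_weight_count(in_features: int, out_features: int, bias: bool = True) -> int:
    """Weight parameters of one fully-connected layer."""
    return in_features * out_features + (out_features if bias else 0)

# e.g. a 3x3 convolution from 3 to 64 channels: 3 * 64 * 3 * 3 + 64 = 1792 weights
```

Summing these counts over every layer gives the target model parameter used by the scheduling method.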
With reference to the first aspect of the embodiment of the present invention and any one of the first to seventh implementations of the first aspect, in an eighth implementation of the first aspect, the at least two cores include at least two of a CPU, a GPU, a DSP, and a systolic array processor. The systolic array processor may, for example, be a neural network processing unit (NPU) or a tensor processing unit (TPU). These computing cores have different characteristics and execute the same convolutional neural network model with different efficiencies; for such cores, the core scheduling method of the embodiments of the present invention can effectively determine a core that performs well.
With reference to the first aspect of the embodiment of the present invention and any one of the first to eighth implementations of the first aspect, in a ninth implementation of the first aspect, determining the core weight values of the at least two cores from the preset first correspondence according to the target model parameter includes: determining the target model parameter interval in which the target model parameter falls in the preset first correspondence, and then determining the core weight value intervals of the at least two cores in the first correspondence, where the core weight value intervals of the at least two cores correspond to the target model parameter interval, the first correspondence includes a correspondence between the target model parameter interval and the core weight value intervals of the at least two cores, and the target model parameter interval contains the target model parameter. Then, for each core, a core weight value is determined from its core weight value interval such that the relative position of the core weight value in the core weight value interval is the same as the relative position of the target model parameter in the target model parameter interval.
Because the target model parameter interval and the core weight value intervals in the first correspondence are numerical ranges, they can cover more specific parameter values; the position mapping more accurately reflects the correspondence between the core weight values of the at least two cores and the target model parameter, and the core weight values of different cores are easier to distinguish, so the core weight values better reflect the priority with which a core is selected.
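The position mapping of the ninth implementation can be sketched as linear interpolation: the core weight takes the same relative position in its interval as the target model parameter takes in the matched parameter interval. Linearity is an assumption; the text only requires that the two relative positions coincide:

```python
def weight_from_intervals(param: float, param_interval: tuple,
                          weight_interval: tuple) -> float:
    """Map the target model parameter's relative position in its interval
    onto the core weight value interval (assumed linear)."""
    p_low, p_high = param_interval
    w_low, w_high = weight_interval
    position = (param - p_low) / (p_high - p_low)  # relative position in [0, 1]
    return w_low + position * (w_high - w_low)

# e.g. a parameter 25% of the way through [1e6, 5e6] yields a core weight
# 25% of the way through that core's interval [0.4, 0.8], i.e. 0.5
```

Repeating this for each core's weight value interval gives per-core weights that vary continuously with the model's computational density instead of jumping at interval boundaries.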
In a second aspect, an embodiment of the present invention provides a core scheduling method, including:
obtaining a target model parameter, where the target model parameter is used to represent the computational density of a convolutional neural network model;
determining core weight values of at least two cores from a preset first correspondence according to the target model parameter, where the core weight values of the at least two cores correspond to the target model parameter, the at least two cores are heterogeneous cores on a terminal, the first correspondence includes a correspondence between the target model parameter and the core weight values of the at least two cores, and a core weight value represents the priority with which a core is selected to run the convolutional neural network model; and
distributing the convolutional neural network model to different cores to run according to the core weight values of the at least two cores.
In a third aspect, an embodiment of the present invention provides a core scheduling method, including:
obtaining a task type parameter, where the task type parameter is used to represent the type of a computing task;
determining core weight values of at least two cores from a preset fourth correspondence according to the task type parameter, where the core weight values of the at least two cores correspond to the task type parameter, the at least two cores are heterogeneous cores on a terminal, the fourth correspondence includes a correspondence between the task type parameter and the core weight values of the at least two cores, and a core weight value represents the priority with which a core is selected to run the computing task; and
determining, from the at least two cores according to their core weight values, a core on which to run the computing task.
In a fourth aspect, an embodiment of the present invention provides a terminal that has the functions of the terminal in the foregoing methods. The functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above.
In one possible implementation, the terminal includes:
an obtaining unit, configured to obtain a target model parameter, where the target model parameter is used to represent the computational density of a convolutional neural network model;
a weight value determining unit, configured to determine, according to the target model parameter, core weight values of at least two cores from a preset first correspondence, where the core weight values of the at least two cores correspond to the target model parameter, the at least two cores are heterogeneous cores on a terminal, the first correspondence includes a correspondence between the target model parameter and the core weight values of the at least two cores, and a core weight value represents the priority with which a core is selected to run the convolutional neural network model; and
a core determining unit, configured to determine, from the at least two cores according to their core weight values, a core for running the convolutional neural network model.
In another possible implementation, the terminal includes a processor and a memory. The processor may be configured to enable the terminal to perform the corresponding functions in the method of the first aspect. For example, the processor is configured to: obtain a target model parameter, where the target model parameter is used to represent the computational density of a convolutional neural network model; determine core weight values of at least two cores from a preset first correspondence according to the target model parameter, where the core weight values of the at least two cores correspond to the target model parameter, the at least two cores are heterogeneous cores on a terminal, the first correspondence includes a correspondence between the target model parameter and the core weight values of the at least two cores, and a core weight value represents the priority with which a core is selected to run the convolutional neural network model; and determine, from the at least two cores according to their core weight values, a core that runs the convolutional neural network model.
In a fifth aspect, the present invention provides a computer-readable storage medium, which stores instructions that, when executed on a computer, cause the computer to perform the method of the above aspects.
In a sixth aspect, embodiments of the present invention provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the above aspects.
In a seventh aspect, a chip apparatus is provided, which includes a processing unit configured to perform the method of the first aspect.
In an eighth aspect, a chip apparatus is provided that includes a processor and a memory. The memory includes instructions that the processor executes to perform the method of the above aspects.
In a ninth aspect, a chip system is provided, which includes a processor configured to enable a terminal to implement the functions referred to in the first to third aspects, such as sending or processing the data and/or information referred to in the methods. In one possible design, the chip system further includes a memory for storing the program instructions and data necessary for the terminal. The chip system may consist of a chip, or may include a chip and other discrete devices.
In the technical solution provided by the embodiments of the present invention, a target model parameter is obtained, where the target model parameter is used to represent the computational density of a convolutional neural network model; core weight values of at least two cores are then determined from a preset first correspondence according to the target model parameter, where the core weight values of the at least two cores correspond to the target model parameter and the at least two cores are heterogeneous cores on a terminal. The first correspondence includes a correspondence between the target model parameter and the core weight values of the at least two cores, and a core weight value represents the priority with which a core is selected to run the convolutional neural network model. The core that runs the convolutional neural network model is thus determined from the at least two cores according to their core weight values.
The heterogeneous cores on the terminal have different characteristics, and different cores are suited to running convolutional neural network models of different computational densities. If a first correspondence is preset that includes a correspondence between a target model parameter and the core weight values of at least two cores, where the target model parameter represents the computational density of a convolutional neural network model and the at least two cores are heterogeneous cores on the terminal, then after the target model parameter of a convolutional neural network model is obtained, the core weight values of the at least two cores can be determined from the preset first correspondence according to that parameter. The core weight values represent the priority with which a core is selected to run the convolutional neural network model, and a core suited to running the model can be determined from them. In this way, the core that runs the convolutional neural network model can be determined from the at least two cores according to their core weight values. An adapted core can thus be determined to run a convolutional neural network model of a specific computational density, and if a larger core weight value means higher efficiency in running the model, the core determined from the core weight values can run the convolutional neural network model efficiently.
Drawings
FIG. 1 is a schematic diagram of a convolutional neural network according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a convolutional neural network according to another embodiment of the present invention;
FIG. 3 is a usage scenario diagram of a core scheduling method according to another embodiment of the present invention;
FIG. 4 is a flowchart of a core scheduling method according to another embodiment of the present invention;
FIG. 5 is a flowchart of a core scheduling method according to another embodiment of the present invention;
FIG. 6 is a schematic diagram of a hardware structure of a terminal according to another embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a terminal according to another embodiment of the present invention.
Detailed Description
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For convenience of understanding various embodiments of the present invention, some technical terms are described below, and various embodiments below may refer to the description of the technical terms below.
1. Convolutional neural network
A convolutional neural network is a feedforward neural network whose artificial neurons can respond to part of the surrounding units within a coverage range; it performs excellently in large-scale image processing.
A convolutional neural network consists of one or more convolutional layers and a fully connected layer at the top (corresponding to a classical neural network), and also includes associated weights and pooling layers. This structure enables the convolutional neural network to exploit the two-dimensional structure of the input data. Compared with other deep learning structures, convolutional neural networks give superior results in image and speech recognition. The model can also be trained with a back-propagation algorithm. Compared with other deep feedforward neural networks, a convolutional neural network has fewer parameters to estimate, which makes it an attractive deep learning structure.
A convolutional neural network differs from a general neural network in that it includes a feature extractor composed of convolutional layers and sub-sampling layers. In a convolutional layer, one neuron is connected to only some of its neighboring neurons. A convolutional layer of a CNN usually contains several feature maps, each composed of neurons arranged in a rectangle; the neurons of the same feature map share a weight, and the shared weight is a convolution kernel. The convolution kernel is generally initialized as a matrix of small random values, and learns reasonable weights during the training of the network. Sharing weights (convolution kernels) has the immediate benefit of reducing the connections between the layers of the network while also reducing the risk of over-fitting. Sub-sampling, also called pooling, usually takes the form of mean sub-sampling or max sub-sampling, and can be viewed as a special convolution process. Convolution and sub-sampling greatly simplify the complexity of the model and reduce its parameters.
2. Weight parameters for convolutional neural networks
Neural networks, including convolutional neural networks, are composed of stacked layers, each layer composed of nodes. Operations are carried out in the nodes, whose mode of operation roughly resembles that of a human neuron: when enough stimulus information arrives, a signal is activated and released. A node combines input data with a set of coefficients (or weight parameters) that specify the importance of each input in the algorithm's learning task by amplifying or suppressing it. As shown in FIG. 1, the bias input is 1, X1 to Xm are the input data, and W0 to Wm are the weight parameters. The sum of the products of the input data and the weight parameters enters the activation function of the node, which judges whether the signal continues to propagate through the network and how far, and thereby how the signal influences the final result of the network.
The elements of the convolutional neural network may be as shown in FIG. 2, where h(X) is the output data and X1 to Xm are the input data.
When a plurality of units as shown in FIG. 2 are combined into a hierarchical structure, a neural network model is formed.
3. Application of convolutional neural network
In recent years, convolutional neural networks have been applied in many directions, for example, in the fields of speech recognition, face recognition, general object recognition, motion analysis, natural language processing, and the like.
At present, convolutional neural networks are applied more and more on mobile terminals, particularly in intelligent photo albums, whose application forms include image classification, feature extraction, face clustering, and the like; the computational characteristics of these applications involve a large number of matrix operations.
4. Convolutional neural network model
A convolutional neural network model is a specific network model (or algorithm instance) obtained after a convolutional neural network is trained. The model has the characteristics of the convolutional neural network, has a specific calculation density, and can be used to execute a specific application service.
5. System-on-chip
A device is provided with various cores (also referred to as processors or computing units) that together constitute a system chip. The cores in the embodiments of the present invention are primarily heterogeneous cores, whose types include, but are not limited to, the following:
1) Central Processing Unit (CPU): an ultra-large-scale integrated circuit that is the operation core (Core) and control unit (Control Unit) of a computer. Its main functions are to interpret computer instructions and to process data in computer software.
2) Graphics Processing Unit (GPU), also called a display core, visual processor, or display chip: a microprocessor dedicated to image operations on personal computers, workstations, game machines, and some mobile devices (e.g., tablet PCs, smartphones).
3) Digital Signal Processor (DSP): a chip capable of implementing digital signal processing techniques. A DSP chip adopts a Harvard architecture in which programs and data are separate, is equipped with dedicated hardware multipliers, widely adopts pipelined operation, provides special DSP instructions, and can quickly implement various digital signal processing algorithms.
4) Systolic array processor
A systolic array processor is an application-specific integrated circuit (ASIC) that adopts a systolic array architecture, in which data "flows" rhythmically between the processing elements of the array in a predetermined "pipelined" fashion. As the data flows, all the processing elements process the data passing through them in parallel, so a high parallel processing speed can be achieved.
The systolic array processor may be a Neural-Network Processing Unit (NPU), a Tensor Processing Unit (TPU), an Intelligent Processing Unit (IPU), or the like.
4.1) Neural-Network Processing Unit (NPU): the NPU simulates human neurons and synapses at the circuit level and directly processes large-scale neurons and synapses with a deep learning instruction set, one instruction completing the processing of a group of neurons. Compared with the von Neumann architecture adopted in a CPU, in which storage and computation are separate, the NPU integrates storage and computation through synaptic weights, thereby greatly improving operation efficiency.
4.2) Tensor Processing Unit (TPU): artificial intelligence aims to endow machines with human-like intelligence, and machine learning, the subject of studying how computers learn automatically, is a powerful method for realizing it. The TPU is a chip dedicated to machine learning; it can serve as an artificial intelligence (AI) accelerator for the TensorFlow platform and is essentially a systolic-array accelerator. Its internal instruction set can still run when the TensorFlow program changes or the algorithm is updated. The TPU provides high-throughput, low-precision computation for the forward operations of a model rather than for model training, and is more energy efficient (TOPS/W). The TPU may also be referred to as an Intelligent Processing Unit (IPU).
FIG. 3 is a diagram of a usage scenario related to the core scheduling method according to an embodiment of the present invention. In this scenario, a System on Chip (SoC) is provided on the terminal; the SoC includes at least two cores, and the at least two cores are heterogeneous. The at least two cores may include a CPU, a GPU, a DSP, a systolic array processor, and the like; systolic array processors include, but are not limited to, neural-network processing units and tensor processing units. These chips, used to perform computation on the terminal, may all be referred to as cores, and different cores have different energy efficiency ratios.
The terminal can use specific algorithms to execute different application services. The method of the embodiment of the present invention relates to running a convolutional neural network model, and the terminal can use the convolutional neural network model to execute different application services.
When executing different application services, the terminal may face different requirements. For example, a real-time scene application (such as camera preview) requires real-time image recognition and places a high demand on performance, whereas an image library classifying imported images in the background has a lower requirement on real-time operation and leans more toward reducing power consumption.
Therefore, when the terminal runs a specific convolutional neural network model, efficient core scheduling needs to be performed according to the computational requirements (e.g., performance, power consumption), and the scheduled core runs the convolutional neural network model to execute a specific service. This benefits the execution of the application service on the terminal, for example by making it more efficient or more energy efficient.
Therefore, the embodiment of the invention provides a core scheduling method, which is used for providing an adaptive core for a convolutional neural network model so as to efficiently operate the convolutional neural network model.
FIG. 4 is a flowchart of a core scheduling method according to an embodiment of the present invention. Referring to the above description and to FIG. 4, the method of an embodiment of the invention comprises:
Step 401: acquire a target model parameter.
Wherein the target model parameters are used to represent the computational density of a convolutional neural network model.
The terminal acquires the target model parameter, through which the calculation density of a specific convolutional neural network model can be determined. Different cores are suited to running convolutional neural network models of different calculation densities, so a core can be selected according to the target model parameter to run the convolutional neural network model having that parameter.
It can be understood that the target model parameter may take various specific forms. For example, it may be the number of weight parameters of the convolutional neural network model, or the number of layers of the model, which represents the model's depth. It may also be any other parameter that reflects the calculation density of the convolutional neural network model, which may also be referred to as the complexity of the model.
Reference is made to the above section describing terms regarding the convolutional neural network model and the computational density of the convolutional neural network model.
There are various specific implementation manners of step 401, and several examples thereof are given as follows:
Example one: the terminal acquires a convolutional neural network model and obtains the target model parameter of the model by analyzing it.
Example two: the analysis equipment obtains the convolutional neural network model, analyzes the convolutional neural network model to obtain target model parameters of the convolutional neural network model, and then sends the target model parameters to the terminal so that the terminal can obtain the target model parameters.
Step 402: and determining core weight values of at least two cores from a preset first corresponding relation according to the target model parameters.
The core weight values of the at least two cores correspond to the target model parameter, the at least two cores are heterogeneous cores on the terminal, the first corresponding relation comprises the corresponding relation between the target model parameter and the core weight values of the at least two cores, and the core weight values are used for representing the priority degree of the cores selected to run the convolutional neural network model.
A first correspondence is preset on the terminal, including a correspondence between the target model parameter and the core weight values of at least two cores. Thus, after the terminal obtains the target model parameter, the core weight values of the at least two cores corresponding to that parameter can be determined from the first correspondence.
The core weight values indicate the degree of priority with which the cores are selected to run the convolutional neural network model, so the terminal can use them to schedule an appropriate core. For example, if a first core runs a convolutional neural network model of a specific calculation density more efficiently than a second core, in other words, the first core is better suited to running the model than the second core, the core weight value of the first core is set higher than that of the second core to indicate that the first core is selected with higher priority to run the model. In this way, the first correspondence can be established in advance using the target model parameter and the core weight values of the at least two cores; when the terminal obtains a specific target model parameter, the core weight values can be determined from the first correspondence according to that parameter.
To schedule a suitable core to run the convolutional neural network model using the core weight values, in some embodiments the core weight values need to be set according to how well the core's hardware characteristics, architecture characteristics, or calculation mode are adapted to running the specific convolutional neural network model. In this case, a core weight value specifically indicates the degree to which the hardware characteristics, architecture characteristics, or calculation mode of the core are adapted to running the convolutional neural network model.
The at least two cores in the first correspondence are heterogeneous cores on the terminal with different characteristics, so they produce different operation effects when running the convolutional neural network model, which is what makes setting distinct core weight values meaningful.
As to which core is suited to running a convolutional neural network model of which calculation density, this can be obtained from tests performed in advance. For example, a test efficiency parameter is set that indicates the time taken to run the convolutional neural network model to perform a specific service; the model with a specific calculation density is run on the different cores to obtain their test efficiency parameters, and a larger core weight value is then configured for the core with the better test efficiency parameter (e.g., the shorter time).
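The offline test described above can be sketched as follows (illustrative only; the timing figures and the normalization rule are assumptions, since the embodiment does not fix a specific formula for turning measured times into weights):

```python
# Illustrative sketch: deriving core weight values from an offline test.
# The same model is run on each candidate core, the time taken (the "test
# efficiency parameter") is measured, and faster cores receive larger
# weights. The timing values below stand in for real per-core benchmarks.
def core_weights_from_benchmark(timings):
    """Map {core_name: seconds_taken} to normalized core weight values.

    A core that finishes faster gets a larger weight; the weights sum to 1.
    """
    speeds = {core: 1.0 / t for core, t in timings.items()}
    total = sum(speeds.values())
    return {core: speed / total for core, speed in speeds.items()}

# Hypothetical offline measurements: seconds to run one model inference.
timings = {"CPU": 0.40, "GPU": 0.10, "NPU": 0.05}
weights = core_weights_from_benchmark(timings)
# The NPU, being fastest here, receives the largest core weight value.
```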
The core weight value may specifically represent how well the hardware characteristics of a core are adapted to running a convolutional neural network model. The at least two cores in step 402 are heterogeneous cores on the terminal; different heterogeneous cores have different hardware characteristics and are therefore suited to running convolutional neural network models of different calculation densities, so scheduling heterogeneous cores to run such models is of practical value.
Alternatively, the at least two cores may be at least two of a CPU, GPU, DSP, systolic array processor. For example, the heterogeneous cores provided on the terminal may be any two, or any three, or any four, or all of CPUs, GPUs, DSPs, and systolic array processors. It is to be understood that the heterogeneous core provided on the terminal may also be other cores. The systolic array processor may be a specific chip such as an NPU or TPU.
It is understood that the core weight values of the at least two cores determined in step 402 have various specific implementation forms, for example, as follows:
in one example, the core weight value is in the form of a numerical value, which may specifically be: the form of a percentage, e.g., 10%, 30%, etc.; the form of the score, for example,
Figure BDA0002032694000000101
or
Figure BDA0002032694000000102
Etc.; fractional forms, e.g., 0.5, 1.0, 1.5, etc.;
in another example, the core weight value is in the form of a representation of a level, e.g., a first priority, a fifth priority, etc.
It can be understood that in the first correspondence there are multiple forms of expression for the target model parameter and the core weight value. In some examples, either may be a specific value: for example, the target model parameter is 1000 or 2000, and the core weight value is 0.5 or 0.2. In other examples, either may be a value interval: for example, the target model parameter is the interval [10000, 15000] or [15000, 20000], and the core weight value is the interval [0.1, 0.6] or [0.6, 0.8]. The embodiment of the present invention does not limit the specific forms of expression of the target model parameter and the core weight value in the first correspondence.
There are various specific implementations of step 402, two examples of which are given below:
example one:
The target model parameter and the core weight values included in the first correspondence are specific numerical values. In this case, the specific implementation of step 402 includes: matching the target model parameter of step 401 against the target model parameters in the first correspondence and, if a match is found, determining from the first correspondence the core weight values of the at least two cores corresponding to the matched target model parameter.
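A minimal sketch of this exact-match case (the table contents are invented for the example; a real first correspondence would be provisioned on the terminal):

```python
# Illustrative sketch of the exact-match lookup: the first correspondence is
# a table keyed by specific target-model-parameter values, and a lookup
# returns the core weight values of all candidate cores, or None when no
# entry matches.
FIRST_CORRESPONDENCE = {
    1000: {"CPU": 0.5, "GPU": 0.3},
    2000: {"CPU": 0.2, "GPU": 0.6},
}

def core_weight_values(target_model_parameter):
    """Return the core weight values matching the target model parameter,
    or None if no entry of the first correspondence matches exactly."""
    return FIRST_CORRESPONDENCE.get(target_model_parameter)

weights = core_weight_values(2000)
```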
Example two:
The first correspondence includes the target model parameter and the core weight values in the form of a target model parameter interval and core weight value intervals. In this case, the specific implementation of step 402 includes:
step A1: in the preset first corresponding relation, the target model parameter interval where the target model parameter of step 401 is located is determined.
For example, the target model parameter is the number of weight parameters, and the first corresponding relationship includes a corresponding relationship between a range of the number of weight parameters [1 million, 3 million ], a range of core weight values [0.2,0.4] of the CPU, and a range of core weight values [0.1,0.3] of the GPU. If the target model parameter of the convolutional neural network model acquired by the terminal is 1.5 million, it can be determined that 1.5 million of the target model parameter of the convolutional neural network model falls into the weight parameter number interval [1 million, 3 million ] in the first corresponding relationship.
Step A2: in the first correspondence, core weight value intervals of at least two cores are determined.
The core weight value intervals of the at least two cores correspond to the target model parameter intervals, the first corresponding relation comprises the corresponding relation between the target model parameter intervals and the core weight value intervals of the at least two cores, and the target model parameter intervals comprise target model parameters.
For example, the first correspondence relationship includes a correspondence relationship between a weight parameter number interval [1 million, 3 million ] and a core weight value interval [0.2,0.4] of the CPU, and a correspondence relationship between a weight parameter number interval [1 million, 3 million ] and a core weight value interval [0.1,0.3] of the GPU. After determining that the target model parameters of the convolutional neural network model fall into a weight parameter quantity interval [1 million, 3 million ], determining a core weight value interval [0.2,0.4] of the CPU and a core weight value interval [0.1,0.3] of the GPU, which correspond to the weight parameter quantity interval, in the first corresponding relation.
Step A3: for each core, a core weight value is determined from the core weight value interval.
And the position of the core weight value in the core weight value interval is the same as the position of the target model parameter in the target model parameter interval.
For example, the target model parameter 1.5 million lies one quarter of the way through the target model parameter interval [1 million, 3 million], so for the CPU the core weight value 0.25, located one quarter of the way through the core weight value interval [0.2, 0.4], is determined; for the GPU, the core weight value 0.15, located one quarter of the way through the interval [0.1, 0.3], is determined.
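Steps A1 to A3 can be sketched as a position-preserving linear mapping (illustrative only; the table contents come from the worked example above, and the mapped values follow the parameter's exact linear position within its interval):

```python
# Illustrative sketch of the interval-based lookup: find the
# target-model-parameter interval containing the obtained parameter (A1),
# read off each core's weight value interval (A2), then map the parameter's
# relative position within its interval onto each weight value interval (A3).
INTERVAL_TABLE = [
    # (parameter interval, {core: core weight value interval})
    ((1_000_000, 3_000_000), {"CPU": (0.2, 0.4), "GPU": (0.1, 0.3)}),
]

def interval_core_weights(param):
    for (lo, hi), core_intervals in INTERVAL_TABLE:
        if lo <= param <= hi:
            pos = (param - lo) / (hi - lo)  # relative position in interval
            return {core: wlo + pos * (whi - wlo)
                    for core, (wlo, whi) in core_intervals.items()}
    return None  # parameter falls in no interval of the first correspondence

# 1.5 million weight parameters sits one quarter of the way through
# [1 million, 3 million], so each core weight value is read one quarter of
# the way through that core's own interval.
weights = interval_core_weights(1_500_000)
```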
In another example, the first corresponding relationship includes the target model parameter as a value interval, and the first corresponding relationship includes the core weight value as a specific value. At this time, the specific implementation manner of step 402 includes: in a preset first corresponding relationship, a target model parameter interval where the target model parameter of step 401 is located is determined, and then, in the first corresponding relationship, core weight values of at least two cores corresponding to the target model parameter interval are determined.
Step 403: and determining a core running the convolutional neural network model from the at least two cores according to the core weight values of the at least two cores.
After the terminal determines the core weight values of the at least two cores, since the core weight values represent the degree of priority with which the cores are selected to run the convolutional neural network model, the core suited to running the convolutional neural network model of step 401 is determined from the at least two cores according to those values, and that core can be scheduled to run the model to execute a specific application service.
With respect to step 403, there are several specific implementations, a few of which are listed below:
example one of step 403:
Determine, from the at least two cores, the core with the largest core weight value; that core is used to run the convolutional neural network model. Since the core weight values represent the degree of priority with which the cores are selected, a core with a large core weight value is scheduled to run the model in preference to a core with a small one.
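This first example reduces to taking the maximum over the candidate cores (a sketch; the weight values are invented):

```python
# Illustrative sketch of example one of step 403: schedule the core whose
# core weight value is largest among the at least two candidate cores.
def select_core(core_weights):
    """Return the name of the core with the largest core weight value."""
    return max(core_weights, key=core_weights.get)

chosen = select_core({"CPU": 0.25, "GPU": 0.15, "NPU": 0.6})
```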
Example two of step 403:
In this example, to execute step 403, the terminal needs to perform additional steps to obtain a parameter for correcting the core weight values, correct the core weight values using that parameter to obtain corrected weight values, and then determine the core that runs the convolutional neural network model from the at least two cores using the corrected weight values. The corrected weight value introduces more reference factors and therefore reflects which core should be scheduled better than the core weight value alone. The details are as follows:
the method of this example further includes:
Step B1: acquire a current state parameter of the terminal.
Wherein, the state parameter is a dynamically changing parameter.
State parameters are parameters that change dynamically on the terminal and reflect the specific operating environment in which the terminal runs the convolutional neural network model; under different operating environments, a core produces different effects when running the model. Acquiring the current state parameter of the terminal and using it as one of the reference factors for core scheduling makes it possible to determine a core better suited to running the convolutional neural network model.
The status parameters include, but are not limited to, a remaining power value of the terminal, a core usage rate, a temperature of the core, and the like.
Step B2: and determining the parameter weight values of at least two cores from a preset second corresponding relation according to the state parameters.
The parameter weight values of the at least two cores correspond to the state parameters, the second corresponding relation comprises the corresponding relation between the state parameters and the parameter weight values of the at least two cores, and the parameter weight values are used for expressing the priority degree of the cores selected to run the convolutional neural network model under the state parameters.
A second correspondence is preset on the terminal, including a correspondence between state parameters and the parameter weight values of the at least two cores. Thus, after the terminal acquires its current state parameter, the parameter weight values of the at least two cores corresponding to that state parameter can be determined from the preset second correspondence. A parameter weight value represents the degree of priority with which a core is selected to run the convolutional neural network model under the state parameter; a core with a large parameter weight value runs the model in preference to a core with a small one. Therefore, the terminal may correct the core weight value of step 402 using the parameter weight value, so that the corrected weight value further takes the terminal's state parameter into account and better reflects which core is suited to running the convolutional neural network model.
The at least two cores of step B2 and the at least two cores of step 402 are the same cores.
As to which core is better suited to running the convolutional neural network model under different state parameters, this can likewise be obtained from tests performed in advance. For example, a test efficiency parameter is set, a convolutional neural network model with a specific calculation density is run on different cores while each is under specific terminal state parameters, and the test efficiency parameters of the different cores are obtained; a larger parameter weight value is then configured for the core with the better test efficiency parameter.
It is understood that the at least two core parameter weight values determined in step B2 have various specific implementation forms, such as percentage, fraction, or level representation, and the like, and specific reference may be made to the above detailed description of specific implementation forms of the core weight values.
It is to be understood that, in the second corresponding relationship, the state parameter and the parameter weight value may have multiple expressions, for example, may be specific values or value intervals, and refer to the above detailed description of the expressions of the target model parameter and the core weight value in the first corresponding relationship.
There are various specific implementations of step B2, two examples of which are given below:
In one example, the state parameter and the parameter weight values included in the second correspondence are specific numerical values, and the specific implementation of step B2 includes: matching the state parameter of step B1 against the state parameters in the second correspondence and, if a match is found, determining from the second correspondence the parameter weight values of the at least two cores corresponding to the matched state parameter.
In another example, the second correspondence includes a state parameter and a parameter weight value as a numerical range. At this time, the specific implementation manner of step B2 includes: determining a state parameter interval where the state parameters of the step B1 are located in a preset second corresponding relation; then, in a second corresponding relationship, determining parameter weight value intervals of at least two cores, where the parameter weight value intervals of the at least two cores correspond to the state parameter interval, and the second corresponding relationship includes a corresponding relationship between the state parameter interval and the parameter weight value intervals of the at least two cores. Subsequently, for each core, a parameter weight value is determined from the parameter weight value interval. Wherein, the position of the parameter weight value in the parameter weight value interval is the same as the position of the state parameter in the state parameter interval in the step B1. For details, reference may be made to the above detailed description of example two of the specific implementations of step 402.
After the steps B1 and B2 are executed, step 403 is executed, and at this time, step 403 specifically includes step B3 and step B4, as follows:
Step B3: for each core, correct the core weight value using the parameter weight value to obtain a first corrected weight value.
Wherein the first modified weight value is used to represent a degree of priority of the kernel being selected to run the convolutional neural network model. The kernel with the large first correction weight value runs the convolutional neural network model in preference to the kernel with the small first correction weight value.
There are various specific ways of correcting the core weight value using the parameter weight value, and the correction method may be set in advance. For example, the parameter weight value and the core weight value are multiplied to obtain the first corrected weight value; or the core weight value is corrected with the parameter weight value according to a preset correction relationship. For instance, if the core weight value is a third priority, the parameter weight value is a fifth priority, and the preset correction relationship is that the higher of the two levels is taken, then the first corrected weight value is the third priority.
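The two correction schemes mentioned above can be sketched as follows (illustrative only; with priority levels, a smaller number is assumed to mean a higher priority, so a "third priority" outranks a "fifth priority", and both rules are examples rather than the only possible choices):

```python
# Illustrative sketch of two correction schemes for step B3. With numeric
# weights, the parameter weight value multiplies the core weight value; with
# priority levels, the preset rule here keeps the higher (numerically
# smaller) of the two levels.
def correct_numeric(core_weight, parameter_weight):
    """First corrected weight = core weight value x parameter weight value."""
    return core_weight * parameter_weight

def correct_priority(core_level, parameter_level):
    """Keep the higher (numerically smaller) of the two priority levels."""
    return min(core_level, parameter_level)

numeric = correct_numeric(0.4, 0.5)   # multiplicative rule
level = correct_priority(3, 5)        # third vs fifth priority
```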
"for each core" in step B3 means for each of the aforementioned at least two cores. Step B3 is to determine specific kernels for the at least two kernels, and correct the kernel weight value of the kernel using the parameter weight value of the kernel to obtain a first corrected weight value of the kernel.
Step B4: a core for running the convolutional neural network model is determined from the at least two cores according to the first modified weight values of the at least two cores.
After obtaining the first modified weight values of the at least two cores, the terminal determines, according to those values, the core for running the convolutional neural network model from the at least two cores, thereby determining a core suited to running the model.
Step B4 can be implemented in various ways. For example, the core with the largest first modified weight value is determined from the at least two cores and used to run the convolutional neural network model. Alternatively, the first modified weight values are further corrected using other parameters, and the further corrected values are used to determine the core for running the model; this further correction is performed similarly to steps B1 and B2, except that the other parameters are obtained and used to correct the first modified weight value.
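The simplest variant of step B4, selecting the core whose first modified weight value is largest, can be sketched as follows (the core names and weight values are hypothetical):

```python
def pick_core(first_modified: dict) -> str:
    # step B4: the core with the largest first modified weight value
    return max(first_modified, key=first_modified.get)

core = pick_core({"CPU": 0.9, "GPU": 0.45, "NPU": 0.08})
```

With these hypothetical values `pick_core` returns "CPU", since 0.9 is the largest first modified weight value.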
The parameter weight value is obtained from the current state parameter of the terminal, and the current state parameter reflects the specific environment in which the terminal runs the convolutional neural network model; the parameter weight value therefore reflects the influence of the terminal's current environment on a core running the model. The core weight value, by contrast, is determined from the target model parameter representing the computational density of the convolutional neural network model, and reflects the influence of the core's hardware characteristics on running the model. The first modified weight value, obtained by correcting the core weight value with the parameter weight value, therefore takes more factors into account, and a core better suited to running the convolutional neural network model can be determined from it.
There are several implementations of steps B1 to B2, and two implementations are given below:
Implementation one:
in step B1, the current state parameter of the terminal is the current core usage of each core.
In this case, step B2 specifically includes: and for each core, determining a performance weight value from a preset second corresponding relation according to the core utilization rate.
The performance weight value of each core corresponds to the core utilization rate of each core, the performance weight value is used for indicating the priority of the core selected to run the convolutional neural network model under the current core utilization rate of the core, and the second corresponding relation comprises the corresponding relation between the core utilization rate of each core and the performance weight value of each core.
The core usage rate refers to the proportion of a core's resources occupied by the programs running on the terminal, and represents how busy the core is: the higher a core's usage rate, the more programs are running on that core, and vice versa. The core usage rate may be a specific value, e.g., 10% or 2%.
The performance weight value represents the degree of priority with which the core is selected to run the convolutional neural network model at the core's current usage rate, and reflects the computing resources the core currently has available. A large performance weight value indicates that the core currently has a high degree of available computing resources, so that core is preferentially scheduled to run the convolutional neural network model; that is, a core with a large performance weight value is preferentially used to run the model.
It is to be understood that, in the second corresponding relationship, the core usage rate and the performance weight value may also have multiple expressions, for example, they may be specific values or value intervals, which is described in detail above with reference to the expressions of the core weight value and the target model parameter in the first corresponding relationship.
Each core in this implementation means each core of the at least two cores, and for each core, the performance weight value of the core is determined from a preset second corresponding relationship according to the core usage rate of the core.
Reference may be made to the above-mentioned detailed description of two examples of the specific implementation of step B2 regarding "determining the performance weight value from the preset second correspondence according to the core usage".
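A sketch of implementation one under an assumed interval-form second correspondence (the table values are invented for illustration; a nearly idle core gets the largest performance weight value, consistent with the description above):

```python
# hypothetical second correspondence: core-usage interval -> performance weight
USAGE_TO_WEIGHT = [
    ((0.00, 0.20), 1.0),  # nearly idle core: highest preference
    ((0.20, 0.60), 0.6),
    ((0.60, 1.00), 0.2),  # busy core: lowest preference
]

def performance_weight(usage: float) -> float:
    # match the current core usage rate against the intervals in the table
    for (lo, hi), weight in USAGE_TO_WEIGHT:
        if lo <= usage <= hi:
            return weight
    raise ValueError("usage must lie in [0, 1]")
```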
Implementation two:
in step B1, the current state parameter of the terminal is the current remaining power value of the terminal;
in this case, step B2 specifically includes: determining the power consumption weight values of the at least two cores from a preset second correspondence according to the remaining power value.
The power consumption weight values of the at least two cores correspond to the remaining power value; the power consumption weight values represent the degree of priority with which the cores are selected to run the convolutional neural network model at that remaining power value; and the second correspondence includes the correspondence between the remaining power value and the power consumption weight values of the at least two cores.
The remaining power value is the amount of power remaining on the terminal; it may be expressed as, but not limited to, a percentage or in ampere-hours.
The power consumption weight value represents the degree of priority with which a core is selected to run the convolutional neural network model at the terminal's current remaining power value; a core with a large power consumption weight value is preferentially used to run the model. In some embodiments, the power consumption weight value may be set with reference to the core's power draw: for example, as the remaining power value decreases, the power consumption weight value of a high-power core decreases by a larger amount than that of a low-power core. The power consumption weight value can therefore reflect how suitable a core is for running the convolutional neural network model at the terminal's current remaining power value.
It is to be understood that, in the second corresponding relationship, the remaining power value and the power consumption weight value may also have multiple expressions, for example, may be specific values or value intervals, and specific reference may be made to the above detailed description of the expressions of the target model parameter and the core weight value in the first corresponding relationship.
Each core in this implementation means each core of the at least two cores, and for each core, the power consumption weight value of the core is determined from a preset second corresponding relationship according to the remaining electric quantity value.
Reference may be made to the above-mentioned detailed description of two examples of the specific implementation of step B2 regarding "determining the power consumption weight values of at least two cores from the preset second correspondence according to the remaining electric quantity value".
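One way to realize a power consumption weight value with the decay behavior described above is sketched below; the linear decay and the `relative_power` scale are assumptions for illustration, not part of the embodiment:

```python
def power_consumption_weight(remaining: float, relative_power: float) -> float:
    # remaining: remaining battery as a fraction in [0, 1]
    # relative_power: > 1 for power-hungry cores, < 1 for frugal ones
    # the weight falls as the battery drains, and falls faster for hungry cores
    return max(0.0, 1.0 - (1.0 - remaining) * relative_power)
```

At a full battery every core scores 1.0; at half battery a core with `relative_power = 1.5` scores 0.25 while one with `relative_power = 0.5` scores 0.75, so the frugal core becomes preferred as the battery drains.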
It is to be understood that, in step B4, the first modified weight value may be further corrected to obtain a second modified weight value, so that the core for running the convolutional neural network model is determined from the at least two cores according to the second modified weight values of the at least two cores.
For example, in implementation one of steps B1 to B2, the current remaining power value of the terminal may additionally be obtained, and the power consumption weight values of the at least two cores determined according to another preset correspondence between the remaining power value and those weight values; then, for each core, the power consumption weight value is used to correct the first modified weight value to obtain the second modified weight value. The determination of the power consumption weight value may refer to the detailed description in implementation two above.
Correspondingly, in some examples, in implementation two of steps B1 to B2, the core usage rate of each core may additionally be obtained, so that for each core a performance weight value is determined according to the core usage rate and another preset correspondence, and the performance weight value is then used to correct the first modified weight value to obtain the second modified weight value. The determination of the performance weight value may refer to the detailed description of implementation one.
To make the above scheme of secondary correction of the weight value clear, a specific example is given below:
in implementation one of the foregoing steps B1 to B2, the method of this example further includes:
step C1: acquiring the current remaining power value of the terminal;
step C2: determining the power consumption weight values of the at least two cores from a preset third correspondence according to the remaining power value.
The power consumption weight values of the at least two cores correspond to the remaining power value; the third correspondence includes the correspondence between the remaining power value and the power consumption weight values of the at least two cores; and the power consumption weight values represent the degree of priority with which the cores are selected to run the convolutional neural network model at that remaining power value.
Thus, step B4 specifically includes step C3 and step C4, as follows:
step C3: for each core, correct the first modified weight value using the power consumption weight value to obtain a second modified weight value.
The second modified weight value is used to represent the degree of priority with which the core is selected to run the convolutional neural network model.
step C4: determine the core for running the convolutional neural network model from the at least two cores according to the second modified weight values of the at least two cores.
Step C4 can be implemented in various ways. In some examples, the core with the largest second modified weight value is determined from the at least two cores and used to run the convolutional neural network model. In other examples, the second modified weight value is further corrected using other parameters, and the further corrected weight value is used to determine the core for running the convolutional neural network model.
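Putting steps B3, C3, and C4 together with multiplicative correction (one of several correction methods the text allows; all weight values below are hypothetical):

```python
def select_core(core_w: dict, perf_w: dict, power_w: dict) -> str:
    first = {c: core_w[c] * perf_w[c] for c in core_w}   # step B3
    second = {c: first[c] * power_w[c] for c in core_w}  # step C3
    return max(second, key=second.get)                   # step C4

core_w = {"CPU": 1.0, "GPU": 0.5, "NPU": 0.1}   # from the first correspondence
perf_w = {"CPU": 0.2, "GPU": 1.0, "NPU": 1.0}   # CPU is currently busy
power_w = {"CPU": 1.0, "GPU": 0.6, "NPU": 0.9}  # battery-dependent
```

With these numbers the busy CPU is passed over and the GPU is selected, illustrating how the secondary correction can change the outcome relative to the core weight values alone.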
Optionally, the core scheduling method in some examples of the present invention further includes determining a performance parameter. A core may have several different operation modes when running the convolutional neural network algorithm; to control the specific operation mode, the core may be made to run the convolutional neural network model using the determined performance parameter. Determining the performance parameter requires taking the core's usage state into account; specifically, the performance parameter of a core may be determined according to that core's usage rate. The details are as follows:
in some examples of the invention, the core scheduling method further comprises:
step D1: and acquiring the current core utilization rate of each core.
The terminal firstly obtains the core utilization rate, so that the performance parameters used by the core can be determined according to the core utilization rate. Each core here is each of the aforementioned at least two cores.
Specifically, the terminal may read the current core usage rate of the core through an Application Programming Interface (API) provided by the operating system.
Step D2: for each core, determining performance parameters from the second correspondence according to the core utilization.
The performance parameter of each core corresponds to the core utilization rate of each core, and the second corresponding relationship comprises the corresponding relationship between the performance parameter of each core and the core utilization rate of each core. The performance parameters are used to indicate the mode of operation of the core.
The terminal is preset with a second corresponding relation, and the second corresponding relation comprises a corresponding relation between the performance parameter of each core and the core utilization rate of each core. Therefore, after the terminal obtains the core utilization rate of each core, the terminal determines the performance parameters from the second corresponding relation according to the core utilization rate aiming at each core, and therefore the performance parameters of the at least two cores can be obtained.
The core usage rate in the second correspondence may be a specific numerical value, for example, 10%, 23%, or a numerical value interval, for example, [ 10%, 35% ], and this is not specifically limited in the embodiment of the present invention.
Specifically, for each core, the terminal matches the current core utilization rate of the core with the core utilization rate in the second corresponding relationship, and if the current core utilization rate of the core is the same as the core utilization rate in the second corresponding relationship or falls into the core utilization rate value interval in the second corresponding relationship, the matching is successful, and the terminal can determine the performance parameter corresponding to the successfully matched core utilization rate from the second corresponding relationship. The foregoing operations are performed for each core to obtain performance parameters for each core.
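The matching in step D2 can be sketched as an interval lookup (the table contents, parameter names, and values are invented for illustration; the embodiment only fixes the matching logic):

```python
# hypothetical second correspondence: usage interval -> performance parameters
PERF_PARAMS = [
    ((0.00, 0.10), {"thread_priority": "high", "threads": 4, "sleep_ms": 0}),
    ((0.10, 0.35), {"thread_priority": "normal", "threads": 2, "sleep_ms": 5}),
    ((0.35, 1.00), {"thread_priority": "low", "threads": 1, "sleep_ms": 20}),
]

def match_performance_params(usage: float) -> dict:
    # step D2: a usage value equal to a bound or inside an interval matches
    for (lo, hi), params in PERF_PARAMS:
        if lo <= usage <= hi:
            return params
    raise ValueError("usage must lie in [0, 1]")
```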
Step D3: after determining a kernel for running the convolutional neural network model from the at least two kernels according to the kernel weight values of the at least two kernels, running the convolutional neural network model on the target kernel using the performance parameters of the target kernel.
Wherein, the target core is the core for operating the convolutional neural network model.
After obtaining the performance parameters of each core and determining the target core for running the convolutional neural network model, the terminal can run the model on the target core using the target core's performance parameters. The specific running behavior of the target core is thus controlled by setting the performance parameters, meeting the user's requirements on how the core runs.
The performance parameters comprise one or more of thread priority information, sleep time information, and thread number information. The thread priority information is the priority of the sub-threads used when the core runs the convolutional neural network model; the sleep time information is the time interval the core waits between two consecutive runs of the convolutional neural network model; the thread number information is the number of threads the core uses when running the convolutional neural network model.
In one example, the target core runs the convolutional neural network model using thread priority information, the target core uses sub-threads, and the sub-threads are scheduled according to priority information of the sub-threads indicated by the thread priority information.
In another example, the target core runs the convolutional neural network model using the sleep time information, and after the target core runs the convolutional neural network model, the target core does not run another convolutional neural network model for an interval indicated by the sleep time information.
In another example, the target core runs the convolutional neural network model using the thread number information: it creates the number of threads indicated by the thread number information and then runs the model using those threads.
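How the thread number and sleep time might govern a run can be sketched with standard threads (the model function and parameter values are placeholders; a real scheduler would also set thread priorities, which plain Python threads do not expose):

```python
import threading
import time

def run_model(model_fn, inputs, n_threads=2, sleep_s=0.0):
    # split the inputs across the indicated number of worker threads
    chunks = [inputs[i::n_threads] for i in range(n_threads)]
    results = [None] * n_threads

    def worker(i):
        results[i] = [model_fn(x) for x in chunks[i]]

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    time.sleep(sleep_s)  # sleep-time interval before the next model run
    return [y for chunk in results for y in chunk]
```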
It can be understood that, since the determination of the performance parameter and the determination of the performance weight value both use the current core utilization rate of the core, in some embodiments, the determination of the performance parameter and the determination of the performance weight value may be implemented in the same step, and at this time, the current core utilization rate of each core may also be obtained only once.
Determining the performance parameters that control how a core runs from the core usage rate means that the target core can run the convolutional neural network model using its own performance parameters. If the correspondence between each core's performance parameters and its usage rate in the second correspondence is preset with execution efficiency in mind, the target core can run efficiently.
In other embodiments of the present invention, a method for scheduling a core is further provided, where the method includes:
step E1: and acquiring target model parameters.
Wherein the target model parameters are used to represent the computational density of a convolutional neural network model.
The detailed description of step E1 can be referred to in step 401.
Step E2: and determining core weight values of at least two cores from a preset first corresponding relation according to the target model parameters.
The core weight values of the at least two cores correspond to the target model parameter, the at least two cores are heterogeneous cores on the terminal, the first corresponding relation comprises the corresponding relation between the target model parameter and the core weight values of the at least two cores, and the core weight values are used for representing the priority degree of the cores selected to run the convolutional neural network model.
The detailed description of step E2 can be referred to in the detailed description of step 402. In this embodiment, the corresponding relationship between the target model parameter in the first corresponding relationship and the core weight values of the at least two cores may be preset according to an effect of the convolutional neural network model cooperatively operated by different cores.
Step E3: and distributing the convolutional neural network model to different cores to run according to the core weight values of at least two cores.
In this embodiment, a plurality of cores may cooperatively run the convolutional neural network model; that is, the model is distributed across different cores to run. Since a core weight value is determined for each core, after the convolutional neural network model is distributed according to the core weight values, each core's share of the model is determined by its core weight value: for example, a large part of the model is allocated to a core with a large core weight value and a small part to a core with a small core weight value, so that the core with the large core weight value plays the main role in running the model.
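A simplistic sketch of weight-proportional distribution (layer counts stand in for "parts of the model"; real partitioning would also have to respect layer dependencies and data-transfer costs):

```python
def allocate_layers(n_layers: int, core_weights: dict) -> dict:
    # give each core a share of the layers proportional to its core weight value
    total = sum(core_weights.values())
    shares = {c: int(n_layers * w / total) for c, w in core_weights.items()}
    # any rounding remainder goes to the main core (the largest weight)
    main = max(core_weights, key=core_weights.get)
    shares[main] += n_layers - sum(shares.values())
    return shares
```

For 10 layers and hypothetical weights {CPU: 1.0, GPU: 0.5, NPU: 0.1}, the CPU ends up with most of the model, as the text describes for the core with the large core weight value.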
In other embodiments of the present invention, a method for scheduling a core is further provided, where the method includes:
step F1: and acquiring task type parameters.
Wherein the task type parameter is used to indicate the type of the computing task.
The computing task may be, for example, image recognition, voice recognition, or image classification. The task type parameter may be text, such as "image recognition" or "voice recognition," or other information such as letters or numbers, e.g., "001," as long as it identifies the type of the specific computing task.
Step F2: and determining the core weight values of the at least two cores from a preset fourth corresponding relation according to the task type parameters.
The fourth corresponding relation comprises the corresponding relation between the task type parameter and the core weight values of the at least two cores, and the core weight values are used for expressing the priority degree of the cores selected to run the computing task.
In this embodiment, the correspondence between the task type parameter and the core weight values of the at least two cores is preset. After the task type parameter is obtained, it is matched against the task type parameters in the fourth correspondence, and the core weight values of the at least two cores corresponding to the successfully matched task type parameter are determined.
Step F3: determining a core running the computing task from the at least two cores according to the core weight values of the at least two cores.
The core weight value indicates the degree of priority with which a core is selected to run the computing task, so the core for running the task can be determined from the at least two cores according to their core weight values: for example, the core with the largest core weight value is selected to run the task, or the core is determined from the at least two cores after their core weight values are corrected and adjusted according to other parameters.
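Steps F2 and F3 in their simplest form reduce to a table lookup plus a maximum (the table entries below are invented for illustration):

```python
# hypothetical fourth correspondence: task type parameter -> core weight values
FOURTH_CORRESPONDENCE = {
    "image_recognition": {"CPU": 0.3, "GPU": 1.0, "NPU": 0.8},
    "id_card_recognition": {"CPU": 1.0, "GPU": 0.4, "NPU": 0.2},
}

def core_for_task(task_type: str) -> str:
    weights = FOURTH_CORRESPONDENCE[task_type]  # step F2: match the parameter
    return max(weights, key=weights.get)        # step F3: simplest variant
```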
In summary, the heterogeneous cores on the terminal have different characteristics, and different cores are suited to running convolutional neural network models of different computational densities. A first correspondence is preset that includes the correspondence between a target model parameter and the core weight values of at least two cores, where the target model parameter represents the computational density of a convolutional neural network model and the at least two cores are heterogeneous cores on the terminal. After the target model parameter of the convolutional neural network model is obtained, the core weight values of the at least two cores can be determined from the preset first correspondence according to the target model parameter. The core weight values represent the degree of priority with which a core is selected to run the convolutional neural network model, and a core suitable for running the model can be determined from them. In this way, the core for running the convolutional neural network model can be determined from the at least two cores according to their core weight values. Through the core weight values of the different cores, an adapted core can be determined to run a convolutional neural network model of a specific computational density; if a larger core weight value corresponds to higher efficiency in running the model, the core determined according to the core weight values can run the convolutional neural network model efficiently.
Fig. 5 is a flowchart of a core scheduling method according to an embodiment of the present invention. The method may be applied to a terminal, and the method of the embodiment shown in fig. 5 may be implemented based on the method of the embodiment shown in fig. 4. In the embodiment shown in fig. 5, the NPU is a systolic array processor and the target model parameter is the number of weight parameters.
Referring to fig. 5 and the above description, the method of the embodiment of the present invention includes:
step 501: and the terminal acquires a convolutional neural network model.
The terminal may use a specific algorithm to perform the application service, for example, the application service may be performed by a convolutional neural network model. For this purpose, the terminal first obtains a convolutional neural network model to be used.
The terminal may obtain the convolutional neural network model in a variety of ways, for example, the terminal obtains the convolutional neural network model sent by other devices, or the terminal locally establishes the convolutional neural network model.
The terminal may run the convolutional neural network model to perform a variety of application services, for example image recognition and speech recognition. Image services performed by running a convolutional neural network model include operations such as image classification, image feature extraction, and face clustering; these computations are characterized by a large number of matrix operations and are therefore well suited to a convolutional neural network model.
Regarding the establishment method of the convolutional neural network model, the convolutional neural network model may be obtained by training, for example, collecting a large amount of relevant data, and using the data to perform convolutional neural network training. The device for executing the step of training the convolutional neural network model may be a terminal, or may be a server or other devices.
Step 502: and the terminal acquires the weight parameter quantity.
Wherein the weight parameter number is used for representing the calculation density of a convolution neural network model;
the convolutional neural network models are used for performing calculations, e.g. for image processing. Different convolutional neural network models have different characteristics; for example, their computational densities may differ. The computational density can be determined from the number of weight parameters in the convolutional neural network model; that is, the number of weight parameters indicates the model's computational density. For the meaning of the weight parameters of a convolutional neural network model, refer to the description above.
Generally speaking, a computation-dense convolutional neural network model, for example a large-matrix model, is suited to running on the GPU;
a computation-sparse convolutional neural network model, for example a small-matrix, serial, or for-loop model, is suited to running on a CPU.
Different cores have different calculation characteristics, different types of convolutional neural network models have different calculation densities, and which type of convolutional neural network model is suitable for operating in which core can be known according to empirical data or experimental tests.
Specifically, convolutional neural network models for tasks such as classification, feature extraction, or object detection based on ResNet-18 (an 18-layer residual network) are suited to running on an NPU or a GPU; convolutional neural network models belonging to small networks, such as non-human-face recognition (e.g., dog-face or cat-face recognition) and identification-card image recognition, are suited to running on a CPU.
Therefore, in the embodiment of the present invention, after acquiring the convolutional neural network model, the terminal acquires the model's number of weight parameters, so that scheduling of the core for running the model can take that number into account.
The terminal may obtain the number of weight parameters of the convolutional neural network model as follows:
the terminal analyzes the obtained convolutional neural network model with a parser to obtain the number of weight parameters of the model.
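What the parser ultimately computes is a parameter count; for convolution layers described by their shapes it reduces to a sum of products (the shape-tuple format below is an assumption for illustration, not the embodiment's model format):

```python
# count weight parameters of a conv net given per-layer shapes
# (in_channels, out_channels, kernel_h, kernel_w)
def count_weight_params(conv_layers) -> int:
    return sum(c_in * c_out * kh * kw for c_in, c_out, kh, kw in conv_layers)

layers = [(3, 64, 3, 3), (64, 128, 3, 3)]
n = count_weight_params(layers)  # 3*64*9 + 64*128*9 = 75456
```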
Step 503: and determining the core weight values of at least two cores from a preset first corresponding relation according to the number of the weight parameters.
The core weight values of the at least two cores correspond to the weight parameter number, and the at least two cores are heterogeneous cores on the terminal. The kernel weight values are used to represent the degree of priority that the kernels are selected to run the convolutional neural network model.
In the embodiment of the present invention, a plurality of heterogeneous cores may be provided on the terminal, where the cores are computing units for performing computations; the types of heterogeneous cores include, but are not limited to, CPU, GPU, DSP, NPU, etc. After step 503 is executed, each core corresponds to a core weight value, and the plurality of cores correspond to a plurality of core weight values, which may take the form of a list, for example {W_CPU = 1.0; W_GPU = 0.5; W_NPU = 0.1}, where W_CPU indicates that the core weight value of the CPU is 1.0, W_GPU indicates that the core weight value of the GPU is 0.5, and W_NPU indicates that the core weight value of the NPU is 0.1.
The first correspondence includes the correspondence between the number of weight parameters and the core weight values of the at least two cores; that is, its parameters are the weight parameter count and the core weight value. In the first correspondence, each of these two parameters may take the form of a specific numerical value or of a numerical range, and the correspondence between them may accordingly be between specific values or between value ranges; the determined core weight value may be a specific numerical value. The core weight values in the first correspondence may belong to a plurality of heterogeneous cores, so a plurality of core weight values may be determined, with different core weight values belonging to different heterogeneous cores.
Step 503 may specifically be implemented as follows: after acquiring the number of weight parameters of the convolutional neural network model, the terminal matches it against the weight parameter counts in the first correspondence; the match succeeds when the acquired count equals a count in the first correspondence or falls within a count interval in the first correspondence. The core weight values corresponding to the matched weight parameter count are then determined in the first correspondence.
If, in the first correspondence, a weight-parameter-number interval corresponds to a core weight value interval, then after the corresponding core weight value interval is determined, a specific core weight value can be calculated from the number of weight parameters of the convolutional neural network model, for example by linear mapping within the corresponding core weight value interval according to the number of weight parameters of the model.
The first corresponding relationship may be established according to experimental test data or empirical data. The first corresponding relation may be obtained by reading from a storage device of the terminal, or may be obtained by the terminal from another device.
Step 503 is described below by taking a specific example as follows:
A correspondence is prestored in the terminal, as shown in Table 1.
Table 1 (ranges are mapped linearly according to the number of weight parameters):

Number of weight parameters | CPU core weight value | GPU core weight value | NPU core weight value
fewer than 50 million       | 1                     | 0 to 1                | 0 to 0.5
50 to 200 million           | 1 to 0.5              | 1                     | 0.5 to 1.0
200 to 500 million          | 0.5 to 0              | 1.0 to 0.5            | 1
In the correspondence shown in Table 1, a convolutional neural network model with fewer than 50 million weight parameters is a small network model, which is suitable for running on a CPU. For such convolutional neural network models, the core weight value of the CPU can therefore be set to 1, the core weight value of the GPU is linearly set between 0 and 1 according to the number of weight parameters of the model, and the core weight value of the NPU is linearly set between 0 and 0.5 according to the number of weight parameters of the model.
A convolutional neural network model with 50 to 200 million weight parameters is a medium-sized network model and is suitable for running on a GPU; therefore the core weight value of the GPU is 1, the core weight value of the CPU is linearly set between 1 and 0.5 according to the number of weight parameters of the model, and the core weight value of the NPU is linearly set between 0.5 and 1.0 according to the number of weight parameters of the model.
A convolutional neural network model with 200 to 500 million weight parameters is a large network model and is suitable for running on a dedicated acceleration device, the NPU; therefore the core weight value of the NPU is 1, the core weight value of the CPU is linearly set between 0.5 and 0 according to the number of weight parameters of the model, and the core weight value of the GPU is linearly set between 1.0 and 0.5 according to the number of weight parameters of the model.
The method of determining the core weight values of a core using linear mapping is as follows:
1) In the correspondence of Table 1, the target weight-parameter-number interval in which the number of weight parameters of the convolutional neural network model lies is determined.
2) In the correspondence of Table 1, the target core weight value intervals of the at least two cores are determined, where each core's target core weight value interval corresponds to the target weight-parameter-number interval.
Wherein the at least two cores are heterogeneous cores on the terminal.
3) For each core, a core weight value is determined from its target core weight value interval, such that the position of the core weight value within that interval is the same as the relative position of the model's number of weight parameters within the target weight-parameter-number interval.
Now, taking as an example a convolutional neural network model acquired by the terminal with 100 million weight parameters, with CPU, GPU and NPU as the heterogeneous cores of the terminal, the core weight values are calculated as follows:
1) The terminal matches the number of weight parameters of the convolutional neural network model, 100 million, against the weight-parameter-number intervals in the correspondence of Table 1, and determines that it lies within the target weight-parameter-number interval of 50 to 200 million.
2) In the correspondence of Table 1, the core weight value intervals of the respective heterogeneous cores corresponding to the target weight-parameter-number interval of 50 to 200 million are determined: 1 for the GPU, 1 to 0.5 for the CPU, and 0.5 to 1.0 for the NPU.
3) To determine the core weight value of each core from its interval, the terminal linearly maps within the determined core weight value intervals of the CPU, the GPU and the NPU according to the position of the model's number of weight parameters within the target weight-parameter-number interval. In the correspondence of Table 1, the relative position of the weight parameter number 100 million within the target range of 50 to 200 million is (100 - 50)/(200 - 50) = 1/3. Using this value of 1/3, linear mapping is performed within the CPU's target core weight value range of 1 to 0.5: denoting the core weight value of the CPU by S_a, then (S_a - 1)/(0.5 - 1) = 1/3, and the core weight value of the CPU is calculated to be approximately 0.83. Using the same method, the core weight value of the NPU is calculated to be approximately 0.66.
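As an illustrative sketch (not part of the patent text), the linear mapping above can be written in a few lines of Python; the interval values are taken from the 50-to-200-million row of Table 1, and the function and variable names are assumptions:

```python
def linear_map(x, src_lo, src_hi, dst_lo, dst_hi):
    """Map x's relative position in [src_lo, src_hi] onto [dst_lo, dst_hi].

    The destination interval may run downwards (e.g. 1 -> 0.5), in which
    case dst_hi - dst_lo is negative and the result decreases as x grows.
    """
    pos = (x - src_lo) / (src_hi - src_lo)
    return dst_lo + pos * (dst_hi - dst_lo)

# Table 1, row "50 to 200 million": GPU fixed at 1, CPU mapped from 1
# down to 0.5, NPU mapped from 0.5 up to 1.0 as the parameter count grows.
params_million = 100
core_weights = {
    "GPU": 1.0,
    "CPU": linear_map(params_million, 50, 200, 1.0, 0.5),  # ~0.83
    "NPU": linear_map(params_million, 50, 200, 0.5, 1.0),  # ~0.67
}
```

This reproduces the worked example: the CPU maps to 5/6 ≈ 0.83 and the NPU to 2/3 (which the text truncates to 0.66).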
In other examples of the present invention, the core weight value of the core may also be directly determined from the first correspondence relationship.
For example, suppose the number of weight parameters of the convolutional neural network model is still 100 million.
1) The terminal matches the number of weight parameters of the convolutional neural network model, 100 million, against the numbers of weight parameters in the correspondence of Table 1, and determines that it lies within the target weight-parameter-number interval of 50 to 200 million.
2) In the correspondence of Table 1, the core weight value of the GPU corresponding to the target weight-parameter-number interval of 50 to 200 million is the specific value 1, so the core weight value of the GPU can be obtained directly.
As can be seen from the above description, there are various methods of determining core weight values from the correspondence in the embodiments of the present invention: when a core weight value in the first correspondence is a specific value, it can be determined directly; when it is a value range, linear mapping is performed according to the number of weight parameters of the convolutional neural network model to obtain the core weight value of the core.
The core weight value reflects the degree to which the hardware characteristics of a core suit the current convolutional neural network model; the higher the core weight value, the more suitable the core is for running the model. The terminal may therefore schedule the core that runs the convolutional neural network model according to the core weight values; for example, in the above example, the terminal may schedule the GPU, the core with the largest core weight value, to run the convolutional neural network model obtained in step 501.
However, the hardware characteristics of different cores suit different convolutional neural network models, and the operating environment of a core also influences how the model runs. The core weight value reflects only the correlation between the static characteristics of the core and the convolutional neural network model; if a core is scheduled to run the model considering only its hardware characteristics, the resulting performance is not necessarily the best. To select a core better suited to running the current convolutional neural network model, the operating-environment parameters, i.e. the dynamic characteristics of the core, are also considered; by combining the static and dynamic characteristics of the core, the core currently most suitable for running the model can be selected. Specifically, the dynamic parameters of the core may be used to adjust the core weight value to obtain a new weight value, and the core is then scheduled according to the new weight value.
Therefore, the core scheduling method according to the embodiment of the present invention further includes the following steps.
Hereinafter, the core usage rate and the remaining power value of the terminal are taken as examples; that is, load-balancing decisions in the performance and power-consumption dimensions are added to modify the weight values and decide which core to schedule.
Step 504: and acquiring the current core utilization rate of each core.
Core usage is a dynamically changing parameter.
Different operating environments on the terminal influence how a core runs the convolutional neural network model. The performance state of a core is one of its dynamic characteristics: the same core has different computing capabilities in different performance states, so the performance state affects the running of the model. Taking the current performance state of the core as one of the factors in generating the core scheduling policy allows the scheduled core to run the current convolutional neural network model more efficiently.
The core usage rate of a core is an important performance-state parameter, and can therefore be used as one of the considerations of the core scheduling policy. The usage rate of a core indicates its load condition.
The specific implementation manner of the terminal acquiring the current usage rates of the multiple heterogeneous cores may be, for example:
The terminal calls a preset core-usage reading program, or uses an API provided by the terminal system for reading core usage, to read the core usage rate of each core on the terminal. The cores whose usage rates need to be read are the candidate cores for running the convolutional neural network model on the terminal, i.e. the cores described in step 503 above.
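The patent does not specify the reading mechanism. As a hedged illustration only: on a Linux-based terminal, aggregate CPU utilization is conventionally derived from two snapshots of the first line of /proc/stat. The sketch below parses such lines directly (rather than reading the file) so that it is self-contained:

```python
def cpu_usage(stat_line_t0, stat_line_t1):
    """Fraction of time the CPU was busy between two samples of the
    aggregate 'cpu ...' line of /proc/stat (fields after the label:
    user nice system idle iowait irq softirq steal ...)."""
    def idle_and_total(line):
        fields = [int(v) for v in line.split()[1:]]
        idle = fields[3] + fields[4]  # idle + iowait count as idle time
        return idle, sum(fields)

    idle0, total0 = idle_and_total(stat_line_t0)
    idle1, total1 = idle_and_total(stat_line_t1)
    busy = (total1 - total0) - (idle1 - idle0)
    return busy / (total1 - total0)

# Two hypothetical samples taken some interval apart:
usage = cpu_usage("cpu 100 0 100 800 0 0 0 0",
                  "cpu 160 0 140 900 0 0 0 0")  # -> 0.5
```

GPU and NPU usage rates have no such universal interface and would come from vendor-specific APIs, which is consistent with the text's "API provided by the terminal system".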
For example, the terminal reads through the API provided by the terminal system that the core usage of the GPU is 25%, i.e. the GPU has 75% of the available computing resources.
Step 505: and for each core, determining a performance weight value and a performance parameter from a preset second corresponding relation according to the core utilization rate.
The performance weight value of each core corresponds to the core utilization rate of each core, and the performance parameter of each core corresponds to the core utilization rate of each core.
The performance weight value is used to represent the priority of a core selected to run the convolutional neural network model at its current core usage, and the performance weight value of a core reflects the current computational resource availability of that core. Different cores may have different performance weight values.
Running the convolutional neural network model on the core requires running using specific performance parameters of the core, including one or more of thread priority information, sleep time information, thread number information.
The thread priority information is the priority of the sub-threads when the core runs the convolutional neural network model; the sleep time information is the time interval between two consecutive runs of the convolutional neural network model on the core; the thread number information is the number of threads used when the core runs the convolutional neural network model.
The performance weight values and performance parameters obtained in step 505 may also be presented in the form of a list.
The second correspondence includes a correspondence between the core usage rate of each core and the performance weight value of each core, and further includes a correspondence between the core usage rate of each core and the performance parameters of each core. In other words, the parameters of the second correspondence are the core usage rate, the performance weight value, and the performance parameters. Each of these three parameters may take the form of a specific value or of a value range. The determined performance weight value is a specific value. In the second correspondence, the performance weight values may belong to one or more core types, and one or more performance weight values may be determined, with different performance weight values belonging to different cores. A core may have one or more performance parameters; the present invention is not specifically limited in this respect.
The specific implementation of step 505 may be as follows:
1) Determine the performance weight value corresponding to the core's current core usage rate according to the second correspondence.
After the terminal obtains the current core usage rate of a core, it matches the current core usage rate against the core usage rates in the second correspondence; the match succeeds if the current core usage rate equals a core usage rate in the second correspondence, or falls within one of its core-usage-rate value ranges. Then, in the second correspondence, the terminal determines the performance weight value corresponding to the successfully matched core usage rate.
If, in the second correspondence, a core usage rate corresponds to a value range of performance weight values, then after the corresponding value range is determined, a specific performance weight value can be calculated from the core's current core usage rate, for example by linear mapping within the corresponding performance weight value range according to the current core usage rate of the core.
2) Determine the performance parameters corresponding to the core's current core usage rate according to the second correspondence.
In the above operation, if the current core usage rate is the same as a core usage rate in the second correspondence, or falls within one of its core-usage-rate value ranges, the match succeeds. The terminal may then further determine, in the second correspondence, the performance parameters corresponding to the successfully matched core usage rate; these performance parameters may be specific values.
The performance parameters in the second correspondence may be set according to the specific core usage rate, and need to be obtained by system tuning. For example, when the core usage rate is lowest, more threads can be used to obtain higher processing performance; when the core usage rate is high, fewer threads are used to process neural network computation requests, so that the impact on the already highly loaded core is as small as possible.
The second corresponding relationship may be established in advance according to experimental test data or empirical data. The terminal may obtain the second corresponding relationship by reading from a memory of the terminal, or by obtaining from other devices.
Step 505 is described below by taking a specific example as follows:
A correspondence is prestored in the terminal, as shown in Table 2.
TABLE 2

Core usage rate | Performance weight value | Thread priority | Sleep time | Thread number
2% to 30%       | 0.8 to 0.5               | 0               | 400 ms     | 2

(Only the row used in the example below is recoverable from the description; the other two levels appear in the original drawing only.)
As shown in Table 2, the performance weight value in this correspondence is divided into three ranges, i.e. three levels, and the core usage rate and the performance parameters are likewise divided into three levels; the usage rate of each level is a range of values, while the performance parameters of each level are specific values. In this example, excessively high core usage rates are not considered, so as to prevent an overloaded core from affecting the operation of the terminal.
The method of determining the performance weight value of the core is as follows:
1) For each core, the target core-usage interval in which its current core usage rate lies is determined in the correspondence of Table 2;
2) for each core, the performance weight value interval corresponding to the target core-usage interval is determined in the correspondence of Table 2;
3) for each core, a performance weight value is determined from the performance weight value interval, such that its position within that interval is the same as the position of the core's current core usage rate within the target core-usage interval.
Taking the current core usage of the GPU as 25% as an example, the performance weight value is calculated as follows:
1) In the correspondence of Table 2, the target core-usage interval in which the GPU's current core usage rate of 25% lies is determined to be 2% to 30%.
2) In the correspondence of Table 2, the performance weight value interval corresponding to this target core-usage interval is determined to be 0.8 to 0.5.
3) The relative position of the GPU's current core usage rate of 25% within the target core-usage range of 2% to 30% is (25 - 2)/(30 - 2) = 23/28. Linear mapping is performed within the performance weight value interval of 0.8 to 0.5 according to 23/28: denoting the performance weight value of the GPU by S_x, then (S_x - 0.8)/(0.5 - 0.8) = 23/28, and the performance weight value of the GPU is calculated to be approximately 0.55.
The method of determining the performance parameters is as follows:
1) In the second correspondence, determine the target core-usage interval in which the current core usage rate lies;
2) in the second correspondence, determine the performance parameters corresponding to the target core-usage interval.
Still taking the GPU's current core usage rate of 25% as an example, the performance parameters are determined as follows:
1) In the correspondence of Table 2, the target core-usage interval in which the GPU's current core usage rate of 25% lies is determined to be 2% to 30%;
2) in the correspondence of Table 2, the performance parameters corresponding to the target core-usage interval of 2% to 30% are determined to be: thread priority information 0, sleep time information 400 ms, thread number information 2.
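The performance-weight mapping and parameter lookup can be sketched together as follows. Only the 2%-to-30% row of Table 2 is fully specified in the description, so this sketch covers just that row, and the function and key names are assumptions:

```python
def perf_for_usage(usage):
    """Performance weight value and performance parameters for a core's
    current usage rate, per the 2%-30% row of Table 2: the weight is
    linearly mapped from 0.8 down to 0.5 across the interval, while the
    performance parameters are fixed values for the whole row."""
    lo, hi = 0.02, 0.30
    if not lo <= usage <= hi:
        raise ValueError("only the 2%-30% row is reproduced in this sketch")
    pos = (usage - lo) / (hi - lo)
    weight = 0.8 + pos * (0.5 - 0.8)
    params = {"thread_priority": 0, "sleep_time_ms": 400, "thread_number": 2}
    return weight, params

weight, params = perf_for_usage(0.25)  # GPU at 25% usage -> weight ~0.55
```

The 25% usage of the worked example lands at 23/28 of the way through the interval, giving a weight of about 0.55 together with the row's fixed performance parameters.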
It is understood that in some embodiments of the present invention, the correspondence of step 505 may include the performance weight value but not the performance parameters, or include the performance parameters but not the performance weight value. Alternatively, the second correspondence may be split into two correspondences: one between the core usage rate and the performance weight value, and the other between the core usage rate and the performance parameters.
It is understood that the core usage rate in step 505 is one example of the state parameters of the terminal, and the state parameters of the terminal may further include a terminal remaining power value, a temperature of the core, and the like. Thus, the second correspondence may also be a correspondence of other state parameters and parameter weight values of the at least two cores for indicating a degree of priority with which the cores are selected to run the convolutional neural network model under the particular state parameter.
Step 506: for each core, a first correction weight value is obtained by correcting the core weight value by using the performance weight value.
The first corrected weight value is used to represent the degree of priority with which a core is selected to run the convolutional neural network model; a core with a larger first corrected weight value is more suitable for running the convolutional neural network model than a core with a smaller one.
When the terminal runs the convolutional neural network model on a specific core, not only must the physical characteristics of the core be adapted to the characteristics of the model, but the current performance state of the core must also be suitable for running it; therefore, the performance parameters of the core and the hardware characteristics of the core need to be considered together. The specific implementation is to correct the core weight value of each heterogeneous core with its performance weight value, so that the resulting first corrected weight value combines the core weight value and the performance weight value: the first corrected weight value of a core reflects the degree to which both its hardware characteristics and its current core usage rate suit running the convolutional neural network model. Compared with using only the core weight value, which reflects the static characteristics of the core, the first corrected weight value better reflects how well the core is adapted to the convolutional neural network model. Scheduling the core that runs the convolutional neural network model according to the first corrected weight value therefore allows a more efficient core to be selected.
There are various ways of correcting the core weight value of a core with its performance weight value, for example multiplying the performance weight value by the core weight value, or performing a weighted operation, to obtain the first corrected weight value.
For example, if the core weight value of the GPU is 1 and the performance weight value of the GPU is 0.7, the first corrected weight value of the GPU, obtained by multiplying the core weight value by the performance weight value, is 0.7.
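With multiplication as the correction, step 506 reduces to an element-wise product over the cores. In the sketch below, the GPU values are the ones from the examples in the text, while the CPU and NPU performance weight values are assumed for illustration:

```python
def correct(weights, factors):
    """Multiply each core's weight by its correction factor."""
    return {core: weights[core] * factors[core] for core in weights}

core_w = {"CPU": 0.83, "GPU": 1.0, "NPU": 0.66}   # core weight values (step 503)
perf_w = {"CPU": 0.60, "GPU": 0.70, "NPU": 0.90}  # performance weight values (step 505); CPU/NPU assumed
first_corrected = correct(core_w, perf_w)         # GPU: 1.0 * 0.7 = 0.7
```

The same `correct` helper would serve for the power-consumption correction of step 509, since both corrections are described as a per-core multiplication.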
Step 507: and acquiring the residual electric quantity value of the terminal.
A core of the terminal consumes power when running the convolutional neural network model, and because different cores have different power consumption, running the same convolutional neural network model on different cores consumes different amounts of power. So as not to affect the user's continued use of the terminal, the remaining power of the terminal needs to be one of the considerations in scheduling a core. This is particularly important for terminals with small energy storage.
For this reason, the terminal needs to acquire its remaining power value, which indicates how much power currently remains in the terminal.
The terminal may acquire the remaining power value, for example, by using a power detection program on the terminal to detect the current remaining power, thereby obtaining the remaining power value.
Step 508: and determining the power consumption weight values of the at least two cores from a preset third corresponding relation according to the residual electric quantity value.
The power consumption weight values of the at least two cores correspond to the remaining power value. The power consumption weight value is used to represent the degree of priority with which a core is selected to run the convolutional neural network model at a given remaining power value; a core with a large power consumption weight value is more suitable for running the convolutional neural network model than a core with a small one.
Because different heterogeneous cores have different power consumption characteristics, they consume different amounts of power when running the same convolutional neural network model, so the power consumption characteristics of the different cores can be used as one of the considerations in scheduling a core.
If only the power consumption of the core were considered, the core scheduled for the convolutional neural network model would be the one with the smallest power consumption, but its computing power might not be better than that of the other cores; that is, the different parameters on the terminal need to be considered together to generate a better core scheduling policy. Therefore, in the embodiment of the present invention, the remaining power of the terminal and the power consumption of the cores are considered together to determine the power consumption weight values of the cores. Specifically, the power consumption weight value of a core is determined using the third correspondence, whose parameters include the remaining power value and the power consumption weight value, and in which the power consumption weight values are set taking the power consumption of the cores into account.
The third correspondence includes a correspondence between the remaining power value and the power consumption weight values of the at least two cores; that is, the parameters of the third correspondence are the remaining power value and the power consumption weight value. Each of these two parameters may take the form of a specific value or of a value range. The determined power consumption weight value is a specific value. In the third correspondence, the power consumption weight values may belong to one or more core types, and one or more power consumption weight values may be determined, with different power consumption weight values belonging to different cores.
If, in the third correspondence, a remaining-power-value interval corresponds to a power consumption weight value interval, then after the corresponding power consumption weight value interval is determined, a specific power consumption weight value can be calculated from the terminal's current remaining power value, for example by linear mapping within the power consumption weight value interval according to the position of the terminal's current remaining power value within the corresponding remaining-power-value interval.
Step 508 is illustrated below by taking a specific example as follows:
Table 3:

Core | Remaining power value | Power consumption weight value
CPU  | 10% to 100%           | 0.8 to 1.0
GPU  | 0% to 50%             | 0 to 0.8
NPU  | 8% to 100%            | 0.8 to 1.0

(Only the level matched in the example below is recoverable from the description; the second level of each core appears in the original drawing only.)
as shown in table 3, in the corresponding relationship, the power consumption weight value is divided into two ranges, i.e., into two levels, and the remaining power value range is also divided into two levels.
Considering that the NPU has low power consumption, its remaining-power threshold for the higher power consumption weight level is set lowest; that is, as long as the remaining power is greater than 8%, the power consumption weight value of the NPU can be set to the higher level.
The power consumption weight value calculation method comprises the following steps:
1) In the correspondence of Table 3, determine the target remaining-power-value interval in which the terminal's remaining power value lies;
2) in the correspondence of Table 3, determine the target power consumption weight value interval corresponding to the target remaining-power-value interval;
3) for each core, determine a power consumption weight value from the target power consumption weight value interval, such that its position within that interval is the same as the position of the terminal's remaining power value within the target remaining-power-value interval.
Taking the current remaining electric quantity value of the terminal as 40% as an example, the power consumption weight value is calculated as follows:
1) In the correspondence of Table 3, the target remaining-power intervals in which the terminal's remaining power value of 40% lies are determined: 10% to 100% for the CPU, 0% to 50% for the GPU, and 8% to 100% for the NPU;
2) for each core, in the correspondence of Table 3, the target power consumption weight value interval corresponding to its target remaining-power interval is determined: 0.8 to 1.0 for the CPU, 0 to 0.8 for the GPU, and 0.8 to 1.0 for the NPU;
3) The relative position of the terminal's remaining power value of 40% within the CPU's target remaining-power range of 10% to 100% is (40 - 10)/(100 - 10) = 1/3. According to 1/3, linear mapping is performed within the CPU's target power consumption weight range of 0.8 to 1.0: denoting the power consumption weight value of the CPU by S_y, then (S_y - 0.8)/(1 - 0.8) = 1/3, and the power consumption weight value of the CPU is calculated to be approximately 0.86. Using the same method, the power consumption weight value of the GPU is calculated to be 0.64, and that of the NPU to be approximately 0.87.
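The three power consumption weight values of the worked example can be reproduced with the same linear mapping; the interval values come from the Table 3 rows used above, and the helper name is an assumption:

```python
def linear_map(x, src_lo, src_hi, dst_lo, dst_hi):
    """Map x's relative position in [src_lo, src_hi] onto [dst_lo, dst_hi]."""
    pos = (x - src_lo) / (src_hi - src_lo)
    return dst_lo + pos * (dst_hi - dst_lo)

battery = 0.40  # terminal's remaining power value: 40%
power_w = {
    "CPU": linear_map(battery, 0.10, 1.00, 0.8, 1.0),  # ~0.87
    "GPU": linear_map(battery, 0.00, 0.50, 0.0, 0.8),  # 0.64
    "NPU": linear_map(battery, 0.08, 1.00, 0.8, 1.0),  # ~0.87
}
```

The CPU value is exactly 0.8 + 0.2/3 ≈ 0.8667, which the worked example writes as 0.86.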
Step 509: and for each core, correcting the first correction weight value by using the power consumption weight value to obtain a second correction weight value.
The second corrected weight value is used to represent the degree of priority with which a core is selected to run the convolutional neural network model; a core with a larger second corrected weight value is more suitable for running the convolutional neural network model than a core with a smaller one.
Correcting the first corrected weight value with the power consumption weight value amounts to correcting the first corrected weight value of the core according to the remaining power value of the terminal and the power consumption of the core.
In the embodiment of the present invention, the second modified weight value may be in the form of a list.
As described above, the first corrected weight value of a core reflects the degree to which its hardware computing characteristics and its current core usage rate suit running the convolutional neural network model. After the first corrected weight value is further corrected with the core's power consumption weight value, the resulting second corrected weight value combines still more parameters for generating the core scheduling policy. From the descriptions of the weight values above, the parameters reflected in a core's second corrected weight value include the hardware computing characteristics of the core, the computational density of the convolutional neural network model, the core usage rate, the remaining power of the terminal on which the core resides, and the power consumption of the core. The second corrected weight value therefore reflects how suitable the core is for running the convolutional neural network model, and the core capable of running the model efficiently can be determined more accurately from the second corrected weight values of the different cores.
The first correction weight value can be corrected with the power consumption weight value in various specific ways, for example by multiplying the core's first correction weight value by its power consumption weight value, or by a weighted-sum operation, to obtain the core's second correction weight value.
For example, if the power consumption weight value of the GPU is 0.4 and the first correction weight value of the GPU is 0.7, the terminal multiplies the first correction weight value by the power consumption weight value to obtain a second correction weight value of 0.28 for the GPU.
It can be understood that, in the embodiment of the present invention, the core weight value may be corrected with the performance weight value first, with the power consumption weight value first, or with both the performance weight value and the power consumption weight value; this is not specifically limited here.
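The multiplication example above can be sketched as follows; the function name and the choice of applying both corrections as straight multiplications are illustrative assumptions (the text also allows a weighted-sum operation):

```python
def second_correction_weight(core_weight: float,
                             performance_weight: float,
                             power_weight: float) -> float:
    """Correct the core weight value with the performance weight value
    (yielding the first correction weight value), then with the power
    consumption weight value (yielding the second correction weight value)."""
    first_correction = core_weight * performance_weight
    return first_correction * power_weight

# Matches the GPU example: a first correction weight value of 0.7 times a
# power consumption weight value of 0.4 gives 0.28.
```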
Step 510: determine, from the at least two cores, the target core with the largest second correction weight value.
The target core is the core used to run the convolutional neural network model.
After the terminal obtains the second correction weight values of the multiple heterogeneous cores, it can use these values to schedule the cores: the second correction weight values of the cores are compared, and the target core with the largest second correction weight value is selected. Because the second correction weight value of a core reflects how suitable the core is for running the convolutional neural network model, the convolutional neural network model is suited to run on the target core.
For example, if the second correction weight values of the cores are {W_CPU = 0.4; W_GPU = 0.8; W_NPU = 0.5}, the terminal selects the GPU, which has the largest second correction weight value, to run the convolutional neural network model obtained in step 501, so as to execute a specific application service.
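Step 510 amounts to an argmax over the second correction weight values. A minimal sketch, assuming the values are kept in a dictionary keyed by core name:

```python
# Second correction weight values from the example above (assumed dict form).
second_correction_weights = {"CPU": 0.4, "GPU": 0.8, "NPU": 0.5}

# The target core is the one with the largest second correction weight value.
target_core = max(second_correction_weights, key=second_correction_weights.get)
# target_core is "GPU" here, so the GPU would run the model.
```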
Step 511: run the convolutional neural network model on the target core using the performance parameters of the target core.
Running the convolutional neural network model on the target core involves specifics of how it is run, for example how threads on the core are used. Since, in the steps above, a performance parameter is determined for each core from its current core utilization rate, the performance parameter of the target core is known, and the terminal can use it to run the convolutional neural network model on the target core.
When multiple performance parameters are determined, each is applied when running the convolutional neural network model. For example, the number of concurrent threads on the target core is controlled according to the thread-count information in the performance parameters; according to the sleep-time information, the core's sleep time after executing a network computing request is controlled, that is, the core does not run the next convolutional neural network model within the interval indicated by the sleep-time information; and the priority of sub-threads on the target core is controlled according to the thread-priority information.
For example, after the target core runs the convolutional neural network model, the sleep-time information causes the system's sleep API to be called: the target core sleeps for a period of time, runs no other convolutional neural network model during that period, and processes the next new model afterwards. Thus, when the core utilization rate is high, the sleep-time information can make the target core sleep longer so as to keep the utilization rate at a reasonable level.
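The use of the thread-count and sleep-time performance parameters can be sketched as below; the `PerformanceParams` fields, the `run_on_core` helper, and the model-running callable are all hypothetical names for illustration, not APIs from this document:

```python
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class PerformanceParams:
    num_threads: int      # thread-count information
    sleep_seconds: float  # sleep-time information

def run_on_core(run_model, requests, params: PerformanceParams):
    # The thread-count information bounds the number of concurrent threads.
    with ThreadPoolExecutor(max_workers=params.num_threads) as pool:
        for request in requests:
            pool.submit(run_model, request).result()
            # Sleep-time information: stay idle between two model runs, so
            # the core does not start the next model within the interval.
            time.sleep(params.sleep_seconds)
```

Under this reading, a higher core utilization rate would map to a larger `sleep_seconds`, keeping the utilization rate at a reasonable level as described above.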
To sum up, after obtaining the number of weight parameters of the convolutional neural network, the method of the embodiment of the present invention may determine the core weight values of multiple cores from that number and the preset first corresponding relationship, the core weight value indicating the priority with which a core is selected to run the convolutional neural network model. The core weight value is then corrected with dynamic parameters of the terminal where the core is located. From the core utilization rate, a performance weight value is determined via the second corresponding relationship; it indicates the priority with which the core is selected to run the model at its current utilization rate. From the terminal's remaining battery level, a power consumption weight value is determined via the third corresponding relationship; it indicates the priority with which the core is selected to run the model at that battery level. The performance weight value and the power consumption weight value are applied in sequence to correct the core weight value, yielding the second correction weight value. Among the multiple cores, the target core with the largest second correction weight value is the one most suitable for running the convolutional neural network model; scheduling the target core to run the model can improve running efficiency and reduce power consumption.
Fig. 6 is a schematic diagram of a hardware structure of a terminal according to an embodiment of the present invention. As shown in fig. 6, for convenience of illustration, only the portion related to the embodiment of the present invention is shown; for technical details not disclosed here, please refer to the method portion of the embodiments of the present invention. The terminal may be any terminal device, including a mobile phone, a tablet computer, a PDA (Personal Digital Assistant), a POS (Point of Sales) terminal, a vehicle-mounted computer, and the like. The mobile phone is taken as an example:
fig. 6 is a block diagram illustrating a partial structure of a mobile phone related to a terminal provided in an embodiment of the present invention. Referring to fig. 6, the handset includes: a Radio Frequency (RF) circuit 610, a memory 620, an input unit 630, a display unit 640, a sensor 650, an audio circuit 660, a wireless fidelity (WiFi) module 670, a central processor 680, and a power supply 690.
In some embodiments, the handset may also include a graphics processor 681, a digital signal processor 682, a systolic array processor 683, etc., which may be specifically a neural network processor, a tensor processor, a smart processor, etc.
Those skilled in the art will appreciate that the handset configuration shown in fig. 6 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
The following describes each component of the mobile phone in detail with reference to fig. 6:
The RF circuit 610 may be used for receiving and transmitting signals during information transmission and reception or during a call; in particular, it delivers downlink information received from a base station to the central processor 680 for processing, and transmits uplink data to the base station. In general, RF circuit 610 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like. In addition, the RF circuitry 610 may also communicate with networks and other devices via wireless communications. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Messaging Service (SMS), and the like.
The memory 620 may be used to store software programs and modules, and the central processor 680 executes various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 620. The memory 620 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data (such as audio data, a phonebook, etc.) created according to the use of the mobile phone. Further, the memory 620 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
The input unit 630 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the cellular phone. Specifically, the input unit 630 may include a touch panel 631 and other input devices 632. The touch panel 631, also referred to as a touch screen, may collect touch operations of a user (e.g., operations of the user on the touch panel 631 or near the touch panel 631 by using any suitable object or accessory such as a finger or a stylus) thereon or nearby, and drive the corresponding connection device according to a preset program. Alternatively, the touch panel 631 may include two parts of a touch detection device and a touch controller. The touch detection device detects the touch direction of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch sensing device, converts the touch information into touch point coordinates, sends the touch point coordinates to the central processing unit 680, and can receive and execute commands sent by the central processing unit 680. In addition, the touch panel 631 may be implemented using various types, such as resistive, capacitive, infrared, and surface acoustic wave. The input unit 630 may include other input devices 632 in addition to the touch panel 631. In particular, other input devices 632 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.
The display unit 640 may be used to display information input by the user or information provided to the user and various menus of the mobile phone. The Display unit 640 may include a Display panel 641, and optionally, the Display panel 641 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like. Further, the touch panel 631 can cover the display panel 641, and when the touch panel 631 detects a touch operation thereon or nearby, the touch operation is transmitted to the central processor 680 to determine the type of the touch event, and then the central processor 680 provides a corresponding visual output on the display panel 641 according to the type of the touch event. Although in fig. 6, the touch panel 631 and the display panel 641 are two independent components to implement the input and output functions of the mobile phone, in some embodiments, the touch panel 631 and the display panel 641 may be integrated to implement the input and output functions of the mobile phone.
The handset may also include at least one sensor 650, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor that adjusts the brightness of the display panel 641 according to the brightness of ambient light, and a proximity sensor that turns off the display panel 641 and/or the backlight when the mobile phone is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally, three axes), can detect the magnitude and direction of gravity when stationary, and can be used for applications of recognizing the posture of a mobile phone (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), vibration recognition related functions (such as pedometer and tapping), and the like; as for other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which can be configured on the mobile phone, further description is omitted here.
Audio circuit 660, speaker 661, and microphone 662 can provide an audio interface between a user and a cell phone. The audio circuit 660 may transmit the electrical signal converted from the received audio data to the speaker 661, and convert the electrical signal into an audio signal through the speaker 661 for output; on the other hand, the microphone 662 converts the collected sound signals into electrical signals, which are received by the audio circuit 660 and converted into audio data, which are output to the central processor 680 for processing, and then passed through the RF circuit 610 to be sent to, for example, another cellular phone, or output to the memory 620 for further processing.
WiFi is a short-range wireless transmission technology. Through the WiFi module 670, the mobile phone can help the user send and receive e-mails, browse web pages, access streaming media, and so on, providing the user with wireless broadband Internet access. Although fig. 6 shows the WiFi module 670, it is understood that it is not an essential component of the mobile phone and can be omitted as needed within a scope that does not change the essence of the invention.
The central processor 680 is the control center of the mobile phone. It connects the various parts of the whole phone through various interfaces and lines, and performs the phone's functions and processes data by running or executing software programs and/or modules stored in the memory 620 and calling data stored in the memory 620, thereby monitoring the phone as a whole. Optionally, the central processor 680 may include one or more processing units; preferably, it may integrate an application processor, which mainly handles the operating system, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor may also not be integrated into the central processor 680.
The mobile phone also includes a power supply 690 (e.g., a battery) for supplying power to the various components. Preferably, the power supply may be logically connected to the central processor 680 via a power management system, so that charging, discharging, and power consumption are managed through the power management system.
Although not shown, the mobile phone may further include a camera, a bluetooth module, etc., which are not described herein.
In this embodiment of the present invention, the central processing unit 680 included in the terminal may be configured to: obtaining target model parameters, wherein the target model parameters are used for representing the calculation density of a convolutional neural network model; determining core weight values of at least two cores from a preset first corresponding relation according to the target model parameter, wherein the core weight values of the at least two cores correspond to the target model parameter, the at least two cores are heterogeneous cores on the terminal, the first corresponding relation comprises the corresponding relation between the target model parameter and the core weight values of the at least two cores, and the core weight values are used for representing the priority degree of the cores selected to operate the convolutional neural network model; and determining a core running the convolutional neural network model from the at least two cores according to the core weight values of the at least two cores.
Optionally, the central processor 680 may be further configured to: acquiring a current state parameter of the terminal, wherein the state parameter is a dynamically changing parameter; determining parameter weight values of at least two cores from a preset second corresponding relation according to the state parameters, wherein the parameter weight values of the at least two cores correspond to the state parameters, the second corresponding relation comprises the corresponding relation between the state parameters and the parameter weight values of the at least two cores, and the parameter weight values are used for expressing the priority degree of the cores selected to run the convolutional neural network model under the state parameters; for each core, correcting the core weight value by using the parameter weight value to obtain a first correction weight value, wherein the first correction weight value is used for expressing the priority degree of the core selected to run the convolutional neural network model; a kernel running the convolutional neural network model is determined from the at least two kernels according to the first modified weight values of the at least two kernels.
Optionally, the current state parameter of the terminal is the current core usage rate of each core, and the central processing unit 680 may further be configured to: and for each core, determining a performance weight value from a preset second corresponding relation according to the core utilization rate, wherein the performance weight value of each core corresponds to the core utilization rate of each core, the performance weight value is used for expressing the priority of the core selected to run the convolutional neural network model under the current core utilization rate of the core, and the second corresponding relation comprises the corresponding relation between the core utilization rate of each core and the performance weight value of each core.
Optionally, the current remaining electric quantity value of the terminal is obtained, and the central processing unit 680 may be further configured to: determining power consumption weight values of the at least two cores from a preset third corresponding relation according to the residual electric quantity value, wherein the power consumption weight values of the at least two cores correspond to the residual electric quantity value, the third corresponding relation comprises the corresponding relation between the residual electric quantity value and the power consumption weight values of the at least two cores, and the power consumption weight values are used for expressing the priority degree of the cores selected to run the convolutional neural network model under the residual electric quantity value; for each core, correcting the first correction weight value by using the power consumption weight value to obtain a second correction weight value, wherein the second correction weight value is used for expressing the priority degree of the core selected to run the convolutional neural network model; and determining a kernel for running the convolutional neural network model from the at least two kernels according to the second modified weight values of the at least two kernels.
Optionally, the central processor 680 may be further configured to: acquiring the current core utilization rate of each core; for each core, determining a performance parameter from a second corresponding relation according to the core utilization rate, wherein the performance parameter of each core corresponds to the core utilization rate of each core, and the second corresponding relation comprises the corresponding relation between the performance parameter of each core and the core utilization rate of each core; and after determining the core for running the convolutional neural network model from the at least two cores according to the core weight values of the at least two cores, running the convolutional neural network model on the target core by using the performance parameters of the target core, wherein the target core is the core for running the convolutional neural network model.
Optionally, the current state parameter of the terminal is a current remaining electric quantity value of the terminal, and the central processing unit 680 may be further configured to: and determining power consumption weight values of the at least two cores from a preset second corresponding relation according to the residual electric quantity value, wherein the power consumption weight values of the at least two cores correspond to the residual electric quantity value, the power consumption weight values are used for expressing the priority degree of the cores selected to run the convolutional neural network model under the residual electric quantity value, and the second corresponding relation comprises the corresponding relation between the residual electric quantity value and the power consumption weight values of the at least two cores.
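A preset corresponding relationship of this kind can be represented as a simple lookup table. The sketch below, with invented battery ranges and weight values, shows one assumed encoding of a battery-level-to-power-consumption-weight correspondence:

```python
# Invented illustration of a preset corresponding relationship: remaining
# battery ranges (in percent) map to per-core power consumption weight values.
POWER_WEIGHT_TABLE = [
    # (min_battery, max_battery, {core: power consumption weight value})
    (50, 100, {"CPU": 0.9, "GPU": 0.9, "NPU": 1.0}),
    (20, 50,  {"CPU": 0.8, "GPU": 0.5, "NPU": 0.9}),
    (0,  20,  {"CPU": 0.6, "GPU": 0.3, "NPU": 0.8}),
]

def power_consumption_weight(battery_pct: float, core: str) -> float:
    # Return the power consumption weight value of the given core for the
    # range containing the current remaining battery level.
    for low, high, weights in POWER_WEIGHT_TABLE:
        if low <= battery_pct <= high:
            return weights[core]
    raise ValueError("battery percentage out of range")
```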
Optionally, the central processor 680 may be further configured to: determining a target model parameter interval in which a target model parameter is located in a preset first corresponding relation; determining core weight value intervals of at least two cores in a first corresponding relation, wherein the core weight value intervals of the at least two cores correspond to the target model parameter interval, the first corresponding relation comprises the corresponding relation between the target model parameter interval and the core weight value intervals of the at least two cores, and the target model parameter interval comprises target model parameters; and for each core, determining a core weight value from a core weight value interval, wherein the position of the core weight value in the core weight value interval is the same as the position of the target model parameter in the target model parameter interval.
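The interval-matching rule above reads as linear interpolation: the core weight value sits at the same relative position inside its core weight value interval as the target model parameter sits inside its parameter interval. A sketch under that reading, with invented interval endpoints:

```python
def core_weight_from_intervals(param: float,
                               param_interval: tuple,
                               weight_interval: tuple) -> float:
    # Relative position of the target model parameter in its interval.
    p_lo, p_hi = param_interval
    position = (param - p_lo) / (p_hi - p_lo)
    # Place the core weight value at the same position in its interval.
    w_lo, w_hi = weight_interval
    return w_lo + position * (w_hi - w_lo)

# e.g. a target model parameter of 30 (million weights, assumed unit) halfway
# through the interval (20, 40) maps to the midpoint of an assumed core
# weight value interval (0.6, 0.8), i.e. 0.7.
```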
Optionally, the central processor 680 may be further configured to perform the steps 401 to 403.
Optionally, the central processing unit 680 may be further configured to perform the steps 501 to 511.
In summary, the central processing unit 680 obtains a target model parameter, where the target model parameter is used to represent the computation density of a convolutional neural network model, and then, the central processing unit 680 determines, according to the target model parameter, core weight values of at least two cores from a preset first corresponding relationship, where the core weight values of the at least two cores correspond to the target model parameter, the at least two cores are heterogeneous cores on the terminal, where the first corresponding relationship includes a corresponding relationship between the target model parameter and the core weight values of the at least two cores, and the core weight value is used to represent a priority of the core selected to run the convolutional neural network model. The central processor 680 thus determines a kernel from the at least two kernels from which to run the convolutional neural network model based on the kernel weight values of the at least two kernels. Heterogeneous cores on the terminal have different characteristics, and the different cores are suitable for operating convolutional neural network models with different calculation densities. If a first corresponding relationship is preset, where the first corresponding relationship includes a corresponding relationship between a target model parameter and core weight values of at least two cores, where the target model parameter is used to represent a calculation density of a convolutional neural network model, and the at least two cores are heterogeneous cores on a terminal, after the target model parameter of the convolutional neural network model is obtained, the core weight values of the at least two cores may be determined from the preset first corresponding relationship according to the target model parameter. 
The core weight value represents the degree of priority with which a core is selected to run the convolutional neural network model, and a core suitable for running the model can be determined from it. In this way, the core for running the convolutional neural network model can be determined from the at least two cores according to their core weight values. Through the core weight values of different cores, an adapted core can be determined to run a convolutional neural network model of a specific computational density; since a larger core weight value indicates higher efficiency in running the model, the core determined from the core weight values can run the convolutional neural network model efficiently.
Fig. 7 is a schematic structural diagram of a terminal according to an embodiment of the present invention. The terminal shown in fig. 7 may be integrated into the terminal shown in fig. 6 and may be configured to perform the steps performed by the terminal in fig. 4 or fig. 5.
Referring to fig. 7, the terminal according to the embodiment of the present invention includes:
an obtaining unit 701, configured to obtain target model parameters, where the target model parameters are used to represent a computation density of a convolutional neural network model;
a weight value determining unit 702, configured to determine, according to the target model parameter, core weight values of at least two cores from a preset first corresponding relationship, where the core weight values of the at least two cores correspond to the target model parameter, the at least two cores are heterogeneous cores on the terminal, the first corresponding relationship includes a corresponding relationship between the target model parameter and the core weight values of the at least two cores, and the core weight values are used to indicate a priority degree of the cores selected to run the convolutional neural network model;
a kernel determining unit 703, configured to determine, from the at least two kernels, a kernel for running the convolutional neural network model according to the kernel weight values of the at least two kernels.
Alternatively,
the obtaining unit 701 is further configured to obtain a current state parameter of the terminal, where the state parameter is a dynamically changing parameter;
the weight value determining unit 702 is further configured to determine, according to the state parameter, parameter weight values of at least two cores from a preset second corresponding relationship, where the parameter weight values of the at least two cores correspond to the state parameter, the second corresponding relationship includes a corresponding relationship between the state parameter and the parameter weight values of the at least two cores, and the parameter weight values are used to indicate a priority degree of the cores selected to run the convolutional neural network model under the state parameter;
a core determining unit 703 including a modifying module 704 and a core determining module 705;
a modification module 704, configured to modify the core weight value using the parameter weight value for each core to obtain a first modified weight value, where the first modified weight value is used to indicate a priority of the core selected to run the convolutional neural network model;
a kernel determining module 705, configured to determine a kernel for running the convolutional neural network model from the at least two kernels according to the first modified weight values of the at least two kernels.
Alternatively,
the current state parameter of the terminal is the current core utilization rate of each core;
the weight value determining unit 702 is further configured to determine, for each core, a performance weight value from a preset second corresponding relationship according to the core usage rate, where the performance weight value of each core corresponds to the core usage rate of each core, the performance weight value is used to indicate a priority of a core selected to run the convolutional neural network model under the current core usage rate of the core, and the second corresponding relationship includes a corresponding relationship between the core usage rate of each core and the performance weight value of each core.
Alternatively,
the obtaining unit 701 is further configured to obtain a current remaining electric quantity value of the terminal;
the weight value determining unit 702 is further configured to determine, according to the remaining electric quantity value, power consumption weight values of the at least two cores from a preset third corresponding relationship, where the power consumption weight values of the at least two cores correspond to the remaining electric quantity value, the third corresponding relationship includes a corresponding relationship between the remaining electric quantity value and the power consumption weight values of the at least two cores, and the power consumption weight values are used to indicate a priority level of the cores selected to run the convolutional neural network model under the remaining electric quantity value;
a modification module 704, further configured to modify, for each core, the first modification weight value using the power consumption weight value to obtain a second modification weight value, where the second modification weight value is used to indicate a priority of the core being selected to run the convolutional neural network model;
the kernel determining module 705 is further configured to determine a kernel for running the convolutional neural network model from the at least two kernels according to the second modified weight values of the at least two kernels.
Alternatively,
the terminal further comprises a parameter determination unit 706 and an execution unit 707;
an obtaining unit 701, configured to obtain a current core utilization rate of each core;
a parameter determining unit 706, configured to determine, for each core, a performance parameter from a second corresponding relationship according to the core usage rate, where the performance parameter of each core corresponds to the core usage rate of each core, and the second corresponding relationship includes a corresponding relationship between the performance parameter of each core and the core usage rate of each core;
a running unit 707, configured to, after the core determining unit determines a core, in which the convolutional neural network model is run, from the at least two cores according to the core weight values of the at least two cores, run the convolutional neural network model on a target core using the performance parameter of the target core, where the target core is the core on which the convolutional neural network model is run.
Alternatively,
the performance parameters comprise one or more of thread priority information, sleep time information and thread number information;
the thread priority information is the priority information of the sub-threads when the core runs the convolutional neural network model;
the sleep time information is the time interval between two successive runs of the convolutional neural network model on the core;
the thread number information is the number of threads used when the core runs the convolutional neural network model.
Alternatively,
the current state parameter of the terminal is the current remaining battery level of the terminal;
the weight value determining unit 702 is further configured to determine, according to the remaining battery level, power consumption weight values of the at least two cores from a preset second corresponding relationship, where the power consumption weight values of the at least two cores correspond to the remaining battery level, the power consumption weight values are used to indicate the priority of a core being selected to run the convolutional neural network model at that remaining battery level, and the second corresponding relationship includes a corresponding relationship between the remaining battery level and the power consumption weight values of the at least two cores.
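The power consumption weight value determined here is used to correct the first modification weight value (modification module 704 above) into the second modification weight value. A minimal sketch, assuming the correction is multiplicative; the battery threshold and weight table below are hypothetical:

```python
# Hypothetical second correspondence: at low battery, power-hungry cores
# (e.g. the GPU) are de-prioritised relative to the CPU and DSP.
def power_weight(remaining_battery_pct, core):
    low_battery = {"CPU": 0.8, "GPU": 0.3, "DSP": 1.0}
    normal = {"CPU": 1.0, "GPU": 1.0, "DSP": 1.0}
    table = low_battery if remaining_battery_pct < 20 else normal
    return table[core]

def second_modification_weight(first_modification, core, remaining_battery_pct):
    # Correct the first modification weight value with the power consumption
    # weight value to obtain the second modification weight value.
    return first_modification * power_weight(remaining_battery_pct, core)
```

With 15% battery remaining, a GPU whose first modification weight value is 0.8 would drop to a second modification weight value of 0.24, making a lower-power core more likely to be selected.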
Alternatively,
the target model parameters are the weight parameter number of the convolutional neural network model.
Alternatively,
the at least two cores include at least two of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Digital Signal Processor (DSP) and a systolic array processor.
Alternatively,
the weight value determining unit 702 is further configured to determine a target model parameter interval where the target model parameter is located in a preset first corresponding relationship; determining core weight value intervals of at least two cores in a first corresponding relation, wherein the core weight value intervals of the at least two cores correspond to the target model parameter interval, the first corresponding relation comprises the corresponding relation between the target model parameter interval and the core weight value intervals of the at least two cores, and the target model parameter interval comprises target model parameters; and for each core, determining a core weight value from a core weight value interval, wherein the position of the core weight value in the core weight value interval is the same as the position of the target model parameter in the target model parameter interval.
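The mapping just described (the core weight value occupies the same relative position in the core weight value interval as the target model parameter does in the target model parameter interval) is a linear interpolation. A sketch, with hypothetical interval bounds:

```python
def interpolate_core_weight(param, param_interval, weight_interval):
    """Map the target model parameter onto a core weight value so that the
    weight occupies the same relative position in its interval as the
    parameter does in the parameter interval (linear interpolation)."""
    p_lo, p_hi = param_interval
    w_lo, w_hi = weight_interval
    t = (param - p_lo) / (p_hi - p_lo)  # relative position in [0, 1]
    return w_lo + t * (w_hi - w_lo)

# Hypothetical example: a model whose weight-parameter count (5M) sits at the
# midpoint of the (1M, 9M) parameter interval receives the midpoint of the
# corresponding (0.2, 1.0) core weight value interval.
gpu_weight = interpolate_core_weight(5_000_000, (1_000_000, 9_000_000), (0.2, 1.0))
```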
In summary, the obtaining unit 701 obtains a target model parameter, where the target model parameter is used to represent the computation density of a convolutional neural network model. The weight value determining unit 702 then determines, according to the target model parameter, core weight values of at least two cores from a preset first corresponding relationship, where the core weight values of the at least two cores correspond to the target model parameter, the at least two cores are heterogeneous cores on the terminal, the first corresponding relationship includes a corresponding relationship between the target model parameter and the core weight values of the at least two cores, and a core weight value is used to represent the priority of a core being selected to run the convolutional neural network model. The core determining unit 703 then determines, from the at least two cores, a core that runs the convolutional neural network model according to the core weight values of the at least two cores. Heterogeneous cores on the terminal have different characteristics, and different cores are suited to running convolutional neural network models of different computation densities. If a first corresponding relationship is preset that includes the corresponding relationship between the target model parameter and the core weight values of the at least two cores, then once the target model parameter of a convolutional neural network model is obtained, the core weight values of the at least two cores can be determined from that preset first corresponding relationship according to the target model parameter.
The core weight values represent the priority of each core being selected to run the convolutional neural network model, and a core suitable for running the model can be determined from them. In this way, a core for running the convolutional neural network model can be determined from the at least two cores according to the core weight values of the at least two cores. A core adapted to a convolutional neural network model of a specific computation density can thus be determined through the core weight values of the different cores; if a larger core weight value indicates higher efficiency in running the convolutional neural network model, the core determined according to the core weight values can run the model efficiently.
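Putting the pieces together, the selection performed by units 701 to 703 amounts to a table lookup followed by picking the core with the largest weight value. An illustrative sketch; the correspondence table and density buckets below are hypothetical:

```python
# Hypothetical first correspondence: computation-density buckets of the
# convolutional neural network model mapped to core weight values per core.
FIRST_CORRESPONDENCE = {
    "low":    {"CPU": 0.9, "GPU": 0.4, "DSP": 0.6},
    "medium": {"CPU": 0.5, "GPU": 0.8, "DSP": 0.7},
    "high":   {"CPU": 0.2, "GPU": 1.0, "DSP": 0.5},
}

def density_bucket(num_weight_params):
    # The target model parameter here is the number of weight parameters.
    if num_weight_params < 1_000_000:
        return "low"
    if num_weight_params < 10_000_000:
        return "medium"
    return "high"

def select_core(num_weight_params):
    weights = FIRST_CORRESPONDENCE[density_bucket(num_weight_params)]
    # The core with the largest core weight value has the highest priority.
    return max(weights, key=weights.get)
```

Under this table, a small model would run on the CPU, while a high-density model would be scheduled onto the GPU.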
An embodiment of the present invention further provides a chip apparatus, where the chip includes a processing unit, and is configured to execute the methods shown in fig. 4 and fig. 5.
The embodiment of the invention also provides a chip device which comprises a processor and a memory. The memory includes instructions that the processor executes to perform the methods described above in fig. 4 and 5.
In an embodiment of the present invention, the chip apparatus may be a chip in a terminal, where the chip includes a processing unit and a communication unit. The processing unit may be, for example, a processor, which may be the central processor 680 described previously. The communication unit may be, for example, an input/output interface, a pin, or a circuit, including a system bus. Optionally, the chip further includes a storage unit, where the storage unit may be a memory inside the chip, such as a register, a cache, a random access memory (RAM), an EEPROM, or a FLASH; the storage unit may also be a memory external to the chip, which may be any of the types of memory 620 described previously. The processor is coupled to the memory and executes the instructions stored in the memory to cause the chip apparatus to perform the methods described above with reference to fig. 4 and fig. 5.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the invention are produced, in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or data center, integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, hard disk, or magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)), among others.

Claims (21)

1. A method for core scheduling, comprising:
obtaining target model parameters, wherein the target model parameters are used for representing the calculation density of a convolution neural network model;
determining core weight values of at least two cores from a preset first corresponding relation according to the target model parameter, wherein the core weight values of the at least two cores correspond to the target model parameter, the at least two cores are heterogeneous cores on a terminal, the hardware characteristics of different heterogeneous cores are different, the at least two cores comprise at least two of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Digital Signal Processor (DSP) and a systolic array processor, the first corresponding relation comprises the corresponding relation between the target model parameter and the core weight values of the at least two cores, and the core weight values are used for representing the priority of the cores selected to run the convolutional neural network model;
determining, from the at least two cores, a core for running the convolutional neural network model according to the core weight values of the at least two cores.
2. The method of claim 1,
the method further comprises the following steps:
acquiring the current state parameter of the terminal, wherein the state parameter is a dynamically changing parameter;
determining parameter weight values of the at least two cores from a preset second corresponding relation according to the state parameter, wherein the parameter weight values of the at least two cores correspond to the state parameter, the second corresponding relation comprises the corresponding relation between the state parameter and the parameter weight values of the at least two cores, and the parameter weight values are used for representing the priority degree of the cores selected to run the convolutional neural network model under the state parameter;
the determining a core for running the convolutional neural network model from the at least two cores according to the core weight values of the at least two cores comprises:
for each core, correcting the core weight value by using a parameter weight value to obtain a first correction weight value, wherein the first correction weight value is used for representing the priority of the core being selected to run the convolutional neural network model;
determining a core for running the convolutional neural network model from the at least two cores according to the first correction weight values of the at least two cores.
3. The method of claim 2,
the current state parameter of the terminal is the current core utilization rate of each core;
determining the parameter weight values of the at least two cores from a preset second corresponding relation according to the state parameters includes:
for each core, determining a performance weight value from a preset second corresponding relation according to the core utilization rate, wherein the performance weight value of each core corresponds to the core utilization rate of each core, the performance weight value is used for indicating the priority degree of the core selected to operate the convolutional neural network model under the current core utilization rate of the core, and the second corresponding relation comprises the corresponding relation between the core utilization rate of each core and the performance weight value of each core.
4. The method of claim 3,
the method further comprises the following steps:
acquiring a current remaining battery level of the terminal;
determining power consumption weight values of the at least two cores from a preset third corresponding relation according to the remaining battery level, wherein the power consumption weight values of the at least two cores correspond to the remaining battery level, the third corresponding relation comprises the corresponding relation between the remaining battery level and the power consumption weight values of the at least two cores, and the power consumption weight values are used for representing the priority of a core being selected to run the convolutional neural network model at the remaining battery level;
the determining a core for running the convolutional neural network model from the at least two cores according to the first correction weight values of the at least two cores comprises:
for each core, correcting the first correction weight value by using the power consumption weight value to obtain a second correction weight value, wherein the second correction weight value is used for representing the priority of the core being selected to run the convolutional neural network model;
determining a core for running the convolutional neural network model from the at least two cores according to the second correction weight values of the at least two cores.
5. The method of claim 1,
the method further comprises the following steps:
acquiring the current core utilization rate of each core;
for each core, determining a performance parameter from a second corresponding relation according to the core utilization rate, wherein the performance parameter of each core corresponds to the core utilization rate of each core, and the second corresponding relation comprises the corresponding relation between the performance parameter of each core and the core utilization rate of each core;
after determining a core for running the convolutional neural network model from the at least two cores according to the core weight values of the at least two cores, the method further comprises:
running the convolutional neural network model on a target core using performance parameters of the target core, the target core being the core on which the convolutional neural network model is run.
6. The method of claim 5,
the performance parameters comprise one or more of thread priority information, sleep time information and thread number information;
the thread priority information is the priority information of the sub-threads when the core runs the convolutional neural network model;
the sleep time information is the time interval between two consecutive runs of the convolutional neural network model by the core;
the thread number information is the number of threads used when the core runs the convolutional neural network model.
7. The method of claim 2,
the current state parameter of the terminal is a current remaining battery level of the terminal;
the determining parameter weight values of the at least two cores from a preset second corresponding relation according to the state parameter comprises:
determining power consumption weight values of the at least two cores from the preset second corresponding relation according to the remaining battery level, wherein the power consumption weight values of the at least two cores correspond to the remaining battery level, the power consumption weight values are used for representing the priority of a core being selected to run the convolutional neural network model at the remaining battery level, and the second corresponding relation comprises the corresponding relation between the remaining battery level and the power consumption weight values of the at least two cores.
8. The method according to any one of claims 1 to 7,
the target model parameters are the weight parameter number of the convolutional neural network model.
9. The method according to any one of claims 1 to 7,
determining core weight values of at least two cores from a preset first corresponding relation according to the target model parameters, including:
determining a target model parameter interval in which the target model parameter is located in a preset first corresponding relation;
determining core weight value intervals of at least two cores in the first corresponding relation, wherein the core weight value intervals of the at least two cores correspond to the target model parameter interval, the first corresponding relation comprises the corresponding relation between the target model parameter interval and the core weight value intervals of the at least two cores, and the target model parameter interval comprises the target model parameter;
for each core, determining a core weight value from a core weight value interval, wherein the position of the core weight value in the core weight value interval is the same as the position of the target model parameter in the target model parameter interval.
10. A terminal, comprising:
the device comprises an acquisition unit, a calculation unit and a processing unit, wherein the acquisition unit is used for acquiring target model parameters, and the target model parameters are used for representing the calculation density of a convolution neural network model;
a weight value determining unit, configured to determine, according to the target model parameter, core weight values of at least two cores from a preset first corresponding relationship, where the core weight values of the at least two cores correspond to the target model parameter, the at least two cores are heterogeneous cores on a terminal, hardware characteristics of different heterogeneous cores are different, the at least two cores include at least two of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Digital Signal Processor (DSP), and a systolic array processor, the first corresponding relationship includes a corresponding relationship between the target model parameter and the core weight values of the at least two cores, and the core weight values are used to indicate a priority of a core selected to run the convolutional neural network model;
a kernel determining unit, configured to determine, from the at least two kernels, a kernel for running the convolutional neural network model according to the kernel weight values of the at least two kernels.
11. The terminal of claim 10,
the acquiring unit is further configured to acquire a current state parameter of the terminal, where the state parameter is a dynamically changing parameter;
the weight value determining unit is further configured to determine, according to the state parameter, parameter weight values of the at least two cores from a preset second corresponding relationship, where the parameter weight values of the at least two cores correspond to the state parameter, the second corresponding relationship includes a corresponding relationship between the state parameter and the parameter weight values of the at least two cores, and the parameter weight values are used to indicate a priority of a core selected to operate the convolutional neural network model under the state parameter;
the core determining unit comprises a correcting module and a core determining module;
the correction module is used for correcting the core weight value by using the parameter weight value for each core to obtain a first correction weight value, and the first correction weight value is used for representing the priority degree of the core selected to run the convolutional neural network model;
the core determining module is configured to determine, from the at least two cores, a core for running the convolutional neural network model according to the first correction weight values of the at least two cores.
12. The terminal of claim 11,
the current state parameter of the terminal is the current core utilization rate of each core;
the weight value determining unit is further configured to determine, for each core, a performance weight value from a preset second corresponding relationship according to a core usage rate, where the performance weight value of each core corresponds to the core usage rate of each core, the performance weight value is used to indicate a priority level of a core selected to run the convolutional neural network model at a current core usage rate of the core, and the second corresponding relationship includes a corresponding relationship between the core usage rate of each core and the performance weight value of each core.
13. The terminal of claim 12,
the obtaining unit is further configured to obtain a current remaining battery level of the terminal;
the weight value determining unit is further configured to determine, according to the remaining battery level, power consumption weight values of the at least two cores from a preset third corresponding relationship, wherein the power consumption weight values of the at least two cores correspond to the remaining battery level, the third corresponding relationship comprises the corresponding relationship between the remaining battery level and the power consumption weight values of the at least two cores, and the power consumption weight values are used to indicate the priority of a core being selected to run the convolutional neural network model at the remaining battery level;
the correction module is further configured to correct, for each core, the first correction weight value using the power consumption weight value to obtain a second correction weight value, wherein the second correction weight value is used to indicate the priority of the core being selected to run the convolutional neural network model;
the core determining module is further configured to determine, from the at least two cores, a core for running the convolutional neural network model according to the second correction weight values of the at least two cores.
14. The terminal of claim 10,
the terminal also comprises a parameter determining unit and an operating unit;
the obtaining unit is further configured to obtain a current core utilization rate of each core;
the parameter determining unit is configured to determine, for each core, a performance parameter from a second corresponding relationship according to a core usage rate, where the performance parameter of each core corresponds to the core usage rate of each core, and the second corresponding relationship includes a corresponding relationship between the performance parameter of each core and the core usage rate of each core;
the operation unit is configured to, after the core determination unit determines a core for operating the convolutional neural network model from the at least two cores according to the core weight values of the at least two cores, operate the convolutional neural network model on a target core using the performance parameter of the target core, where the target core is the core for operating the convolutional neural network model.
15. The terminal of claim 14,
the performance parameters comprise one or more of thread priority information, sleep time information and thread number information;
the thread priority information is the priority information of the sub-threads when the core runs the convolutional neural network model;
the sleep time information is the time interval between two consecutive runs of the convolutional neural network model by the core;
the thread number information is the number of threads used when the core runs the convolutional neural network model.
16. The terminal of claim 11,
the current state parameter of the terminal is a current remaining battery level of the terminal;
the weight value determining unit is further configured to determine, according to the remaining battery level, power consumption weight values of the at least two cores from a preset second corresponding relationship, wherein the power consumption weight values of the at least two cores correspond to the remaining battery level, the power consumption weight values are used to indicate the priority of a core being selected to run the convolutional neural network model at the remaining battery level, and the second corresponding relationship comprises the corresponding relationship between the remaining battery level and the power consumption weight values of the at least two cores.
17. The terminal according to any of claims 10-16,
the target model parameters are the weight parameter number of the convolutional neural network model.
18. The terminal according to any of claims 10-16,
the weight value determining unit is further configured to determine a target model parameter interval in which the target model parameter is located in a preset first corresponding relationship; determining core weight value intervals of at least two cores in the first corresponding relation, wherein the core weight value intervals of the at least two cores correspond to the target model parameter interval, the first corresponding relation comprises the corresponding relation between the target model parameter interval and the core weight value intervals of the at least two cores, and the target model parameter interval comprises the target model parameter; for each core, determining a core weight value from a core weight value interval, wherein the position of the core weight value in the core weight value interval is the same as the position of the target model parameter in the target model parameter interval.
19. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the method of any of claims 1-9.
20. A terminal, comprising:
a processor and a memory;
the processor is arranged to cause the terminal to perform the method of any one of claims 1 to 9 by invoking operating instructions stored by the memory.
21. A chip apparatus, characterized in that the apparatus comprises a processing unit;
wherein the processing unit is adapted to perform the method of any of claims 1-9.
CN201780064697.0A 2017-10-25 2017-10-25 Core scheduling method and terminal Active CN109937410B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/107614 WO2019079994A1 (en) 2017-10-25 2017-10-25 Core scheduling method and terminal

Publications (2)

Publication Number Publication Date
CN109937410A CN109937410A (en) 2019-06-25
CN109937410B true CN109937410B (en) 2021-02-23

Family

ID=66247167

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201780064697.0A Active CN109937410B (en) 2017-10-25 2017-10-25 Core scheduling method and terminal

Country Status (2)

Country Link
CN (1) CN109937410B (en)
WO (1) WO2019079994A1 (en)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110442919B (en) * 2019-07-12 2022-12-27 西安空间无线电技术研究所 Microwave component micro-discharge numerical simulation method based on GPU (graphics processing Unit) architecture
CN114237859B (en) * 2022-02-25 2022-05-13 中瓴智行(成都)科技有限公司 Distributed intelligent terminal GPU (graphics processing Unit) computing power improving method, terminal, system and medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102289436A (en) * 2010-06-18 2011-12-21 阿里巴巴集团控股有限公司 Method and device for determining weighted value of search term and method and device for generating search results
CN103119580A (en) * 2010-09-25 2013-05-22 英特尔公司 Application scheduling in heterogeneous multiprocessor computing platforms
CN103645954A (en) * 2013-11-21 2014-03-19 华为技术有限公司 CPU scheduling method, device and system based on heterogeneous multi-core system
WO2015050474A1 (en) * 2013-10-03 2015-04-09 Huawei Technologies Co., Ltd Method and system for assigning a computational block of a software program to cores of a multi-processor system
CN105930902A (en) * 2016-04-18 2016-09-07 中国科学院计算技术研究所 Neural network processing method and system
CN107003988A (en) * 2014-12-19 2017-08-01 英特尔公司 Storage device and method for performing convolution algorithm
CN107209548A (en) * 2015-02-13 2017-09-26 英特尔公司 Power management is performed in polycaryon processor

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8683243B2 (en) * 2011-03-11 2014-03-25 Intel Corporation Dynamic core selection for heterogeneous multi-core systems
US8782645B2 (en) * 2011-05-11 2014-07-15 Advanced Micro Devices, Inc. Automatic load balancing for heterogeneous cores
CN103299277B (en) * 2011-12-31 2016-11-09 华为技术有限公司 Gpu system and processing method thereof
US9766673B2 (en) * 2015-02-27 2017-09-19 Intel Corporation Supercapacitor-based power supply protection for multi-node systems
CN105224502A (en) * 2015-09-28 2016-01-06 浪潮(北京)电子信息产业有限公司 A kind of degree of depth learning method based on GPU and system
CN106201651A (en) * 2016-06-27 2016-12-07 鄞州浙江清华长三角研究院创新中心 The simulator of neuromorphic chip


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11789894B2 (en) 2022-01-27 2023-10-17 Wistron Corporation Acceleration system and dynamic configuration method thereof
TWI819480B (en) * 2022-01-27 2023-10-21 緯創資通股份有限公司 Acceleration system and dynamic configuration method thereof

Also Published As

Publication number Publication date
WO2019079994A1 (en) 2019-05-02
CN109937410A (en) 2019-06-25


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant