CN111027018B - Method, device, computing equipment and medium for accelerating modeling of computing equipment - Google Patents


Info

Publication number
CN111027018B
CN111027018B (application CN201911324820.5A)
Authority
CN
China
Prior art keywords
vector
parameter
sequence
feature
multiply
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911324820.5A
Other languages
Chinese (zh)
Other versions
CN111027018A (en)
Inventor
赵原 (Zhao Yuan)
殷山 (Yin Shan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN201911324820.5A priority Critical patent/CN111027018B/en
Publication of CN111027018A publication Critical patent/CN111027018A/en
Application granted granted Critical
Publication of CN111027018B publication Critical patent/CN111027018B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
                    • G06F 17/10 Complex mathematical operations
                        • G06F 17/15 Correlation function computation including computation of convolution operations
                        • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
                        • G06F 17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
                • G06F 7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
                    • G06F 7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
                        • G06F 7/48 Methods or arrangements for performing computations using exclusively denominational number representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
                            • G06F 7/544 Methods or arrangements for performing computations using non-contact-making devices for evaluating functions by calculation
                                • G06F 7/5443 Sum of products
                            • G06F 7/57 Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups G06F7/483 – G06F7/556 or for performing logical operations
            • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N 20/00 Machine learning

Abstract

Embodiments of this specification provide a method, an apparatus, a computing device, and a medium for accelerating modeling on a computing device. During training of a target model, the model parameters and the feature data of each training sample are divided into vectors, so that both are vectorized. Then, for the training samples in each round of iterative training, a preset vector floating-point multiply-add instruction is invoked to perform multiply-add processing on the parameter vector sequence and the feature vector sequence obtained by vector division, yielding a target value for each training sample; from these target values, a trained target model is obtained. For example, in a personalized recommendation scenario, the feature data may be personal information of the user, such as user portrait information.

Description

Method, device, computing equipment and medium for accelerating modeling of computing equipment
Technical Field
Embodiments of this specification relate to the field of computer technology, and in particular to a method, an apparatus, a computing device, and a medium for accelerating modeling on a computing device.
Background
With the development of computer technology, the use of Artificial Intelligence (AI) has become more and more widespread, and increasingly mature AI is applied in a variety of scenarios, such as security, finance, and personalized recommendation. To build an AI model, the model must be trained and then tested before being put into use. Model training is the key determinant of model performance. To ensure that a model generalizes, a large amount of sample data is generally required for training, which makes the computational load of model training very large.
Disclosure of Invention
The embodiment of the specification provides a method, a device, a computing device and a medium for accelerating modeling of the computing device.
In a first aspect, an embodiment of this specification provides a method for accelerating modeling on a computing device, including: during training of a target model, performing vector division on the model parameters and on the feature data of each training sample, respectively, to obtain a parameter vector sequence for the model parameters and a feature vector sequence for each training sample, where the training process of the target model includes multiple rounds of iterative training; for a training sample in each round of iterative training, invoking a preset vector floating-point multiply-add instruction to perform multiply-add processing on the parameter vector sequence and the feature vector sequence, obtaining a target value for the training sample; and obtaining a trained target model based on the target values of the training samples in each round of iterative training.
In a second aspect, an embodiment of this specification provides an apparatus for accelerating modeling on a computing device, including: a vector division module, configured to perform vector division on the model parameters and on the feature data of each training sample during training of a target model, to obtain a parameter vector sequence for the model parameters and a feature vector sequence for each training sample, where the training process of the target model includes multiple rounds of iterative training; a multiply-add module, configured to invoke a preset vector floating-point multiply-add instruction for a training sample in each round of iterative training, and perform multiply-add processing on the parameter vector sequence and the feature vector sequence to obtain a target value for the training sample; and a model determination module, configured to obtain a trained target model based on the target values of the training samples in each round of iterative training.
In a third aspect, an embodiment of this specification provides a computing device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the steps of the method for accelerating modeling on a computing device provided in the first aspect.
In a fourth aspect, an embodiment of this specification provides a computer-readable storage medium on which a computer program is stored, where the program, when executed by a processor, implements the steps of the method for accelerating modeling on a computing device provided in the first aspect.
In the method for accelerating modeling on a computing device provided in one embodiment of this specification, during training of a target model, the model parameters and the feature data of each training sample are each divided into vectors. Then, for the training samples in each round of iterative training, a preset vector floating-point multiply-add instruction is invoked to perform multiply-add processing on the parameter vector sequence and the feature vector sequence obtained by vector division, yielding a target value for each training sample, from which a trained target model is obtained. Because the feature data and the model parameters are vectorized, many of the multiply and add calculations in model training can be completed with a single call to the vector floating-point multiply-add instruction. This greatly reduces the number of separately issued multiply and add instructions, that is, the number of compute instructions required during model training. As a result, the modeling speed of the computing device is effectively improved, modeling time is reduced, modeling efficiency is improved, and the model can be put into use quickly while model performance is preserved. The occupation of compute resources during modeling is also greatly reduced, internal resource management of the computing device is optimized, and the device can handle more computing tasks with higher processing efficiency.
Drawings
FIG. 1 is a flow chart of a method for accelerating modeling of a computing device provided in a first aspect of an embodiment of the present description;
FIG. 2 is a block diagram of an apparatus for accelerating modeling of a computing device according to a second aspect of an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a computing device provided in a third aspect of an embodiment of this specification.
Detailed Description
To better understand the technical solutions provided by the embodiments of this specification, these solutions are described in detail below with reference to the drawings and specific embodiments. It should be understood that the specific features in the embodiments and examples are detailed illustrations of the technical solutions, not limitations on them, and that the technical features in the embodiments and examples may be combined with each other where no conflict arises. In the embodiments of this specification, the term "plurality" means "two or more", i.e., it covers the case of two as well as more than two. The term "and/or" merely describes an association between objects and indicates that three relationships are possible; for example, "A and/or B" can mean: A alone, both A and B, or B alone.
In this embodiment, a vector floating-point multiply-add instruction is an instruction that performs a floating-point multiplication on all corresponding elements of two vectors (single-precision or double-precision floating-point numbers) and a floating-point addition on the multiplication results. For example, for a vector fused multiply-add instruction VFMADD (Vector Fused Multiply Add) with vector length n, the instruction R = VFMADD(A, B, C) computes over the n elements of the vectors A, B, and C to obtain a vector R, specifically: r_g = a_g · b_g + c_g, where g is an integer from 0 to n−1, and a_g, b_g, c_g, and r_g are the g-th elements of the vectors A, B, C, and R, respectively.
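As a concrete illustration of these semantics, the following is a minimal pure-Python model of the VFMADD operation described above. The function name and the list-based "vectors" are illustrative stand-ins for the hardware instruction, not an actual intrinsic:

```python
# Pure-Python sketch of the per-lane semantics r_g = a_g * b_g + c_g.
# (Illustrative only; the real instruction operates on hardware vector
# registers of a fixed length n.)
def vfmadd(a, b, c):
    assert len(a) == len(b) == len(c), "all operands share the vector length n"
    return [a_g * b_g + c_g for a_g, b_g, c_g in zip(a, b, c)]

A = [1.0, 2.0, 3.0, 4.0]
B = [0.5, 0.5, 0.5, 0.5]
C = [10.0, 10.0, 10.0, 10.0]
R = vfmadd(A, B, C)   # one "instruction" does 4 multiplies and 4 adds
```

In this sketch a single call performs n multiplications and n additions at once, which is exactly the property the method exploits to cut the instruction count.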
It should be noted that the embodiments of this specification do not limit which vector floating-point multiply-add instruction is used; this is determined by the instructions supported by the computing device performing the model training. For example, the Intel VFMADD instruction or the ARM VMLA instruction may be used, or any other vector floating-point multiply-add instruction that implements the function described above.
In the training process of linear machine learning models such as linear regression or logistic regression, there are two main time-consuming computations: one is the hypothesis function h_θ(X), where θ denotes the model parameters and X denotes a feature vector; the other is computing the updated model parameters θ′ during gradient descent. The embodiments of this specification provide a method for accelerating modeling on a computing device: during training of a target model, vector division is performed on the model parameters and on the feature data of each training sample, yielding a parameter vector sequence for the model parameters and a feature vector sequence for each training sample, where the training process of the target model includes multiple rounds of iterative training; then, for the training samples in each round of iterative training, a preset vector floating-point multiply-add instruction is invoked to perform multiply-add processing on the parameter vector sequence and the feature vector sequence, obtaining target values for the training samples; finally, a trained target model is obtained based on the target values of the training samples in each round of iterative training.
By vectorizing the feature data and the model parameters and computing the target values, which require a large number of multiply and add calculations, through vector floating-point multiply-add instructions, the number of compute instructions required during model training is greatly reduced. This effectively improves the modeling speed of the computing device, reduces modeling time, improves modeling efficiency, allows the model to be put into use quickly while preserving model performance, greatly reduces the occupation of compute resources during modeling, and improves the processing efficiency of the computing device.
In a first aspect, FIG. 1 shows a flowchart of a method for accelerating modeling of a computing device provided by an embodiment of this specification, applied to a computing device that supports the vector floating-point multiply-add instruction described above. Referring to FIG. 1, the method may include at least the following steps S100 to S104.
Step S100: during training of the target model, perform vector division on the model parameters and on the feature data of each training sample, respectively, to obtain a parameter vector sequence for the model parameters and a feature vector sequence for each training sample.
In the embodiments of this specification, the target model may be a linear machine learning model such as a linear regression model or a logistic regression model. Of course, in other embodiments, the target model may be any other suitable machine learning model, for example, one whose hypothesis function computation includes the computation of θ^T X. It can be understood that the number of model parameters in the target model equals the number of features contained in the feature data of a training sample. After vector division is performed for the subsequent vector floating-point multiply-add instruction, the number of feature vectors in the resulting feature vector sequence equals the number of parameter vectors in the parameter vector sequence, and each feature vector has the same dimension as each parameter vector.
In practical applications, the training samples and their feature data are determined by the application scenario of the target model. For example, if the target model is used to predict a user's credit score, a training sample may correspond to a user, and the feature data may include the user's personal information, such as portrait information and payment information. Of course, the target model may be applied to other suitable scenarios, which are not enumerated here.
In a specific implementation, before vector division is performed, the vector dimension of each divided vector must be determined; the feature data and the model parameters can then be divided into vectors according to that dimension. Note that the vector dimensions of the resulting feature vectors and parameter vectors should match the vector dimension supported by the preset vector floating-point multiply-add instruction. Therefore, in an optional embodiment, performing vector division on the model parameters and on the feature data of each training sample may include: obtaining the vector dimension supported by the vector floating-point multiply-add instruction; and, based on that dimension, dividing the model parameters into m n-dimensional parameter vectors that form the parameter vector sequence, and dividing the feature data of each training sample into m n-dimensional feature vectors that form the feature vector sequence, where m is an integer greater than or equal to 1 and n is an integer greater than or equal to 2.
Specifically, in one application scenario, if the number of features contained in the feature data is greater than the vector dimension n supported by the preset vector floating-point multiply-add instruction, the number of vector partitions is at least 2, i.e., m ≥ 2. In this case, dividing the model parameters into m n-dimensional parameter vectors and the feature data of each training sample into m n-dimensional feature vectors based on the vector dimension n may include: determining the partition count m from the vector dimension n and the number of features; constructing m n-dimensional first initial vectors and m n-dimensional second initial vectors according to m and n; and assigning the model parameters, in a preset order, to the elements of the m first initial vectors to obtain m n-dimensional parameter vectors, while assigning the features contained in the feature data, in the same preset order, to the elements of the m second initial vectors to obtain m n-dimensional feature vectors. Note that the model parameters and the feature data are divided in the same way, that is, assigned sequentially in the same preset order.
In addition, when the feature data of each training sample is divided into vectors, each feature is placed in exactly one feature vector: no feature is repeated within a feature vector or across feature vectors. Likewise, when the model parameters are divided into vectors, each model parameter is placed in exactly one parameter vector.
During vector division, if a feature vector or parameter vector is not fully populated, that is, a divided feature vector contains fewer features than the vector dimension supported by the preset vector floating-point multiply-add instruction, or a parameter vector contains fewer model parameters than that dimension, the unfilled elements are assigned a preset value. In the example below, one of the feature vectors contains only 3 features while the instruction supports a vector dimension of 5, so the other two elements of that vector must be assigned the preset value; the same applies to the vector division of the model parameters. The preset value depends on the target value being computed: it may be 0 when the target value is a hypothesis function value, and 0 or another specified value when the target value is a model parameter value in the gradient descent process.
It should be further noted that the division order, i.e., the preset order, is not limited and is set according to actual needs, as long as no feature is divided into more than one feature vector and no model parameter is divided into more than one parameter vector.
For example, assume the feature data contains 18 features, denoted x_0 to x_17, and there are likewise 18 model parameters, denoted θ_0 to θ_17. If the vector dimension supported by the preset vector floating-point multiply-add instruction is 5, the feature data may be divided into 4 feature vectors. Specifically, the feature data can be divided into four feature vectors in order from front to back starting at x_0: x_0 to x_4 form the first feature vector of the sequence, x_5 to x_9 the second, x_10 to x_14 the third, and x_15 to x_17 the fourth; the model parameters are correspondingly divided into 4 parameter vectors in the same way. Alternatively, the division may run in reverse order, from x_17 back to front, with the model parameters again divided in the same way. Other orders are also possible, for example: x_0, x_2, x_4, x_6, x_8 form the first feature vector; x_10, x_12, x_14, x_16, x_1 the second; x_3, x_5, x_7, x_9, x_11 the third; and x_13, x_15, x_17 the fourth; the model parameters are then divided into vectors in the same order.
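The front-to-back division in this example can be sketched as follows. The helper function is hypothetical (not the patent's implementation), and the preset padding value is taken as 0, as used for the hypothesis-function target value:

```python
# Illustrative sketch: split a flat list of 18 features into m = 4 vectors
# of dimension n = 5, padding the unfilled tail elements with a preset value.
def partition(values, n, pad=0.0):
    m = (len(values) + n - 1) // n            # number of vectors (ceil division)
    padded = values + [pad] * (m * n - len(values))
    return [padded[k * n:(k + 1) * n] for k in range(m)]

features = [float(g) for g in range(18)]      # stands in for x_0 .. x_17
vectors = partition(features, 5)
# 4 vectors of dimension 5; the last one is [15.0, 16.0, 17.0, 0.0, 0.0]
```

The model parameters θ_0 to θ_17 would be partitioned by the same call, guaranteeing that corresponding features and parameters land in the same lane of the same vector pair.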
In another application scenario, if the number of features contained in the feature data is less than or equal to the vector dimension supported by the preset vector floating-point multiply-add instruction, the feature vector sequence and the parameter vector sequence each contain a single vector. Specifically, if the number of features is smaller than the supported vector dimension, the unfilled elements of the divided feature vector must be assigned the preset value; for instance, if the feature data contains 6 features and the instruction supports a vector dimension of 10, the remaining 4 elements are assigned the preset value. If the number of features exactly equals the supported vector dimension, the features fit exactly into one feature vector. The same applies to the vector division of the model parameters.
In a specific implementation, suppose the feature data of a training sample contains DIM features and the vector dimension supported by the vector floating-point multiply-add instruction is n. (DIM is merely one way to name this quantity; other common variable names, such as M or N, could be used instead.) In one embodiment, the partition count for the feature vectors and parameter vectors can be determined by:

m = ⌊(DIM + n − 1) / n⌋

That is, the number of vector partitions m is obtained by adding the feature count DIM to the vector dimension n, subtracting 1, dividing by n, and rounding down. For example, if n is 3 and DIM is 10, then m = 4. Alternatively, in other embodiments of this specification, the partition count may be obtained by rounding DIM/n down and then adding 1.
It can be understood that the training process of the target model includes multiple rounds of iterative training; once the feature data of the training samples and the model parameters have been divided into vectors, the following step S102 may be performed.
Step S102: for a training sample in each round of iterative training, invoke the preset vector floating-point multiply-add instruction to perform multiply-add processing on the parameter vector sequence and the feature vector sequence, obtaining a target value for the training sample.
Specifically, the preset vector floating-point multiply-add instruction may be invoked for every training sample in each round of iterative training, performing multiply-add processing on that sample's parameter vector sequence and feature vector sequence to obtain its target value. Alternatively, in other embodiments of this specification, step S102 may be performed on only part of the training samples in each round, obtaining a target value for each sample in that part.
It can be understood that the target value is a value obtained by multiply-add processing over the feature data of the training sample and the model parameters during iterative training. For example, the target value may be the hypothesis function value h_θ(X) and/or the updated parameter value θ′. Note that in a linear machine learning model the computation of the hypothesis function h_θ(X) includes the computation of θ^T X, which can be obtained from the result of the multiply-add processing of the parameter vector sequence and the feature vector sequence; the specific process is described below.
For example, the hypothesis function of a typical linear regression model is h_θ(X) = θ^T X, and the hypothesis function of a typical logistic regression model is:

h_θ(X) = 1 / (1 + e^(−θ^T X))

As another example, in one application scenario, the gradient descent update of the parameters may be computed as:

θ′_j = θ_j − (α / NUM) · Σ_{k=1}^{NUM} (h_θ(X^(k)) − Y^(k)) · x_j^(k)

where α is the learning rate, NUM is the number of samples per iteration, Y^(k) is the label of the k-th sample, and x_j^(k) is the j-th feature of the k-th sample.
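For concreteness, the two computations named above can be sketched in plain scalar Python under the stated definitions of α, NUM, and Y. This is an illustrative, unvectorized reference version, not the patent's instruction-level implementation, and the function names are hypothetical:

```python
import math

# Scalar reference sketch of the logistic hypothesis h_theta(X) and of one
# batch gradient-descent update theta'_j = theta_j - (alpha/NUM) * sum_k
# (h_theta(X_k) - Y_k) * x_kj.
def hypothesis(theta, x):
    z = sum(t * xi for t, xi in zip(theta, x))   # theta^T X
    return 1.0 / (1.0 + math.exp(-z))

def gradient_step(theta, X, Y, alpha):
    num = len(X)                                 # NUM samples this iteration
    new_theta = []
    for j in range(len(theta)):
        grad_j = sum((hypothesis(theta, X[k]) - Y[k]) * X[k][j]
                     for k in range(num)) / num
        new_theta.append(theta[j] - alpha * grad_j)
    return new_theta

theta = [0.0, 0.0]
X = [[1.0, 2.0], [1.0, -1.0]]
Y = [1.0, 0.0]
theta = gradient_step(theta, X, Y, alpha=0.1)
```

Both the inner θ^T X and the per-feature gradient sums are exactly the kind of repeated multiply-add work that the vector instruction collapses into fewer compute instructions.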
The following description mainly takes these two kinds of target values as examples and details their calculation. Of course, in a specific implementation, the target value may also be another suitable computed parameter value in the model training process, which is not limited here.
In an optional embodiment of this specification, the target value may include a hypothesis function value, for example, the hypothesis function value of a linear regression model or of a logistic regression model. In this case, in step S102, invoking the preset vector floating-point multiply-add instruction to perform multiply-add processing on the parameter vector sequence and the feature vector sequence to obtain the target value of the training sample may include: invoking the vector floating-point multiply-add instruction to perform multiply-add processing on the parameter vector at position i in the parameter vector sequence, the feature vector at position i in the feature vector sequence, and a preset initial vector, obtaining a current result vector, and then using the current result vector as the initial vector of the next multiply-add operation, where i is an integer from 0 to m−1 (for example, running from 0 up to m−1) and m is the number of parameter vectors in the parameter vector sequence; after the parameter vector sequence and the feature vector sequence have been traversed, accumulating the elements of the current result vector, and obtaining the hypothesis function value of the training sample from the accumulated result; and using that hypothesis function value as the target value of the training sample.
Specifically, for the training samples in each iteration of training process, the first-ranked parameter vector in the parameter vector sequence may be used as the current first vector θ 0 The feature vector X arranged first in the feature vector sequence 0 As the current second vector, using the preset initial vector as the current third vector R 0
Further, a vector multiply add step is performed: and carrying out vector multiplication and addition processing on the current first vector, the current second vector and the current third vector by using a vector floating point multiplication and addition instruction to obtain a current result vector. For example, it can be expressed as R = VFMADD (θ) 0 ,X 0 ,R 0 )。
Then, the next parameter vector θ_1 in the parameter vector sequence is taken as the current first vector, the next feature vector X_1 in the feature vector sequence as the current second vector, and the current result vector R as the current third vector R_1, and the vector multiply-add step is repeated, and so on, until all vectors in the parameter vector sequence and the feature vector sequence have been traversed. At this time, since the preset value used for padding is 0, accumulating the elements in the current result vector obtained in the last round yields the value of θᵀX, which can then be substituted into the hypothesis function to obtain the hypothesis function value of the training sample.
That is, the above multi-round multiply-add process can be expressed as:
R = VFMADD(θ_i, X_i, R)
where R represents the current result vector, θ_i represents the parameter vector arranged at the ith position in the parameter vector sequence, and X_i represents the feature vector arranged at the ith position in the feature vector sequence. The initial value of R is the preset initial vector, whose dimension is the same as that of the feature vectors and the parameter vectors and whose elements are all assigned the value 0.
Then, all elements in the current result vector obtained in the last round are accumulated according to the following formula:

θᵀX = r_0 + r_1 + … + r_{n-1}

where n is the vector dimension supported by the preset vector floating-point multiply-add instruction, i.e., the dimension of the feature vectors and the parameter vectors, and r_i is the ith element in the current result vector.
It can be understood that, assuming the vector dimension supported by the vector floating-point multiply-add instruction is n, a computation that would otherwise require n multiply instructions and n add instructions can be completed by calling the vector floating-point multiply-add instruction once. Therefore, in the above calculation of θᵀX, compared with using separate multiply and add instructions on all the model parameters and feature data, the embodiment of the present specification first performs vector division on the model parameters and the feature data and then calls the vector floating-point multiply-add instruction on the divided vectors, so that the number of computation instructions needed for the time-consuming calculation of θᵀX is reduced to nearly 1/(2n), greatly reducing the occupation of the computing resources of the computing device by the modeling process.
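The iterative multiply-add scheme described above can be sketched in a few lines. This is a hedged, illustrative simulation, not the patent's implementation: `vfmadd` is a plain-Python stand-in for a hardware vector floating-point multiply-add instruction such as Intel's VFMADD, and the function names and sample data are invented for illustration.

```python
def vfmadd(a, b, c):
    """Element-wise a*b + c over n-dimensional vectors (stands in for one
    hardware vector floating-point multiply-add instruction)."""
    return [ai * bi + ci for ai, bi, ci in zip(a, b, c)]

def dot_via_vfmadd(theta_vecs, x_vecs, n):
    """Compute θᵀX from a parameter vector sequence and a feature vector
    sequence, each already divided into m vectors of dimension n."""
    r = [0.0] * n                      # preset initial vector, all zeros
    for theta_i, x_i in zip(theta_vecs, x_vecs):
        r = vfmadd(theta_i, x_i, r)    # R = VFMADD(θ_i, X_i, R)
    return sum(r)                      # accumulate the final result vector

# 8 parameters / 8 features divided into m = 2 vectors of dimension n = 4
theta = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
x     = [[1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0]]
print(dot_via_vfmadd(theta, x, 4))     # prints 62.0
```

Each call to `vfmadd` stands in for what would be a single hardware instruction covering n multiplies and n adds, which is where the roughly 1/(2n) instruction-count reduction comes from.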
In an alternative embodiment of the present specification, the target value may include the updated parameter values during gradient descent. In this case, in step S102, invoking the preset vector floating-point multiply-add instruction to perform multiply-add processing on the parameter vector sequence and the feature vector sequence to obtain the target value of the training sample may include: invoking the vector floating-point multiply-add instruction to perform multiply-add processing on a gradient coefficient vector obtained in advance, the feature vector arranged at the jth position in the feature vector sequence, and the parameter vector arranged at the jth position in the parameter vector sequence before descent, obtaining the parameter vector sequence after descent, and using the model parameters in the parameter vector sequence after descent as the target value of the training sample. Here j is an integer between 0 and m-1, and m is the number of parameter vectors in the parameter vector sequence. Since j takes m values in total between 0 and m-1, the multiply-add processing is executed m times to obtain the descended parameter vector sequence. It should be noted that the embodiments of the present specification do not limit the parameter updating manner adopted by the model training; for example, the scheme may be applied to any of full-batch, mini-batch, or SGD (Stochastic Gradient Descent) updating.
Of course, before the above multiply-add processing is performed, the gradient coefficient vector needs to be obtained first. Specifically, obtaining the gradient coefficient vector may include: acquiring the gradient descent coefficient of the gradient descent process in the iterative training; constructing a gradient coefficient vector according to the dimension of the parameter vectors; and assigning each element of the gradient coefficient vector the value of the gradient descent coefficient.
Assuming the gradient coefficient vector is denoted A, the dimension of A coincides with the dimension of the parameter vectors and the feature vectors. The parameter vector sequence before descent is {θ′ᵏ_0, θ′ᵏ_1, …, θ′ᵏ_{m-1}}, and the feature vector sequence of the current training sample is {X_0, X_1, …, X_{m-1}}. The vector floating-point multiply-add instruction is then invoked, namely:
θ′ᵏ⁺¹_j = VFMADD(A, X_j, θ′ᵏ_j)
where θ′ᵏ⁺¹_j on the left of the equals sign is the parameter vector arranged at the jth position in the parameter vector sequence at the next moment, θ′ᵏ_j on the right of the equals sign is the parameter vector arranged at the jth position in the parameter vector sequence at the current moment, and X_j represents the feature vector arranged at the jth position in the feature vector sequence. By calling the vector floating-point multiply-add instruction m times to perform multiply-add processing on the gradient coefficient vector A, the X_j in the feature vector sequence of the current training sample, and the θ′ᵏ_j in the parameter vector sequence at the current moment, the parameter vector sequence at the next moment, {θ′ᵏ⁺¹_0, θ′ᵏ⁺¹_1, …, θ′ᵏ⁺¹_{m-1}}, can be obtained, i.e., the value of each model parameter at the next moment is obtained quickly. The parameter vector sequence at the next moment is then used as the current parameter vector sequence for the next training sample in the current iteration, and so on, until all training samples used in the current round of iterative training have been traversed. The updated model parameters can then be used as the model parameters for the next round of iterative training.
For example, in an application scenario, the descended model parameters can be obtained by the following formula:

θ′ᵏ⁺¹_j = θ′ᵏ_j − α·(h_θ(X) − y)·X_j

where α is the gradient descent coefficient (learning rate), h_θ(X) is the hypothesis function value of the current training sample, and y is its label. Each element of the gradient coefficient vector to be constructed is therefore assigned the value −α·(h_θ(X) − y). Then, the parameter vector sequence at the next moment is obtained according to the formula above, i.e., the value of each model parameter at the next moment is obtained.
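The gradient descent step can likewise be sketched with the same vector multiply-add primitive. This is a hedged illustration assuming, for concreteness, a standard single-sample SGD update in which every element of the gradient coefficient vector equals −α·(h_θ(X) − y); the function names, learning rate, and sample data are invented for illustration.

```python
def vfmadd(a, b, c):
    """Element-wise a*b + c (stands in for one vector multiply-add instruction)."""
    return [ai * bi + ci for ai, bi, ci in zip(a, b, c)]

def sgd_update(theta_vecs, x_vecs, h, y, alpha, n):
    """One descent step: θ′ᵏ⁺¹_j = VFMADD(A, X_j, θ′ᵏ_j) for j = 0..m-1,
    where every element of A is the scalar -α·(h - y)."""
    coeff = -alpha * (h - y)          # gradient descent coefficient times error
    a = [coeff] * n                   # gradient coefficient vector A
    return [vfmadd(a, x_j, th_j) for x_j, th_j in zip(x_vecs, theta_vecs)]

theta = [[0.5, 0.5], [0.5, 0.5]]      # m = 2 parameter vectors of dimension n = 2
x     = [[1.0, 2.0], [3.0, 4.0]]
new_theta = sgd_update(theta, x, h=0.8, y=1.0, alpha=0.1, n=2)
# coeff = -0.1 * (0.8 - 1.0) = 0.02, so each θ element grows by 0.02 * x
```

The m calls to `vfmadd` here replace what would otherwise be m·n separate multiply instructions and m·n separate add instructions on the flat parameter list.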
When the number of model parameters contained in a parameter vector is less than n (assuming the vector length applicable to the vector floating-point multiply-add instruction is n), the unfilled elements are assigned the preset value; however, the elements assigned the preset value are not real model parameters and are not considered when the model parameters are updated.
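The vector-division step with padding can be sketched as follows. This is a hedged illustration: the helper name is invented, and the preset value is taken to be 0.0, which is a safe pad for the θᵀX computation since a zero parameter times a zero feature contributes nothing to the accumulated sum.

```python
import math

def split_into_vectors(values, n, pad=0.0):
    """Divide a flat list of parameters or features into m vectors of the
    dimension n supported by the multiply-add instruction, padding the last
    vector with a preset value when len(values) is not a multiple of n."""
    m = math.ceil(len(values) / n)                 # number of vector partitions
    padded = values + [pad] * (m * n - len(values))
    return [padded[i * n:(i + 1) * n] for i in range(m)]

features = [1.0, 2.0, 3.0, 4.0, 5.0]               # 5 features, n = 4
print(split_into_vectors(features, 4))
# prints [[1.0, 2.0, 3.0, 4.0], [5.0, 0.0, 0.0, 0.0]]
```

As the text notes, the padded elements are bookkeeping only; an update routine would skip them when writing the model parameters back.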
It can be understood that, assuming the vector dimension supported by the vector floating-point multiply-add instruction is n, a computation that would otherwise require n multiply instructions and n add instructions can be completed by calling the vector floating-point multiply-add instruction once. Therefore, in the calculation of the updated model parameters θ′, compared with using separate multiply and add instructions on all the model parameters and feature data, the embodiment of the present specification first performs vector division on the model parameters and the feature data and then calls the vector floating-point multiply-add instruction to calculate the updated model parameters θ′, so that the number of computation instructions needed for this time-consuming calculation is reduced to nearly 1/(2n), greatly reducing the occupation of the computing resources of the computing device by the modeling process.
In a specific implementation, the feature data of the training samples and the model parameters of the target model can be vector-divided according to actual needs, and the preset vector floating-point multiply-add instruction can then be invoked in the calculation of the hypothesis function h_θ(X) and/or in the calculation of the updated model parameters θ′ during gradient descent. This greatly reduces the number of computation instructions required for the main time-consuming calculations in the model training process, which can effectively increase the modeling speed and improve modeling efficiency; it also reduces the occupation of the computing resources of the computing device by the modeling process, so that internal resource management of the computing device can be optimized and the computing device can handle more computing tasks, improving processing efficiency.
After the target values of the training samples are obtained, the following step S104 may be continuously performed to continue training the target model with the target values of the training samples.
And step S104, obtaining a trained target model based on the target value of the training sample in each iteration training process.
After the target value is obtained by the calculation in step S102, the target value may be used in subsequent calculations in the training process, such as calculation of a loss function value, until the training is completed, and a trained target model is obtained for use. It should be noted that the process of using the target value in the subsequent calculation of the training process to obtain the trained target model is the same as the implementation process of the existing model training, and therefore, the detailed description is omitted.
The method for accelerating computing-device modeling provided by the embodiments of this specification performs vector division on the feature data and the model parameters, so that multiple multiply and add calculations in the model training process can be completed by a single call to the vector floating-point multiply-add instruction. This greatly reduces the number of times multiply and add instructions are called separately, i.e., the number of computation instructions required in the model training process. As a result, the modeling speed of the computing device can be effectively increased, the time consumed by modeling reduced, and modeling efficiency improved, so that the model can be put into use quickly while its performance is ensured. At the same time, the occupation of computing resources in the computing device by the modeling process is greatly reduced, internal resource management of the computing device is optimized, and the computing device can handle more computing tasks, thereby improving processing efficiency.
In a second aspect, based on the same inventive concept as the method for accelerating modeling of a computing device provided in the foregoing first aspect, an embodiment of the present specification further provides an apparatus for accelerating modeling of a computing device, which is run on a computing device supporting a vector floating-point multiply-add instruction. As shown in fig. 2, the apparatus 20 includes:
the vector division module 21 is configured to perform vector division on the model parameters and the respective feature data of each training sample in a training process of the target model to obtain a parameter vector sequence of the model parameters and a respective feature vector sequence of each training sample, where the training process of the target model includes multiple rounds of iterative training;
the multiply-add module 22 is configured to invoke a preset vector floating point multiply-add instruction for a training sample in each iteration training process, and perform multiply-add processing on the parameter vector sequence and the feature vector sequence to obtain a target value of the training sample;
and the model determining module 23 is configured to obtain a trained target model based on the target values of the training samples in each iteration training process.
In an alternative embodiment, the vector dividing module 21 includes:
an obtaining submodule 211, configured to obtain a vector dimension supported by the vector floating-point multiply-add instruction;
the partitioning sub-module 212 is configured to perform vector partitioning on the model parameters based on the vector dimensions to obtain m n-dimensional parameter vectors, and form the parameter vector sequence, and perform vector partitioning on the feature data of each training sample to obtain m n-dimensional feature vectors, and form the feature vector sequence, where m is an integer greater than or equal to 1, and n is an integer greater than or equal to 2.
In an alternative embodiment, the partitioning sub-module is configured to:
if the feature quantity contained in the feature data is larger than the vector dimension, determining the vector division number based on the vector dimension and the feature quantity;
according to the vector division number and the vector dimension, m n-dimensional first initial vectors and m n-dimensional second initial vectors are constructed;
and sequentially assigning the model parameters to elements in the m constructed first initial vectors according to a preset sequence to obtain m n-dimensional parameter vectors, and sequentially assigning the features contained in the feature data to elements in the m constructed second initial vectors according to the preset sequence to obtain m n-dimensional feature vectors.
In an optional embodiment, in the process of vector partitioning of the feature data of each training sample, features included in the same feature vector and different feature vectors are different, and in the process of vector partitioning of the model parameters, model parameters included in the same parameter vector and different parameter vectors are different.
In an alternative embodiment, the apparatus 20 further comprises:
and the assignment module assigns the characteristic vector and elements which are not full in the parameter vector to preset values if the characteristic number contained in the characteristic vector is less than the vector dimension supported by the vector floating point multiply-add instruction and the model parameter number contained in the parameter vector is less than the vector dimension supported by the vector floating point multiply-add instruction in the vector division process.
In an alternative embodiment, the multiply-add module 22 includes:
the first processing sub-module 221 is configured to invoke the vector floating point multiply-add instruction, sequentially divide the parameter vector arranged at the ith bit in the parameter vector sequence, the feature vector arranged at the ith bit in the feature vector sequence, and a preset initial vector to perform multiply-add processing, obtain a current result vector, use the current result vector as the initial vector of the next multiply-add processing, and execute the next multiply-add processing, where i is an integer between 0 and m-1, and m is the number of parameter vectors in the parameter vector sequence;
the second processing sub-module 222 is configured to, after traversing the parameter vector sequence and the feature vector sequence, perform accumulation processing on elements in the current result vector, obtain an assumed function value of the training sample based on an accumulation result, and use the assumed function value as a target value of the training sample.
In an alternative embodiment, the multiply-add module 22 includes:
the third processing sub-module 223 is configured to invoke the vector floating point multiply-add instruction, perform multiply-add processing on a gradient coefficient vector obtained in advance, a feature vector arranged at the jth position in the feature vector sequence, and a parameter vector arranged at the jth position in the parameter vector sequence before descent, to obtain a parameter vector sequence after descent, and use a model parameter in the parameter vector sequence after descent as a target value of the training sample, where j is an integer between 0 and m-1, and m is the number of parameter vectors in the parameter vector sequence.
In an alternative embodiment, the above-mentioned multiplication and addition module 22 further includes:
the construction submodule is used for acquiring a gradient descent coefficient of a gradient descent process in the iterative training process; and constructing a gradient coefficient vector according to the dimension of the parameter vector, and assigning each element of the gradient coefficient vector as the gradient descent coefficient.
In an alternative embodiment, the target model is a linear machine learning model, and the linear machine learning model includes a linear regression model and a logistic regression model.
It should be noted that, in the apparatus 20 for accelerating computing device modeling provided in the embodiment of the present specification, the specific manner in which each module performs operations has been described in detail in the method embodiment provided in the foregoing first aspect, and the specific implementation process may refer to the method embodiment provided in the foregoing first aspect, which will not be described in detail here.
In a third aspect, based on the same inventive concept as the method for accelerating modeling of a computing device provided in the foregoing embodiments, the present specification further provides a computing device supporting the use of a vector floating-point multiply-add instruction, such as Intel's VFMADD instruction. As shown in fig. 3, the computing device comprises a memory 304, one or more processors 302, and a computer program stored on the memory 304 and executable on the processors 302; when executing the program, the processor 302 implements the steps of any embodiment of the method for accelerating modeling of a computing device provided in the foregoing first aspect.
In fig. 3, a bus architecture is represented by bus 300. Bus 300 may include any number of interconnected buses and bridges, linking together various circuits including one or more processors, represented by processor 302, and memory, represented by memory 304. The bus 300 may also link together various other circuits such as peripherals, voltage regulators, and power management circuits, which are well known in the art and therefore will not be described further herein. A bus interface 305 provides an interface between the bus 300 and the receiver 301 and transmitter 303. The receiver 301 and the transmitter 303 may be the same element, i.e., a transceiver, providing a means for communicating with various other apparatus over a transmission medium. The processor 302 is responsible for managing the bus 300 and general processing, and the memory 304 may be used for storing data used by the processor 302 in performing operations.
It will be appreciated that the configuration shown in FIG. 3 is merely illustrative and that embodiments of the present description provide a computing device that may also include more or fewer components than shown in FIG. 3, or have a different configuration than shown in FIG. 3. The components shown in fig. 3 may be implemented in hardware, software, or a combination thereof.
In a fourth aspect, based on the same inventive concept as the method for accelerating modeling of a computing device provided in the foregoing embodiments, the present specification embodiment further provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps of any of the embodiments of the method for accelerating modeling of a computing device provided in the foregoing first aspect.
The foregoing description of specific embodiments has been presented for purposes of illustration and description. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The description has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the description. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present specification have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all changes and modifications that fall within the scope of the specification.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present specification without departing from the spirit and scope of the specification. Thus, if such modifications and variations of the present specification fall within the scope of the claims of the present specification and their equivalents, the specification is intended to include such modifications and variations.

Claims (20)

1. A method of accelerating modeling of a computing device, comprising:
in the training process of a target model, vector division is respectively carried out on model parameters and respective feature data of each training sample to obtain a parameter vector sequence of the model parameters and a respective feature vector sequence of each training sample, wherein the training process of the target model comprises multiple rounds of iterative training;
calling a preset vector floating point multiply-add instruction for a training sample in each iteration training process, and carrying out multiply-add processing on the parameter vector sequence and the characteristic vector sequence to obtain a target value of the training sample;
and obtaining a trained target model based on the target value of the training sample in each round of iterative training process.
2. The method of claim 1, wherein the vector partitioning of the model parameters and the respective feature data of each training sample comprises:
acquiring a vector dimension supported by the vector floating point multiply-add instruction;
based on the vector dimension, performing vector division on the model parameters to obtain m n-dimensional parameter vectors, forming the parameter vector sequence, and performing vector division on the respective feature data of each training sample to obtain m n-dimensional feature vectors, forming the feature vector sequence, wherein m is an integer greater than or equal to 1, and n is an integer greater than or equal to 2.
3. The method of claim 2, wherein the vector partitioning the model parameters based on the vector dimensions to obtain m n-dimensional parameter vectors and the vector partitioning the respective feature data of each training sample to obtain m n-dimensional feature vectors, comprises:
if the feature quantity contained in the feature data is larger than the vector dimension, determining the number of vector partitions based on the vector dimension and the feature quantity;
according to the vector division number and the vector dimension, m n-dimensional first initial vectors and m n-dimensional second initial vectors are constructed;
and sequentially assigning the model parameters to elements in the m constructed first initial vectors according to a preset sequence to obtain m n-dimensional parameter vectors, and sequentially assigning the features contained in the feature data to elements in the m constructed second initial vectors according to the preset sequence to obtain m n-dimensional feature vectors.
4. The method according to claim 2, wherein the features included in the same feature vector and different feature vectors are different during the vector partitioning of the feature data of each training sample, and the model parameters included in the same parameter vector and different parameter vectors are different during the vector partitioning of the model parameters.
5. The method of claim 2, further comprising:
in the vector dividing process, if the number of the features contained in the feature vector is smaller than the vector dimension supported by the vector floating point multiply-add instruction, and the number of the model parameters contained in the parameter vector is smaller than the vector dimension supported by the vector floating point multiply-add instruction, the feature vector and the elements which are not full in the parameter vector are assigned to be preset values.
6. The method of claim 1, wherein the invoking a preset vector floating-point multiply-add instruction to perform multiply-add processing on the parameter vector sequence and the feature vector sequence to obtain a target value of a training sample comprises:
calling the vector floating point multiply-add instruction, sequentially dividing the parameter vector arranged at the ith position in the parameter vector sequence, the characteristic vector arranged at the ith position in the characteristic vector sequence and a preset initial vector to carry out multiply-add processing to obtain a current result vector, taking the current result vector as the initial vector of the next multiply-add processing to execute the next multiply-add processing, wherein i is an integer between 0 and m-1, and m is the number of the parameter vectors in the parameter vector sequence;
after traversing the parameter vector sequence and the feature vector sequence, accumulating the elements in the current result vector, obtaining an assumed function value of the training sample based on the accumulation result, and taking the assumed function value as a target value of the training sample.
7. The method according to claim 1, wherein the invoking a preset vector floating-point multiply-add instruction to perform multiply-add processing on the parameter vector sequence and the feature vector sequence to obtain a target value of a training sample comprises:
and calling the vector floating point multiply-add instruction, carrying out multiply-add processing on a gradient coefficient vector which is obtained in advance, a characteristic vector which is arranged at the j th position in the characteristic vector sequence and a parameter vector which is arranged at the j th position in the parameter vector sequence before descending to obtain a descending parameter vector sequence, and taking a model parameter in the descending parameter vector sequence as a target value of the training sample, wherein j is an integer between 0 and m-1, and m is the number of the parameter vectors in the parameter vector sequence.
9. The method according to claim 7, wherein before invoking the vector floating point multiply-add instruction to perform multiply-add processing on the pre-obtained gradient coefficient vector, the feature vector arranged at the jth position in the feature vector sequence, and the parameter vector arranged at the jth position in the parameter vector sequence before descent, the method further comprises:
acquiring a gradient descent coefficient of a gradient descent process in the iterative training process;
and constructing a gradient coefficient vector according to the dimension of the parameter vector, and assigning each element of the gradient coefficient vector as the gradient descent coefficient.
9. The method of any one of claims 1-8, the target model being a linear machine learning model, the linear machine learning model comprising a linear regression model and a logistic regression model.
10. An apparatus to accelerate computing device modeling, comprising:
the vector division module is used for respectively carrying out vector division on model parameters and respective feature data of each training sample in the training process of a target model to obtain a parameter vector sequence of the model parameters and a respective feature vector sequence of each training sample, wherein the training process of the target model comprises multiple rounds of iterative training;
the multiplication and addition module is used for calling a preset vector floating point multiplication and addition instruction for a training sample in each iteration training process, and carrying out multiplication and addition processing on the parameter vector sequence and the characteristic vector sequence to obtain a target value of the training sample;
and the model determining module is used for obtaining a trained target model based on the target value of the training sample in each iteration training process.
11. The device of claim 10, the vector partitioning module comprising:
the obtaining submodule is used for obtaining the vector dimension supported by the vector floating point multiply-add instruction;
and the division submodule is used for carrying out vector division on the model parameters based on the vector dimension to obtain m n-dimensional parameter vectors to form the parameter vector sequence, carrying out vector division on the respective characteristic data of each training sample to obtain m n-dimensional characteristic vectors to form the characteristic vector sequence, wherein m is an integer greater than or equal to 1, and n is an integer greater than or equal to 2.
12. The apparatus of claim 11, the division submodule being configured to:
if the number of features contained in the feature data is larger than the vector dimension, determine the vector division number based on the vector dimension and the number of features;
construct m n-dimensional first initial vectors and m n-dimensional second initial vectors according to the vector division number and the vector dimension;
and sequentially assign the model parameters to the elements of the m first initial vectors in a preset order to obtain the m n-dimensional parameter vectors, and sequentially assign the features contained in the feature data to the elements of the m second initial vectors in the preset order to obtain the m n-dimensional feature vectors.
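The vector division of claims 11-12 can be illustrated with a minimal Python sketch (not part of the claims; the function name, the vector dimension n = 4, and the padding value are assumptions of this sketch, with n = 4 corresponding to, e.g., one 128-bit SIMD lane of 32-bit floats):

```python
import math

def vector_division(values, n=4, pad=0.0):
    """Split a flat list of model parameters or features into
    m n-dimensional vectors; the last vector is filled up with a
    preset value so that every vector has exactly n elements."""
    m = math.ceil(len(values) / n)          # the vector division number
    vectors = []
    for i in range(m):
        chunk = list(values[i * n:(i + 1) * n])
        chunk += [pad] * (n - len(chunk))   # assign preset values to unfilled elements
        vectors.append(chunk)
    return vectors

# 7 features with n = 4 give m = 2 vectors; the second one is padded
print(vector_division([1, 2, 3, 4, 5, 6, 7], n=4))
# → [[1, 2, 3, 4], [5, 6, 7, 0.0]]
```

Padding with a neutral value keeps the last, partially filled vector valid as an operand of a fixed-width vector multiply-add instruction, which is the situation addressed by claim 14 below.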
13. The apparatus of claim 11, wherein, during the vector division of the feature data of each training sample, the features contained in a same feature vector differ from one another and the features contained in different feature vectors differ from one another, and, during the vector division of the model parameters, the model parameters contained in a same parameter vector differ from one another and the model parameters contained in different parameter vectors differ from one another.
14. The apparatus of claim 11, further comprising:
and an assignment module configured to, during the vector division, assign preset values to the unfilled elements of a feature vector if the number of features it contains is smaller than the vector dimension supported by the vector floating-point multiply-add instruction, and to the unfilled elements of a parameter vector if the number of model parameters it contains is smaller than that vector dimension.
15. The apparatus of claim 10, the multiply-add module comprising:
a first processing submodule configured to invoke the vector floating-point multiply-add instruction to sequentially perform multiply-add processing on the parameter vector at the i-th position of the parameter vector sequence, the feature vector at the i-th position of the feature vector sequence, and a preset initial vector to obtain a current result vector, and to use the current result vector as the initial vector of the next multiply-add processing, wherein i is an integer between 0 and m-1 and m is the number of parameter vectors in the parameter vector sequence;
and a second processing submodule configured to, after the parameter vector sequence and the feature vector sequence have been traversed, accumulate the elements of the current result vector, obtain a hypothesis function value of the training sample based on the accumulation result, and use the hypothesis function value as the target value of the training sample.
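The accumulation pattern of claim 15 can be sketched in Python as follows (a hedged illustration only: the numpy expression `w_i * x_i + acc` stands in for a hardware vfmadd-style instruction, and the function and variable names are assumptions of this sketch):

```python
import numpy as np

def hypothesis_value(param_vectors, feature_vectors):
    """Accumulate the dot product w.x with one simulated vector fused
    multiply-add per vector pair: acc = w_i * x_i + acc, where acc
    starts as the preset initial vector (all zeros) and becomes the
    current result vector after each step."""
    acc = np.zeros_like(param_vectors[0], dtype=float)  # preset initial vector
    for w_i, x_i in zip(param_vectors, feature_vectors):
        acc = w_i * x_i + acc        # one vector multiply-add per position i
    return float(acc.sum())          # accumulate the elements of the result vector

w = [np.array([0.5, 1.0, 2.0, 0.0]), np.array([1.0, 0.0, 0.0, 0.0])]
x = [np.array([2.0, 3.0, 1.0, 9.0]), np.array([4.0, 0.0, 0.0, 0.0])]
print(hypothesis_value(w, x))   # 0.5*2 + 1*3 + 2*1 + 1*4 = 10.0
```

Because the padded elements of the parameter vectors are zero, the padded feature elements contribute nothing to the accumulated hypothesis function value.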
16. The apparatus of claim 10, the multiply-add module comprising:
and a third processing submodule configured to invoke the vector floating-point multiply-add instruction to perform multiply-add processing on a pre-acquired gradient coefficient vector, the feature vector at the j-th position of the feature vector sequence, and the pre-descent parameter vector at the j-th position of the parameter vector sequence to obtain a post-descent parameter vector sequence, and to use the model parameters in the post-descent parameter vector sequence as the target value of the training sample, wherein j is an integer between 0 and m-1 and m is the number of parameter vectors in the parameter vector sequence.
17. The apparatus of claim 16, the multiply-add module further comprising:
a construction submodule configured to obtain a gradient descent coefficient of the gradient descent process in the iterative training, to construct a gradient coefficient vector according to the dimension of the parameter vector, and to assign the gradient descent coefficient to each element of the gradient coefficient vector.
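The gradient descent step of claims 16-17 can be sketched as one vector multiply-add per parameter vector (a hedged illustration: the choice of single-sample linear regression, where the per-feature gradient is error * x_j and the coefficient vector therefore holds -alpha * error in every element, is an assumption of this sketch, as are the names):

```python
import numpy as np

def gradient_descent_step(param_vectors, feature_vectors, alpha, error):
    """One descent step expressed as vector multiply-adds:
    theta_j_new = coeff * x_j + theta_j for each position j, where
    coeff is the gradient coefficient vector with the same value
    (the gradient descent coefficient) in every element."""
    n = len(param_vectors[0])
    coeff = np.full(n, -alpha * error)    # gradient coefficient vector
    return [coeff * x_j + theta_j         # one multiply-add per vector pair
            for theta_j, x_j in zip(param_vectors, feature_vectors)]

theta = [np.array([1.0, 1.0, 1.0, 1.0])]
x = [np.array([2.0, 0.0, 1.0, 0.0])]
print(gradient_descent_step(theta, x, alpha=0.1, error=0.5))
# → [array([0.9 , 1.  , 0.95, 1.  ])]
```

Broadcasting the single descent coefficient into a full vector lets the whole update ride on the same fixed-width multiply-add instruction used for the hypothesis computation.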
18. The apparatus of any of claims 10-17, the target model being a linear machine learning model, the linear machine learning model comprising a linear regression model and a logistic regression model.
19. A computing device, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method of any one of claims 1-9 when executing the program.
20. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the method of any one of claims 1-9.
CN201911324820.5A 2019-12-20 2019-12-20 Method, device, computing equipment and medium for accelerating modeling of computing equipment Active CN111027018B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911324820.5A CN111027018B (en) 2019-12-20 2019-12-20 Method, device, computing equipment and medium for accelerating modeling of computing equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911324820.5A CN111027018B (en) 2019-12-20 2019-12-20 Method, device, computing equipment and medium for accelerating modeling of computing equipment

Publications (2)

Publication Number Publication Date
CN111027018A CN111027018A (en) 2020-04-17
CN111027018B true CN111027018B (en) 2023-03-31

Family

ID=70212182

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911324820.5A Active CN111027018B (en) 2019-12-20 2019-12-20 Method, device, computing equipment and medium for accelerating modeling of computing equipment

Country Status (1)

Country Link
CN (1) CN111027018B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101986264A (en) * 2010-11-25 2011-03-16 中国人民解放军国防科学技术大学 Multifunctional floating-point multiply and add calculation device for single instruction multiple data (SIMD) vector microprocessor
CN105488565A (en) * 2015-11-17 2016-04-13 中国科学院计算技术研究所 Calculation apparatus and method for accelerator chip accelerating deep neural network algorithm
CN109062609A (en) * 2018-02-05 2018-12-21 上海寒武纪信息科技有限公司 Processing with Neural Network device and its method for executing operational order
US10167800B1 (en) * 2017-08-18 2019-01-01 Microsoft Technology Licensing, Llc Hardware node having a matrix vector unit with block-floating point processing
CN109492761A (en) * 2018-10-30 2019-03-19 深圳灵图慧视科技有限公司 Realize FPGA accelerator, the method and system of neural network
CN109661647A (en) * 2016-09-13 2019-04-19 Arm有限公司 The multiply-add instruction of vector
CN110488278A (en) * 2019-08-20 2019-11-22 深圳锐越微技术有限公司 Doppler radar signal kind identification method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10146738B2 (en) * 2016-12-31 2018-12-04 Intel Corporation Hardware accelerator architecture for processing very-sparse and hyper-sparse matrix data
US11216722B2 (en) * 2016-12-31 2022-01-04 Intel Corporation Hardware accelerator template and design framework for implementing recurrent neural networks
US11132599B2 (en) * 2017-02-28 2021-09-28 Microsoft Technology Licensing, Llc Multi-function unit for programmable hardware nodes for neural network processing

Also Published As

Publication number Publication date
CN111027018A (en) 2020-04-17

Similar Documents

Publication Publication Date Title
US10970628B2 (en) Training neural networks represented as computational graphs
US11308398B2 (en) Computation method
EP3446260B1 (en) Memory-efficient backpropagation through time
CN110689115B (en) Neural network model processing method and device, computer equipment and storage medium
US20190370659A1 (en) Optimizing neural network architectures
CN110245741A (en) Optimization and methods for using them, device and the storage medium of multilayer neural network model
CN108292241A (en) Processing calculates figure
CN112084038B (en) Memory allocation method and device of neural network
EP3602419B1 (en) Neural network optimizer search
CN110008952B (en) Target identification method and device
US11537879B2 (en) Neural network weight discretizing method, system, device, and readable storage medium
CN111126668A (en) Spark operation time prediction method and device based on graph convolution network
CN114327844A (en) Memory allocation method, related device and computer readable storage medium
US20230222000A1 (en) Recommendations for scheduling jobs on distributed computing devices
US11275561B2 (en) Mixed precision floating-point multiply-add operation
CN107437111A (en) Data processing method, medium, device and computing device based on neutral net
CN112764893B (en) Data processing method and data processing system
CN110889497B (en) Learning task compiling method of artificial intelligence processor and related product
CN113312178A (en) Assembly line parallel training task allocation method based on deep reinforcement learning
CN115357554B (en) Graph neural network compression method and device, electronic equipment and storage medium
CN114330730A (en) Quantum line block compiling method, device, equipment, storage medium and product
CN115860081A (en) Core particle algorithm scheduling method and system, electronic equipment and storage medium
Wang et al. Towards efficient convolutional neural networks through low-error filter saliency estimation
CN110309946A (en) Logistics route method and device for planning, computer-readable medium and logistics system
CN113419931A (en) Performance index determination method and device of distributed machine learning system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40027999

Country of ref document: HK

GR01 Patent grant