CN111027018A - Method, device, computing equipment and medium for accelerating modeling of computing equipment - Google Patents


Info

Publication number
CN111027018A
CN111027018A (application CN201911324820.5A; granted as CN111027018B)
Authority
CN
China
Prior art keywords
vector
parameter
sequence
feature
multiply
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911324820.5A
Other languages
Chinese (zh)
Other versions
CN111027018B (en)
Inventor
赵原
殷山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN201911324820.5A priority Critical patent/CN111027018B/en
Publication of CN111027018A publication Critical patent/CN111027018A/en
Application granted granted Critical
Publication of CN111027018B publication Critical patent/CN111027018B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/15 Correlation function computation including computation of convolution operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F 7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F 7/48 Methods or arrangements for performing computations using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F 7/544 Methods or arrangements for performing computations using unspecified devices for evaluating functions by calculation
    • G06F 7/5443 Sum of products
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F 7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F 7/48 Methods or arrangements for performing computations using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F 7/57 Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups G06F 7/483 to G06F 7/556 or for performing logical operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Physics (AREA)
  • Computational Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Algebra (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Operations Research (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Complex Calculations (AREA)

Abstract

Embodiments of this specification provide a method, an apparatus, a computing device, and a medium for accelerating modeling on a computing device. During training of a target model, the model parameters and the feature data of each training sample are partitioned into vectors, vectorizing both the parameters and the features. Then, for the training samples in each round of iterative training, a preset vector floating-point multiply-add instruction is invoked to perform multiply-add processing on the parameter vector sequence and feature vector sequence obtained by the partitioning, yielding a target value for each training sample; from these target values, a trained target model is obtained. For example, in a personalized recommendation scenario, the feature data may be personal information of the user, such as the user's profile information.

Description

Method, device, computing equipment and medium for accelerating modeling of computing equipment
Technical Field
The embodiment of the specification relates to the technical field of computers, in particular to a method, a device, computing equipment and a medium for accelerating modeling of computing equipment.
Background
With the development of computer technology, Artificial Intelligence (AI) has become increasingly widespread, and mature AI applications now appear in many scenarios, such as security, finance, and personalized recommendation. To build an AI model, the model must be trained and then tested before being put into use. Model training is the key factor determining model performance. To ensure that the model generalizes well, a large amount of sample data is usually required for training, which makes the computational load of model training very large.
Disclosure of Invention
Embodiments of this specification provide a method, an apparatus, a computing device, and a medium for accelerating modeling on a computing device.
In a first aspect, an embodiment of this specification provides a method for accelerating modeling of a computing device, including: during training of a target model, which comprises multiple rounds of iterative training, performing vector partitioning on the model parameters and on the feature data of each training sample to obtain a parameter vector sequence for the model parameters and a feature vector sequence for each training sample; for a training sample in each round of iterative training, invoking a preset vector floating-point multiply-add instruction to perform multiply-add processing on the parameter vector sequence and the feature vector sequence, obtaining a target value for the training sample; and obtaining a trained target model based on the target values of the training samples across the rounds of iterative training.
In a second aspect, an embodiment of this specification provides an apparatus for accelerating modeling of a computing device, including: a vector partitioning module, configured to perform vector partitioning on the model parameters and on the feature data of each training sample during training of a target model, obtaining a parameter vector sequence for the model parameters and a feature vector sequence for each training sample, where the training of the target model comprises multiple rounds of iterative training; a multiply-add module, configured to invoke a preset vector floating-point multiply-add instruction for a training sample in each round of iterative training and perform multiply-add processing on the parameter vector sequence and the feature vector sequence to obtain a target value for the training sample; and a model determining module, configured to obtain a trained target model based on the target values of the training samples across the rounds of iterative training.
In a third aspect, an embodiment of this specification provides a computing device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the steps of the method for accelerating modeling of a computing device provided in the first aspect.
In a fourth aspect, an embodiment of this specification provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the method for accelerating modeling of a computing device provided in the first aspect.
In the method for accelerating modeling of a computing device provided in one embodiment of this specification, during training of a target model, the model parameters and the feature data of each training sample are partitioned into vectors. Then, for the training samples in each round of iterative training, a preset vector floating-point multiply-add instruction is invoked to perform multiply-add processing on the parameter vector sequence and feature vector sequence obtained by the partitioning, yielding a target value for each training sample and, ultimately, a trained target model. Because the feature data and model parameters are vectorized, many multiply and add calculations in the training process can be completed with a single call to the vector floating-point multiply-add instruction, which greatly reduces the number of separately invoked multiply and add instructions, i.e., the number of computation instructions needed during model training. As a result, the modeling speed of the computing device is effectively increased, modeling time is reduced, and modeling efficiency improves, so the model can be put into use quickly while its performance is preserved. The occupation of computing resources during modeling is also greatly reduced, which optimizes the computing device's internal resource management, allows it to process more computing tasks, and improves processing efficiency.
Drawings
FIG. 1 is a flow chart of a method for accelerating modeling of a computing device provided in a first aspect of an embodiment of the present description;
FIG. 2 is a block diagram of an apparatus for accelerating modeling of a computing device according to a second aspect of an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a computing device provided in a third aspect of an embodiment of the present specification.
Detailed Description
To better understand the technical solutions provided by the embodiments of this specification, they are described in detail below with reference to the drawings and specific embodiments. It should be understood that the specific features in the embodiments and examples are detailed descriptions of, not limitations on, the technical solutions of these embodiments, and that the technical features in the embodiments and examples may be combined with each other where no conflict arises. In the embodiments of this specification, the term "plurality" means "two or more", i.e., it covers the cases of two and of more than two; the term "and/or" merely describes an association between objects and indicates that three relationships are possible: for example, "A and/or B" may mean that A exists alone, that A and B exist simultaneously, or that B exists alone.
In this embodiment, a vector floating-point multiply-add instruction is an instruction that performs a floating-point multiplication on all corresponding elements (single-precision or double-precision floating-point numbers) of two vectors and a floating-point addition on the multiplication results in a single operation. For example, consider a vector fused multiply-add instruction VFMADD (Vector Fused Multiply-Add) with vector length n: the instruction R = VFMADD(A, B, C) computes the vector R from the n elements of vectors A, B, and C as r_g = a_g · b_g + c_g, where g is an integer from 0 to n − 1 and a_g, b_g, c_g, r_g are the g-th elements of vectors A, B, C, and R, respectively.
It should be noted that the embodiments of this disclosure do not limit which specific vector floating-point multiply-add instruction is used; the choice depends on the instructions supported by the computing device performing model training. For example, an Intel VFMADD instruction or an ARM VMLA instruction may be used, or any other vector floating-point multiply-add instruction capable of implementing the function described above.
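To make the semantics above concrete, here is a minimal Python sketch that mimics what a single R = VFMADD(A, B, C) instruction computes elementwise. The function name and the list-based vector representation are illustrative assumptions; a real instruction performs all n elementwise operations in one hardware operation.

```python
def vfmadd(a, b, c):
    """Simulate R = VFMADD(A, B, C): r_g = a_g * b_g + c_g for each element g.

    This loop only models the arithmetic semantics; the hardware
    instruction performs all n elementwise operations at once.
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("vector lengths must match")
    return [a_g * b_g + c_g for a_g, b_g, c_g in zip(a, b, c)]


# Example with vector length n = 4.
print(vfmadd([1.0, 2.0, 3.0, 4.0],
             [5.0, 6.0, 7.0, 8.0],
             [0.5, 0.5, 0.5, 0.5]))  # [5.5, 12.5, 21.5, 32.5]
```

A single such call replaces four separate multiply instructions and four separate add instructions, which is the source of the speedup the embodiment describes.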
In the training of linear machine-learning models such as linear regression or logistic regression, two computations dominate the training time: evaluating the hypothesis function h_θ(X), where θ denotes the model parameters and X denotes a feature vector, and computing the updated model parameters θ' during gradient descent. The embodiment of this application provides a method for accelerating modeling on a computing device: during training of a target model, which comprises multiple rounds of iterative training, vector partitioning is performed on the model parameters and on the feature data of each training sample, yielding a parameter vector sequence for the model parameters and a feature vector sequence for each training sample; then, for the training samples in each round of iterative training, a preset vector floating-point multiply-add instruction is invoked to perform multiply-add processing on the parameter vector sequence and the feature vector sequence, yielding target values for the training samples; finally, a trained target model is obtained based on these target values.
In this way, the feature data and model parameters are vectorized, and the target value, which requires a large number of multiply and add calculations during training, is computed through the vector floating-point multiply-add instruction. The number of computation instructions needed during model training therefore drops sharply, which effectively increases the modeling speed of the computing device, reduces modeling time, and improves modeling efficiency, so the model can be put into use quickly while its performance is preserved; it also greatly reduces the occupation of computing resources during modeling and improves the processing efficiency of the computing device.
In a first aspect, fig. 1 shows a flowchart of a method for accelerating modeling of a computing device, which is provided by an embodiment of the present specification, and is applied to a computing device supporting the vector floating-point multiply-add instruction. Referring to fig. 1, the method may include at least the following steps S100 to S104.
Step S100: during training of the target model, perform vector partitioning on the model parameters and on the feature data of each training sample to obtain a parameter vector sequence for the model parameters and a feature vector sequence for each training sample.
In embodiments of this disclosure, the target model may be a linear machine-learning model such as a linear regression model or a logistic regression model. Of course, in other embodiments the target model may be any other suitable machine-learning model whose hypothesis-function computation includes evaluating θ^T X. It is understood that the number of model parameters in the target model equals the number of features in the feature data of a training sample. To allow the subsequent vector floating-point multiply-add instruction to execute, after vector partitioning the number of feature vectors in each feature vector sequence equals the number of parameter vectors in the parameter vector sequence, and each feature vector has the same dimension as each parameter vector.
In practical applications, the training samples and their feature data are determined by the application scenario of the target model. For example, if the target model is used to predict a user's credit score, a training sample may be a user, and the feature data may include the user's personal information, such as profile information and payment information. Of course, the target model may also be applied in other suitable scenarios, which are not enumerated here.
In a specific implementation process, before vector division is performed, the vector dimension of each divided vector needs to be determined, and then vector division can be performed on the feature data and the model parameters according to the vector dimension. It should be noted that the vector dimensions of the feature vector and the parameter vector obtained by dividing should be consistent with the vector dimensions supported by the preset vector floating point multiply-add instruction. Therefore, in an alternative embodiment, the vector partitioning of the model parameters and the respective feature data of each training sample may include: obtaining a vector dimension supported by a vector floating point multiply-add instruction; based on the vector dimension, performing vector division on the model parameters to obtain m n-dimensional parameter vectors to form the parameter vector sequence, and performing vector division on the respective characteristic data of each training sample to obtain m n-dimensional characteristic vectors to form the characteristic vector sequence. Wherein m is an integer greater than or equal to 1, and n is an integer greater than or equal to 2.
Specifically, in one application scenario, if the number of features in the feature data exceeds the vector dimension n supported by the preset vector floating-point multiply-add instruction, the number of vector partitions is at least 2, i.e., m ≥ 2. In this case, partitioning the model parameters into m n-dimensional parameter vectors and the feature data of each training sample into m n-dimensional feature vectors may proceed as follows: determine the number of partitions m from the vector dimension n supported by the instruction and the number of features; construct m n-dimensional first initial vectors and m n-dimensional second initial vectors according to m and n; then assign the model parameters, in a preset order, to the elements of the m first initial vectors to obtain m n-dimensional parameter vectors, and assign the features of the feature data, in the same preset order, to the elements of the m second initial vectors to obtain m n-dimensional feature vectors. It should be noted that the model parameters and the feature data are partitioned in the same way, i.e., assigned sequentially in the same preset order.
In addition, when partitioning the feature data of a training sample, each feature is assigned to exactly one feature vector: no feature appears twice within a feature vector or in more than one feature vector. Likewise, when partitioning the model parameters, each model parameter is assigned to exactly one parameter vector, with no parameter duplicated within or across parameter vectors.
During vector partitioning, a feature vector or parameter vector may not be completely filled, i.e., the number of features placed in a feature vector (or of model parameters placed in a parameter vector) may be smaller than the vector dimension supported by the preset vector floating-point multiply-add instruction; in that case the unfilled elements are assigned a preset value. For example, if one feature vector can hold only 3 features while the vector dimension supported by the floating-point multiply-add instruction is 5, the remaining two elements of that feature vector must be set to the preset value. The same applies to the vector partitioning of the model parameters. The preset value depends on the specific target value being computed: it may be 0 when the target value is a hypothesis-function value, and 0 or another specified value when the target value is a model-parameter value in the gradient-descent process.
It should further be noted that the partition order, i.e., the preset order, is not limited and may be set according to actual needs, as long as no feature is repeatedly assigned to multiple feature vectors and no model parameter is repeatedly assigned to multiple parameter vectors.
For example, assume the feature data contains 18 features, denoted x_0 to x_17, and there are likewise 18 model parameters, denoted θ_0 to θ_17. If the vector dimension supported by the preset vector floating-point multiply-add instruction is 5, the feature data may be divided into 4 feature vectors. Specifically, partitioning may start from x_0 and proceed front to back: x_0 to x_4 form the first feature vector of the sequence, x_5 to x_9 the second, x_10 to x_14 the third, and x_15 to x_17 the fourth, with the two unfilled elements of the fourth feature vector assigned the preset value; correspondingly, the model parameters are divided into 4 parameter vectors in the same way. Alternatively, partitioning may proceed in reverse, starting from x_17 and going back to front, with the model parameters again divided in the same way. Other orders are also possible: for example, x_0, x_2, x_4, x_6, x_8 may form the first feature vector, x_10, x_12, x_14, x_16, x_1 the second, x_3, x_5, x_7, x_9, x_11 the third, and x_13, x_15, x_17 the fourth, with the model parameters partitioned in the same order.
In another application scenario, if the number of features in the feature data is less than or equal to the vector dimension supported by the vector floating-point multiply-add instruction, the feature vector sequence and the parameter vector sequence each contain a single vector. Specifically, if the number of features is smaller than the supported vector dimension, the unfilled elements of the resulting feature vector must be assigned the preset value; for instance, if the feature data contains 6 features and the supported vector dimension is 10, the remaining 4 elements are assigned the preset value. If the number of features exactly equals the supported vector dimension, the features fill exactly one feature vector. The same applies to the vector partitioning of the model parameters.
In a specific implementation, assume the feature data of a training sample contains DIM features and the vector dimension supported by the vector floating-point multiply-add instruction is n. (DIM is merely one possible variable name for the feature count; other names commonly used for quantities, such as M or N, could be substituted.) In one embodiment, the number of feature vectors and parameter vectors can be determined by the formula:
m = ⌊(DIM + n − 1) / n⌋
That is, the number of vector partitions m is obtained by adding n to DIM, subtracting 1, dividing by n, and rounding down. For example, if n = 3 and DIM = 10, then m = 4. Alternatively, in other embodiments of this specification, the number of feature vectors and parameter vectors may be obtained by rounding DIM/n down and then adding 1.
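The partition count formula and the padding rule described earlier can be sketched together as follows. This is a Python illustration; the function name and the use of lists for vectors are assumptions, not part of the patent.

```python
def split_into_vectors(values, n, pad=0.0):
    """Partition a flat list of DIM features (or model parameters) into
    m = floor((DIM + n - 1) / n) vectors of dimension n, assigning the
    preset value `pad` to any unfilled elements of the last vector."""
    dim = len(values)
    m = (dim + n - 1) // n  # m = floor((DIM + n - 1) / n)
    padded = list(values) + [pad] * (m * n - dim)
    return [padded[i * n:(i + 1) * n] for i in range(m)]


# The worked example from the text: 18 features, vector dimension 5.
vecs = split_into_vectors(list(range(18)), 5)
print(len(vecs))  # 4
print(vecs[-1])   # [15, 16, 17, 0.0, 0.0]  (two padded elements)
```

With n = 3 and DIM = 10, the same function yields m = 4, matching the example above; the same routine partitions the model parameters, since both are divided in the same preset order.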
It is understood that the training process of the target model includes multiple rounds of iterative training, and after completing the feature data of the training samples and the vector division of the model parameters, the following step S102 may be performed.
Step S102: for a training sample in each round of iterative training, invoke the preset vector floating-point multiply-add instruction to perform multiply-add processing on the parameter vector sequence and the feature vector sequence, obtaining the target value of the training sample.
Specifically, the preset vector floating-point multiply-add instruction may be invoked for every training sample in each round of iterative training, performing multiply-add processing on the training sample's parameter vector sequence and feature vector sequence to obtain its target value. Alternatively, in other embodiments of this disclosure, step S102 may be performed on only a subset of the training samples in each round of iterative training, obtaining a target value for each sample in that subset.
It can be understood that the target value is a value obtained by multiply-add processing of the training sample's feature data and the model parameters during iterative training. For example, the target value may be a hypothesis-function value h_θ(X), and/or the calculated value of the updated parameter θ'. Note that evaluating the hypothesis function h_θ(X) of a linear machine-learning model involves computing θ^T X, whose value can be obtained from the result of the multiply-add processing of the parameter vector sequence and the feature vector sequence, as described below.
For example, an exemplary linear regression model has the hypothesis function h_θ(X) = θ^T X, and an exemplary logistic regression model has the hypothesis function:

h_θ(X) = 1 / (1 + e^(−θ^T X))
For another example, in an application scenario, the gradient-descent parameter update may be computed as:

θ'_j = θ_j − (α / NUM) · Σ_{k=1}^{NUM} (h_θ(X_k) − Y_k) · x_{k,j}

where α is the learning rate, NUM is the number of samples in each iteration, X_k and Y_k are the feature vector and sample label of the k-th sample, and x_{k,j} is the j-th feature of X_k.
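As an illustration of this update rule, here is a plain-Python sketch of one batch gradient-descent step for a linear-regression hypothesis h_θ(X) = θ^T X. The function names are assumptions; in the accelerated implementation the inner dot products would be computed with the vector floating-point multiply-add instruction rather than a scalar loop.

```python
def hypothesis(theta, x):
    """Linear-regression hypothesis: h_theta(X) = theta^T X."""
    return sum(t * xi for t, xi in zip(theta, x))


def gradient_step(theta, samples, labels, alpha):
    """One batch gradient-descent update:
    theta'_j = theta_j - (alpha / NUM) * sum_k (h_theta(X_k) - Y_k) * x_{k,j}
    """
    num = len(samples)
    return [
        theta_j - (alpha / num) * sum(
            (hypothesis(theta, x) - y) * x[j]
            for x, y in zip(samples, labels)
        )
        for j, theta_j in enumerate(theta)
    ]


# Tiny example: one feature, two samples, learning rate 0.1.
theta = gradient_step([0.0], [[1.0], [2.0]], [1.0, 2.0], alpha=0.1)
print(theta)  # [0.25]
```

The residual term (h_θ(X_k) − Y_k) is computed once per sample; both it and the per-feature sums consist of exactly the repeated multiply-accumulate operations that the vectorized instruction is designed to batch.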
The following description mainly takes these two kinds of target values as examples and details how they are calculated. Of course, in a specific implementation the target value may also be any other suitable computed parameter value in the model training process, which is not limited here.
In an alternative embodiment of this disclosure, the target value may include a hypothesis-function value, for example the hypothesis-function value when the target model is a linear regression model, or when it is a logistic regression model. In this case, in step S102, invoking the preset vector floating-point multiply-add instruction to perform multiply-add processing on the parameter vector sequence and the feature vector sequence to obtain the target value of the training sample may include: invoking the vector floating-point multiply-add instruction to perform multiply-add processing on the parameter vector at position i in the parameter vector sequence, the feature vector at position i in the feature vector sequence, and a preset initial vector, obtaining a current result vector, and then using that current result vector as the initial vector of the next multiply-add operation, where i runs from 0 to m − 1 and m is the number of parameter vectors in the parameter vector sequence; then, after the parameter vector sequence and the feature vector sequence have been traversed, accumulating the elements of the final result vector and obtaining the hypothesis-function value of the training sample from the accumulated result; this hypothesis-function value serves as the target value of the training sample.
Specifically, for a training sample in each round of iterative training, the first parameter vector θ_0 in the parameter vector sequence may be taken as the current first vector, the first feature vector X_0 in the feature vector sequence as the current second vector, and the preset initial vector R_0 as the current third vector.
Further, a vector multiply-add step is performed: the vector floating-point multiply-add instruction is used to perform vector multiply-add processing on the current first vector, the current second vector, and the current third vector to obtain a current result vector. For example, this can be expressed as R = VFMADD(θ_0, X_0, R_0).
Then, the next parameter vector θ_1 in the parameter vector sequence is taken as the current first vector, the next feature vector X_1 in the feature vector sequence as the current second vector, and the current result vector R as the current third vector, and the vector multiply-add step is repeated, and so on until all vectors in the parameter vector sequence and the feature vector sequence have been traversed. Since the preset value is 0, accumulating the elements of the current result vector obtained in the last round yields θ^T X, which can then be substituted into the hypothesis function to obtain the hypothesis function value of the training sample.
That is, the above multi-round multiply-add process can be expressed as:
R = VFMADD(θ_i, X_i, R)

where R represents the current result vector, θ_i represents the parameter vector at the i-th position in the parameter vector sequence, and X_i represents the feature vector at the i-th position in the feature vector sequence. The initial value of R is the preset initial vector, whose dimension is the same as that of the feature vectors and the parameter vectors and whose elements are all assigned the value 0.
Then, all elements in the current result vector obtained in the last round are accumulated according to the following formula:

θ^T X = r_0 + r_1 + … + r_{n−1}

where n is the vector dimension supported by the vector floating-point multiply-add instruction, i.e. the common dimension of the feature vectors and the parameter vectors, and r_i is the i-th element of the current result vector.
It can be understood that, assuming the vector dimension supported by the vector floating-point multiply-add instruction is n, a computation that would otherwise require n multiply instructions and n add instructions can be completed by calling the instruction once. Therefore, in computing θ^T X, compared with applying separate multiply and add instructions to every model parameter and feature value, the embodiments of the present specification first divide the model parameters and feature data into vectors and then call the vector floating-point multiply-add instruction on the divided vectors, reducing the number of computing instructions required for this time-consuming computation to roughly 1/(2n) of the original and greatly reducing the modeling process's occupation of the computing device's resources.
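As an illustrative, non-normative sketch of the procedure above, the following Python snippet simulates the chunked multiply-add, with NumPy element-wise arithmetic standing in for the hardware VFMADD instruction; the function names, the use of NumPy, and the toy dimensions are assumptions for illustration only:

```python
import numpy as np

def vfmadd(a, b, c):
    # Stand-in for a hardware vector fused multiply-add instruction:
    # computes a * b + c element-wise on n-dimensional vectors.
    return a * b + c

def hypothesis_dot(theta_chunks, x_chunks, n):
    """Compute theta^T X by iterating VFMADD over the m chunk pairs."""
    r = np.zeros(n)                      # initial vector R, all elements 0
    for theta_i, x_i in zip(theta_chunks, x_chunks):
        r = vfmadd(theta_i, x_i, r)      # R = VFMADD(theta_i, X_i, R)
    return float(r.sum())                # accumulate the n elements of R

# Toy example: 8 features split into m = 2 chunks of dimension n = 4.
n = 4
theta = np.arange(8, dtype=float)        # model parameters
x = np.ones(8)                           # feature data of one sample
theta_chunks = np.split(theta, 2)
x_chunks = np.split(x, 2)
print(hypothesis_dot(theta_chunks, x_chunks, n))  # equals theta @ x = 28.0
```

On real hardware, each loop iteration would be a single FMA instruction instead of the separate multiply and add that `a * b + c` implies here.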
In an alternative embodiment of the present disclosure, the target value may include an updated parameter value in the gradient descent process. In this case, in step S102, the step of invoking the preset vector floating-point multiply-add instruction to perform multiply-add processing on the parameter vector sequence and the feature vector sequence to obtain the target value of the training sample may include: calling the vector floating-point multiply-add instruction to perform multiply-add processing on a pre-obtained gradient coefficient vector, the feature vector at the j-th position in the feature vector sequence, and the pre-descent parameter vector at the j-th position in the parameter vector sequence, obtaining the post-descent parameter vector sequence, and taking the model parameters in the post-descent parameter vector sequence as the target value of the training sample. Here j takes each integer from 0 to m−1, where m is the number of parameter vectors in the parameter vector sequence, so the multiply-add processing is executed m times in total to obtain the post-descent parameter vector sequence. It should be noted that the embodiments of the present specification do not limit the parameter update scheme used in model training; for example, the method may be applied to full-batch gradient descent, mini-batch gradient descent, or SGD (Stochastic Gradient Descent).
Of course, before the above multiply-add processing is performed, the gradient coefficient vector needs to be obtained first. Specifically, obtaining the gradient coefficient vector may include: acquiring the gradient descent coefficient of the gradient descent process in the iterative training; constructing a gradient coefficient vector according to the dimension of the parameter vectors; and assigning each element of the gradient coefficient vector the value of the gradient descent coefficient.
Assuming the gradient coefficient vector is denoted A, the dimension of A is the same as the dimension of the parameter vectors and the feature vectors. The sequence of parameter vectors before descent is:

{θ′_k^0, θ′_k^1, …, θ′_k^{m−1}}

and the feature vector sequence of the current training sample is {X_0, X_1, …, X_{m−1}}. The vector floating-point multiply-add instruction is then invoked as:

θ′_{k+1}^j = VFMADD(A, X_j, θ′_k^j)

where θ′_{k+1}^j on the left of the equal sign is the parameter vector at the j-th position in the parameter vector sequence at the next moment, θ′_k^j on the right of the equal sign is the parameter vector at the j-th position in the parameter vector sequence at the current moment, and X_j is the feature vector at the j-th position in the feature vector sequence. By calling the vector floating-point multiply-add instruction m times, once for each combination of the gradient coefficient vector A, a feature vector X_j of the current training sample, and the corresponding current parameter vector θ′_k^j, the parameter vector sequence at the next moment, {θ′_{k+1}^0, θ′_{k+1}^1, …, θ′_{k+1}^{m−1}}, can be obtained, which quickly yields the value of each model parameter at the next moment. The parameter vectors at the next moment are then used as the current parameter vector sequence for the next training sample in this round of iteration, and the above steps are repeated until all training samples used in this round of iterative training have been traversed. The updated model parameters can then be used as the model parameters for the next round of iterative training.
For example, in one application scenario, the post-descent model parameters can be obtained by the following formula:

θ′_{k+1} = θ′_k − (α/NUM) · (h_θ(X) − Y) · X

where −(α/NUM) · (h_θ(X) − Y) is the gradient descent coefficient, so each element of the constructed gradient coefficient vector A is assigned the value −(α/NUM) · (h_θ(X) − Y). The parameter vector sequence at the next moment is then obtained according to the formula above, i.e. the value of each model parameter at the next moment is obtained.
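A minimal Python sketch of this per-sample update follows, with NumPy element-wise arithmetic standing in for the hardware VFMADD instruction; the function names, the use of NumPy, and the toy dimensions are illustrative assumptions, not part of the specification:

```python
import numpy as np

def vfmadd(a, b, c):
    # Stand-in for the vector fused multiply-add instruction: a * b + c.
    return a * b + c

def sgd_update(theta_chunks, x_chunks, h, y, alpha, num):
    """One gradient-descent step performed with m VFMADD calls.

    Every element of the gradient coefficient vector A holds the scalar
    -(alpha / num) * (h - y), so each chunk update is
    theta'_j = VFMADD(A, X_j, theta_j) = A * X_j + theta_j.
    """
    n = len(theta_chunks[0])
    a = np.full(n, -(alpha / num) * (h - y))   # gradient coefficient vector A
    return [vfmadd(a, x_j, th_j)
            for th_j, x_j in zip(theta_chunks, x_chunks)]

# Toy example: 4 parameters in m = 2 chunks of dimension n = 2.
theta = [np.ones(2), np.ones(2)]
x = [np.full(2, 2.0), np.full(2, 2.0)]
updated = sgd_update(theta, x, h=1.0, y=0.0, alpha=0.1, num=1)
# each parameter becomes -0.1 * 2.0 + 1.0 = 0.8
```

Building A once per sample and reusing it across all m chunks is what lets the whole parameter update collapse into m FMA calls.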
When the number of model parameters contained in a parameter vector is less than n (where n is the vector length supported by the vector floating-point multiply-add instruction), the unfilled elements are assigned the preset value; these elements are not real model parameters and are ignored when the model parameters are updated.
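The padding step can be sketched as follows, assuming the preset value is 0 (a safe choice here, since zero-valued positions contribute nothing to the multiply-add results); the helper name and the ceiling-division idiom are illustrative assumptions:

```python
import numpy as np

def pad_to_multiple(values, n):
    """Zero-pad the data so it splits evenly into n-dimensional vectors.

    Padded positions are placeholders only: they are not real model
    parameters or features and must be skipped when reading updated
    model parameters back out of the last chunk.
    """
    m = -(-len(values) // n)            # ceiling division: number of chunks
    padded = np.zeros(m * n)
    padded[:len(values)] = values
    return np.split(padded, m)

chunks = pad_to_multiple([1.0, 2.0, 3.0, 4.0, 5.0], n=4)
# → two 4-dimensional vectors: [1, 2, 3, 4] and [5, 0, 0, 0]
```

Both the parameter vectors and the feature vectors of one sample must be padded the same way, so that corresponding chunks stay aligned for the VFMADD calls.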
It can be understood that, assuming the vector dimension supported by the vector floating-point multiply-add instruction is n, a computation that would otherwise require n multiply instructions and n add instructions can be completed by calling the instruction once. Therefore, in computing the updated model parameters θ′, compared with applying separate multiply and add instructions to every model parameter and feature value, the embodiments of the present specification first divide the model parameters and feature data into vectors and then call the vector floating-point multiply-add instruction to compute θ′, reducing the number of computing instructions required for this time-consuming computation to roughly 1/(2n) of the original and greatly reducing the modeling process's occupation of the computing device's resources.
In a specific implementation, the feature data of the training samples and the model parameters of the target model can be divided into vectors as actually needed, and the preset vector floating-point multiply-add instruction can then be called in the calculation of the hypothesis function h_θ(X) and/or in the calculation of the updated model parameters θ′ during gradient descent. This greatly reduces the number of computing instructions required for the main time-consuming calculations in model training, so the modeling speed can be effectively increased and modeling efficiency improved. The occupation of the computing device's computing resources during modeling is also reduced, so the device's internal resource management can be optimized and the device can process more computing tasks, improving processing efficiency.
After the target value of a training sample is obtained, the following step S104 may be performed to continue training the target model with the target values of the training samples.
And step S104, obtaining a trained target model based on the target value of the training sample in each iteration training process.
After the target value is obtained by the calculation in step S102, the target value may be used in subsequent calculations in the training process, for example, calculation of a loss function value, until the training is completed, and a trained target model is obtained for use. It should be noted that the process of using the target value in the subsequent calculation of the training process to obtain the trained target model is the same as the implementation process of the existing model training, and therefore, the detailed description is omitted.
The method for accelerating computing-device modeling provided in the embodiments of the present specification divides the feature data and model parameters into vectors so that a single call to the vector floating-point multiply-add instruction completes multiple multiply and add calculations in the model training process. This greatly reduces the number of separate multiply- and add-instruction calls, i.e. the number of computing instructions required during model training. The modeling speed of the computing device is thereby effectively increased, the time consumed by modeling is reduced, and modeling efficiency is improved, which helps ensure model performance while allowing the model to be put into use quickly. The occupation of the computing device's computing resources during modeling is also greatly reduced, optimizing the device's internal resource management so that it can process more computing tasks, thereby improving processing efficiency.
In a second aspect, based on the same inventive concept as the method for accelerating modeling of a computing device provided in the foregoing first aspect, an embodiment of the present specification further provides an apparatus for accelerating modeling of a computing device, which is run on a computing device supporting a vector floating-point multiply-add instruction. As shown in fig. 2, the apparatus 20 includes:
the vector division module 21 is configured to perform vector division on the model parameters and the respective feature data of each training sample in a training process of the target model to obtain a parameter vector sequence of the model parameters and a respective feature vector sequence of each training sample, where the training process of the target model includes multiple rounds of iterative training;
the multiplication and addition module 22 is configured to call a preset vector floating point multiplication and addition instruction for a training sample in each iteration training process, and perform multiplication and addition processing on the parameter vector sequence and the feature vector sequence to obtain a target value of the training sample;
and the model determining module 23 is configured to obtain a trained target model based on the target value of the training sample in each iteration training process.
In an alternative embodiment, the vector dividing module 21 includes:
an obtaining submodule 211, configured to obtain a vector dimension supported by the vector floating-point multiply-add instruction;
and a partitioning submodule 212, configured to perform vector partitioning on the model parameters based on the vector dimensions to obtain m n-dimensional parameter vectors, to form the parameter vector sequence, and perform vector partitioning on respective feature data of each training sample to obtain m n-dimensional feature vectors, to form the feature vector sequence, where m is an integer greater than or equal to 1, and n is an integer greater than or equal to 2.
In an alternative embodiment, the partitioning sub-module is configured to:
if the feature quantity contained in the feature data is larger than the vector dimension, determining the vector division number based on the vector dimension and the feature quantity;
according to the vector division number and the vector dimension, m n-dimensional first initial vectors and m n-dimensional second initial vectors are constructed;
and sequentially assigning the model parameters to elements in the m constructed first initial vectors according to a preset sequence to obtain m n-dimensional parameter vectors, and sequentially assigning the features contained in the feature data to elements in the m constructed second initial vectors according to the preset sequence to obtain m n-dimensional feature vectors.
In an optional embodiment, in the process of vector partitioning of the feature data of each training sample, features included in the same feature vector and different feature vectors are different, and in the process of vector partitioning of the model parameters, model parameters included in the same parameter vector and different parameter vectors are different.
In an alternative embodiment, the apparatus 20 further comprises:
and the assignment module, configured to, during vector division, assign the unfilled elements in the feature vectors and the parameter vectors to preset values if the number of features contained in a feature vector is less than the vector dimension supported by the vector floating-point multiply-add instruction and the number of model parameters contained in a parameter vector is less than the vector dimension supported by the vector floating-point multiply-add instruction.
In an alternative embodiment, the multiply-add module 22 includes:
the first processing sub-module 221 is configured to invoke the vector floating point multiply-add instruction, sequentially divide the parameter vector arranged at the ith bit in the parameter vector sequence, the feature vector arranged at the ith bit in the feature vector sequence, and a preset initial vector to perform multiply-add processing, obtain a current result vector, use the current result vector as the initial vector of the next multiply-add processing, and execute the next multiply-add processing, where i is an integer between 0 and m-1, and m is the number of parameter vectors in the parameter vector sequence;
the second processing sub-module 222 is configured to, after traversing the parameter vector sequence and the feature vector sequence, perform accumulation processing on elements in the current result vector, obtain an assumed function value of the training sample based on an accumulation result, and use the assumed function value as a target value of the training sample.
In an alternative embodiment, the multiply-add module 22 includes:
the third processing sub-module 223 is configured to invoke the vector floating point multiply-add instruction, perform multiply-add processing on a gradient coefficient vector obtained in advance, a feature vector arranged at the jth position in the feature vector sequence, and a parameter vector arranged at the jth position in the parameter vector sequence before descent, to obtain a parameter vector sequence after descent, and use a model parameter in the parameter vector sequence after descent as a target value of the training sample, where j is an integer between 0 and m-1, and m is the number of parameter vectors in the parameter vector sequence.
In an alternative embodiment, the above-mentioned multiplication and addition module 22 further includes:
the construction submodule is used for acquiring a gradient descent coefficient of a gradient descent process in the iterative training process; and constructing a gradient coefficient vector according to the dimension of the parameter vector, and assigning each element of the gradient coefficient vector as the gradient descent coefficient.
In an alternative embodiment, the target model is a linear machine learning model, and the linear machine learning model includes a linear regression model and a logistic regression model.
It should be noted that, in the apparatus 20 for accelerating computing device modeling provided in the embodiment of the present specification, the specific manner in which each module performs operations has been described in detail in the method embodiment provided in the foregoing first aspect, and the specific implementation process may refer to the method embodiment provided in the foregoing first aspect, which will not be described in detail here.
In a third aspect, based on the same inventive concept as the method for accelerating modeling of a computing device provided in the foregoing embodiments, the present specification further provides a computing device supporting the use of a vector floating-point multiply-add instruction, such as Intel's VFMADD instruction. As shown in fig. 3, the computing device comprises a memory 304, one or more processors 302, and a computer program stored on the memory 304 and executable on the processors 302; when executing the program, the processor 302 implements the steps of any embodiment of the method for accelerating modeling of a computing device provided in the foregoing first aspect.
In fig. 3, a bus architecture is represented by bus 300. Bus 300 may include any number of interconnected buses and bridges, and links together various circuits including one or more processors, represented by processor 302, and memory, represented by memory 304. The bus 300 may also link together various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and therefore not described further herein. A bus interface 305 provides an interface between the bus 300 and the receiver 301 and transmitter 303. The receiver 301 and the transmitter 303 may be the same element, i.e. a transceiver, providing a means for communicating with various other apparatus over a transmission medium. The processor 302 is responsible for managing the bus 300 and general processing, and the memory 304 may be used to store data used by the processor 302 in performing operations.
It will be appreciated that the configuration shown in FIG. 3 is merely illustrative and that embodiments of the present description provide a computing device that may also include more or fewer components than shown in FIG. 3, or have a different configuration than shown in FIG. 3. The components shown in fig. 3 may be implemented in hardware, software, or a combination thereof.
In a fourth aspect, based on the same inventive concept as the method for accelerating modeling of a computing device provided in the foregoing embodiments, the present specification embodiment further provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps of any of the embodiments of the method for accelerating modeling of a computing device provided in the foregoing first aspect.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The description has been presented with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the description. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present specification have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all changes and modifications that fall within the scope of the specification.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present specification without departing from the spirit and scope of the specification. Thus, if such modifications and variations of the present specification fall within the scope of the claims of the present specification and their equivalents, the specification is intended to include such modifications and variations.

Claims (20)

1. A method of accelerating modeling of a computing device, comprising:
in the training process of a target model, vector division is respectively carried out on model parameters and respective feature data of each training sample to obtain a parameter vector sequence of the model parameters and a respective feature vector sequence of each training sample, wherein the training process of the target model comprises multiple rounds of iterative training;
calling a preset vector floating point multiply-add instruction for a training sample in each iteration training process, and carrying out multiply-add processing on the parameter vector sequence and the characteristic vector sequence to obtain a target value of the training sample;
and obtaining a trained target model based on the target value of the training sample in each round of iterative training process.
2. The method of claim 1, wherein the vector partitioning of the model parameters and the respective feature data of each training sample comprises:
acquiring a vector dimension supported by the vector floating point multiply-add instruction;
based on the vector dimension, performing vector division on the model parameters to obtain m n-dimensional parameter vectors, forming the parameter vector sequence, and performing vector division on the respective feature data of each training sample to obtain m n-dimensional feature vectors, forming the feature vector sequence, wherein m is an integer greater than or equal to 1, and n is an integer greater than or equal to 2.
3. The method of claim 2, wherein the vector partitioning the model parameters based on the vector dimensions to obtain m n-dimensional parameter vectors and the vector partitioning the respective feature data of each training sample to obtain m n-dimensional feature vectors, comprises:
if the feature quantity contained in the feature data is larger than the vector dimension, determining the vector division number based on the vector dimension and the feature quantity;
according to the vector division number and the vector dimension, m n-dimensional first initial vectors and m n-dimensional second initial vectors are constructed;
and sequentially assigning the model parameters to elements in the m constructed first initial vectors according to a preset sequence to obtain m n-dimensional parameter vectors, and sequentially assigning the features contained in the feature data to elements in the m constructed second initial vectors according to the preset sequence to obtain m n-dimensional feature vectors.
4. The method according to claim 2, wherein the features included in the same feature vector and different feature vectors are different during the vector partitioning of the feature data of each training sample, and the model parameters included in the same parameter vector and different parameter vectors are different during the vector partitioning of the model parameters.
5. The method of claim 2, further comprising:
in the vector dividing process, if the number of the features contained in the feature vector is smaller than the vector dimension supported by the vector floating point multiply-add instruction, and the number of the model parameters contained in the parameter vector is smaller than the vector dimension supported by the vector floating point multiply-add instruction, the feature vector and the elements which are not full in the parameter vector are assigned to be preset values.
6. The method according to claim 1, wherein the invoking a preset vector floating-point multiply-add instruction to perform multiply-add processing on the parameter vector sequence and the feature vector sequence to obtain a target value of a training sample comprises:
calling the vector floating point multiply-add instruction, sequentially dividing the parameter vector arranged at the ith position in the parameter vector sequence, the characteristic vector arranged at the ith position in the characteristic vector sequence and a preset initial vector to carry out multiply-add processing to obtain a current result vector, taking the current result vector as the initial vector of the next multiply-add processing to execute the next multiply-add processing, wherein i is an integer between 0 and m-1, and m is the number of the parameter vectors in the parameter vector sequence;
after traversing the parameter vector sequence and the feature vector sequence, accumulating the elements in the current result vector, obtaining an assumed function value of the training sample based on the accumulation result, and taking the assumed function value as a target value of the training sample.
7. The method according to claim 1, wherein the invoking a preset vector floating-point multiply-add instruction to perform multiply-add processing on the parameter vector sequence and the feature vector sequence to obtain a target value of a training sample comprises:
and calling the vector floating point multiply-add instruction, carrying out multiply-add processing on a gradient coefficient vector which is obtained in advance, a characteristic vector which is arranged at the j th position in the characteristic vector sequence and a parameter vector which is arranged at the j th position in the parameter vector sequence before descending to obtain a descending parameter vector sequence, and taking a model parameter in the descending parameter vector sequence as a target value of the training sample, wherein j is an integer between 0 and m-1, and m is the number of the parameter vectors in the parameter vector sequence.
8. The method according to claim 7, wherein the invoking the vector floating-point multiply-add instruction further comprises, before performing multiply-add processing on a pre-obtained gradient coefficient vector, a feature vector arranged at a j-th bit in the feature vector sequence, and a parameter vector arranged at a j-th bit in the parameter vector sequence before descending:
acquiring a gradient descent coefficient of a gradient descent process in the iterative training process;
and constructing a gradient coefficient vector according to the dimension of the parameter vector, and assigning each element of the gradient coefficient vector as the gradient descent coefficient.
9. The method of any one of claims 1-8, the target model being a linear machine learning model, the linear machine learning model comprising a linear regression model and a logistic regression model.
10. An apparatus to accelerate computing device modeling, comprising:
the vector division module is used for respectively carrying out vector division on model parameters and respective feature data of each training sample in the training process of a target model to obtain a parameter vector sequence of the model parameters and a respective feature vector sequence of each training sample, wherein the training process of the target model comprises multiple rounds of iterative training;
the multiplication and addition module is used for calling a preset vector floating point multiplication and addition instruction for a training sample in each iteration training process, and carrying out multiplication and addition processing on the parameter vector sequence and the characteristic vector sequence to obtain a target value of the training sample;
a model determination module configured to obtain a trained target model based on the target value of the training sample in each round of iterative training.
11. The apparatus of claim 10, wherein the vector division module comprises:
an acquisition submodule configured to acquire the vector dimension supported by the vector floating-point multiply-add instruction; and
a division submodule configured to perform vector division on the model parameters based on the vector dimension to obtain m n-dimensional parameter vectors forming the parameter vector sequence, and to perform vector division on the feature data of each training sample to obtain m n-dimensional feature vectors forming the feature vector sequence, wherein m is an integer greater than or equal to 1, and n is an integer greater than or equal to 2.
12. The apparatus of claim 11, wherein the division submodule is configured to:
determine, if the number of features contained in the feature data is greater than the vector dimension, a vector division number based on the vector dimension and the number of features;
construct m n-dimensional first initial vectors and m n-dimensional second initial vectors according to the vector division number and the vector dimension; and
sequentially assign the model parameters to the elements of the m first initial vectors in a preset order to obtain the m n-dimensional parameter vectors, and sequentially assign the features contained in the feature data to the elements of the m second initial vectors in the preset order to obtain the m n-dimensional feature vectors.
13. The apparatus of claim 11, wherein, in the vector division of the feature data of each training sample, the features contained in a same feature vector differ from one another and the features contained in different feature vectors differ from one another; and, in the vector division of the model parameters, the model parameters contained in a same parameter vector differ from one another and the model parameters contained in different parameter vectors differ from one another.
14. The apparatus of claim 11, further comprising:
an assignment module configured to assign, if during the vector division the number of features contained in a feature vector is less than the vector dimension supported by the vector floating-point multiply-add instruction, or the number of model parameters contained in a parameter vector is less than that vector dimension, a preset value to the unfilled elements of the feature vector or the parameter vector.
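As a hedged sketch of the vector division in claims 11-14 (the function name `split_into_vectors` is hypothetical, not from the patent), the flat parameter or feature array is cut into m vectors of the instruction-supported dimension n, with the last vector padded by a preset value:

```python
import math

def split_into_vectors(values, n, pad_value=0.0):
    """Split a flat list into m vectors of dimension n (claims 11-12).

    The last vector is padded with pad_value when the number of
    values is not a multiple of n (claim 14).
    """
    m = math.ceil(len(values) / n)          # the vector division number
    padded = list(values) + [pad_value] * (m * n - len(values))
    return [padded[i * n:(i + 1) * n] for i in range(m)]

# Example: 10 features split for a 4-lane multiply-add instruction.
features = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
vectors = split_into_vectors(features, n=4)
# → [[0.5, 1.0, 1.5, 2.0], [2.5, 3.0, 3.5, 4.0], [4.5, 5.0, 0.0, 0.0]]
```

Padding with zero is the natural preset value here, since zero lanes contribute nothing to a subsequent multiply-add.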
15. The apparatus of claim 10, wherein the multiply-add module comprises:
a first processing submodule configured to invoke the vector floating-point multiply-add instruction to sequentially perform multiply-add processing on the parameter vector at the i-th position in the parameter vector sequence, the feature vector at the i-th position in the feature vector sequence, and a preset initial vector to obtain a current result vector, and to use the current result vector as the initial vector for the next round of multiply-add processing, wherein i is an integer from 0 to m-1, and m is the number of parameter vectors in the parameter vector sequence; and
a second processing submodule configured to accumulate, after the parameter vector sequence and the feature vector sequence have been traversed, the elements of the current result vector, obtain a hypothesis function value of the training sample based on the accumulation result, and take the hypothesis function value as the target value of the training sample.
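A minimal sketch of the accumulation in claim 15, using a plain Python loop in place of the hardware instruction (the helper `vfmadd` is hypothetical and stands in for a vector FMA such as an AVX fused multiply-add): each step multiplies the i-th parameter and feature vectors element-wise and adds the running result vector, and a final horizontal sum yields the hypothesis (dot-product) value:

```python
def vfmadd(a, b, c):
    """Element-wise a*b + c — stands in for a vector FMA instruction."""
    return [ai * bi + ci for ai, bi, ci in zip(a, b, c)]

def hypothesis(param_vectors, feature_vectors):
    """Claim 15: FMA-accumulate w_i * x_i across all m vector pairs,
    then horizontally sum the lanes of the final result vector."""
    n = len(param_vectors[0])
    acc = [0.0] * n                       # the preset initial vector
    for w, x in zip(param_vectors, feature_vectors):
        acc = vfmadd(w, x, acc)           # result feeds the next step
    return sum(acc)                       # accumulate the elements

w_vecs = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.0, 0.0, 0.0]]
x_vecs = [[1.0, 2.0, 3.0, 4.0], [5.0, 0.0, 0.0, 0.0]]
# Mathematically: 0.1*1 + 0.2*2 + 0.3*3 + 0.4*4 + 0.5*5 = 5.5
```

Because the zero-padded lanes contribute nothing, the horizontal sum equals the dot product of the original parameter and feature arrays.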
16. The apparatus of claim 10, wherein the multiply-add module comprises:
a third processing submodule configured to invoke the vector floating-point multiply-add instruction to perform multiply-add processing on a pre-obtained gradient coefficient vector, the feature vector at the j-th position in the feature vector sequence, and the pre-descent parameter vector at the j-th position in the parameter vector sequence, to obtain a descended parameter vector sequence, and to take the model parameters in the descended parameter vector sequence as the target value of the training sample, wherein j is an integer from 0 to m-1, and m is the number of parameter vectors in the parameter vector sequence.
17. The apparatus of claim 16, wherein the multiply-add module further comprises:
a construction submodule configured to acquire a gradient descent coefficient of the gradient descent process in the iterative training, construct the gradient coefficient vector according to the dimension of the parameter vectors, and assign each element of the gradient coefficient vector the value of the gradient descent coefficient.
18. The apparatus of any one of claims 10-17, wherein the target model is a linear machine learning model, the linear machine learning model comprising a linear regression model and a logistic regression model.
19. A computing device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the method of any one of claims 1-9.
20. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the method of any one of claims 1-9.
CN201911324820.5A 2019-12-20 2019-12-20 Method, device, computing equipment and medium for accelerating modeling of computing equipment Active CN111027018B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911324820.5A CN111027018B (en) 2019-12-20 2019-12-20 Method, device, computing equipment and medium for accelerating modeling of computing equipment


Publications (2)

Publication Number Publication Date
CN111027018A true CN111027018A (en) 2020-04-17
CN111027018B CN111027018B (en) 2023-03-31

Family

ID=70212182

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911324820.5A Active CN111027018B (en) 2019-12-20 2019-12-20 Method, device, computing equipment and medium for accelerating modeling of computing equipment

Country Status (1)

Country Link
CN (1) CN111027018B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101986264A (en) * 2010-11-25 2011-03-16 中国人民解放军国防科学技术大学 Multifunctional floating-point multiply and add calculation device for single instruction multiple data (SIMD) vector microprocessor
CN105488565A (en) * 2015-11-17 2016-04-13 中国科学院计算技术研究所 Calculation apparatus and method for accelerator chip accelerating deep neural network algorithm
US20180189234A1 (en) * 2016-12-31 2018-07-05 Intel Corporation Hardware accelerator architecture for processing very-sparse and hyper-sparse matrix data
US20180189638A1 (en) * 2016-12-31 2018-07-05 Intel Corporation Hardware accelerator template and design framework for implementing recurrent neural networks
US20180246853A1 (en) * 2017-02-28 2018-08-30 Microsoft Technology Licensing, Llc Hardware node with matrix-vector multiply tiles for neural network processing
CN109062609A (en) * 2018-02-05 2018-12-21 上海寒武纪信息科技有限公司 Processing with Neural Network device and its method for executing operational order
US10167800B1 (en) * 2017-08-18 2019-01-01 Microsoft Technology Licensing, Llc Hardware node having a matrix vector unit with block-floating point processing
CN109492761A (en) * 2018-10-30 2019-03-19 深圳灵图慧视科技有限公司 Realize FPGA accelerator, the method and system of neural network
CN109661647A (en) * 2016-09-13 2019-04-19 Arm有限公司 The multiply-add instruction of vector
CN110488278A (en) * 2019-08-20 2019-11-22 深圳锐越微技术有限公司 Doppler radar signal kind identification method


Also Published As

Publication number Publication date
CN111027018B (en) 2023-03-31

Similar Documents

Publication Publication Date Title
US20210295161A1 (en) Training neural networks represented as computational graphs
EP3446260B1 (en) Memory-efficient backpropagation through time
EP4036803A1 (en) Neural network model processing method and apparatus, computer device, and storage medium
US10656962B2 (en) Accelerate deep neural network in an FPGA
CN112084038B (en) Memory allocation method and device of neural network
CN110245741A (en) Optimization and methods for using them, device and the storage medium of multilayer neural network model
CN114503121A (en) Resource constrained neural network architecture search
EP3602419B1 (en) Neural network optimizer search
WO2018156942A1 (en) Optimizing neural network architectures
CN108292241A (en) Processing calculates figure
CN110008952B (en) Target identification method and device
CN111126668A (en) Spark operation time prediction method and device based on graph convolution network
CN114327844A (en) Memory allocation method, related device and computer readable storage medium
CN111723910A (en) Method and device for constructing multi-task learning model, electronic equipment and storage medium
US11544105B2 (en) Recommendations for scheduling jobs on distributed computing devices
CN107437111A (en) Data processing method, medium, device and computing device based on neutral net
US11275561B2 (en) Mixed precision floating-point multiply-add operation
CN110889497B (en) Learning task compiling method of artificial intelligence processor and related product
CN112084037A (en) Memory allocation method and device of neural network
CN115357554A (en) Graph neural network compression method and device, electronic equipment and storage medium
CN113037800A (en) Job scheduling method and job scheduling device
CN115860081A (en) Core particle algorithm scheduling method and system, electronic equipment and storage medium
CN113419931A (en) Performance index determination method and device of distributed machine learning system
CN116644804A (en) Distributed training system, neural network model training method, device and medium
CN113655986B9 (en) FFT convolution algorithm parallel implementation method and system based on NUMA affinity

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40027999

Country of ref document: HK

GR01 Patent grant