CN106991473A - SIMD-based average pooling parallel processing method for vector processors - Google Patents

SIMD-based average pooling parallel processing method for vector processors Download PDF

Info

Publication number
CN106991473A
CN106991473A (application number CN201710202133.0A)
Authority
CN
China
Prior art keywords
pooling
average value
matrix
vector
window
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710202133.0A
Other languages
Chinese (zh)
Inventor
郭阳
张军阳
扈啸
王慧丽
胡敏慧
王子聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN201710202133.0A priority Critical patent/CN106991473A/en
Publication of CN106991473A publication Critical patent/CN106991473A/en
Pending legal-status Critical Current

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38: Concurrent instruction execution, e.g. pipeline or look ahead
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Complex Calculations (AREA)

Abstract

A SIMD-based average pooling parallel processing method for vector processors, whose steps are: S1: set the pooling matrix and the pooling window; S2: according to the pooling window size k, take the first k rows of pooling matrix A and perform the corresponding accumulation operations to obtain the column sums of the first k rows; S3: configure the shuffle pattern and shuffle; S4: add the corresponding results obtained in step S3; S5: repeat steps S3 and S4 until the values of each group of elements have been reduced into p/k VPEs; S6: use the vector VMOVI instruction to assign the immediate 1/k² to a vector register, and multiply this vector register element-wise with the accumulated sums; S7: obtain the final result vector of p/k average pooling results; S8: move down to row k+1 of pooling matrix A and repeat steps S2 to S7 until all sub-images of pooling matrix A have been traversed, obtaining the average pooling result matrix. The present invention has the advantages of simple principle, convenient implementation, efficient computation, and shortened computation time.

Description

SIMD-based average pooling parallel processing method for vector processors
Technical field
The present invention relates generally to the technical field of convolutional neural networks, and in particular to a SIMD-based average pooling parallel processing method for vector processors.
Background technology
In the 1960s, while studying neurons for local sensitivity and direction selection in the cat visual cortex, Hubel and Wiesel found that their unique network structure could effectively reduce the complexity of feedback neural networks, and subsequently proposed the convolutional neural network (Convolutional Neural Network, CNN). At present, convolutional neural networks have become one of the research hotspots in many academic fields, particularly in the field of pattern classification; because the network avoids complicated early-stage preprocessing of images and can take the original image as direct input, it has found increasingly wide application.
Generally, a convolutional neural network computation model used for recognition includes convolutional layers, pooling layers, fully connected layers, and a subsequent classifier. A convolutional layer extracts local features of the previous layer's image using convolution kernels of different scales; once a local feature has been extracted, its positional relationship to other features is also determined. Then, feature mapping is carried out by taking local averages (also called the pooling operation) to obtain the feature information after dimensionality reduction; this feature information is output to the next convolutional layer for further processing, until the last layer (the output layer) is reached and the final output result is obtained.
The pooling operations used in current mainstream convolutional neural network models are mainly average pooling (Average Pooling) and max pooling (Max Pooling). Average pooling and max pooling are two different pooling methods, but both essentially serve to reduce the dimensionality of large-size images and reduce the amount of computation, and therefore occupy an important position in convolutional neural networks. Average pooling takes the average value within a pixel region of a certain size, such as a small n × n region, while max pooling takes the maximum value within a pixel region of a certain size. Given the important role of the pooling operation in convolutional neural networks, accelerating the parallelism of the pooling operation is particularly significant in computationally intensive convolutional neural network models.
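To make the two pooling variants concrete, the following is a minimal scalar sketch (not the patent's vectorized method) of average pooling and max pooling over a single k × k window; the function names are illustrative only.

```python
# Scalar sketch: average pooling and max pooling over one k x k window,
# assuming non-overlapping windows as in the patent (sizeX == stride).

def avg_pool_window(window):
    """Average of all elements in a k x k window (list of rows)."""
    k = len(window)
    return sum(sum(row) for row in window) / (k * k)

def max_pool_window(window):
    """Maximum element in a k x k window."""
    return max(max(row) for row in window)

window = [[1, 3],
          [5, 7]]
print(avg_pool_window(window))  # 4.0
print(max_pool_window(window))  # 7
```

Both reduce a k × k region to a single value; only the reduction operator differs.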
Summary of the invention
The technical problem to be solved by the present invention is: in view of the technical problems existing in the prior art, the present invention provides a SIMD-based average pooling parallel processing method for vector processors that is simple in principle, convenient to implement, and computationally efficient, that can fully exploit the multi-level parallelism of the vector processor to bring its parallel computing capability into play, and that shortens the computation time.
In order to solve the above technical problems, the present invention adopts the following technical scheme:
A SIMD-based average pooling parallel processing method for vector processors, whose steps are:
S1: Let the pooling matrix that needs to undergo the pooling operation after the convolution operation be A, with size M × N; the size of the pooling window is k × k, with M > k and N > k; the number of vector processing elements is p; and M, N, and p are all integer multiples of k;
S2: According to the pooling window size k, take the first k rows of pooling matrix A and perform the corresponding accumulation operations, obtaining the column sums of the first k rows;
S3: Configure the shuffle pattern and shuffle;
S4: Add the corresponding results obtained in step S3;
S5: Repeat steps S3 and S4 until the values of each group of elements have been reduced into p/k VPEs;
S6: Use the vector VMOVI instruction to assign the immediate 1/k² to a vector register, and multiply this vector register element-wise with the accumulated sums from step S5;
S7: Obtain the final result vector of p/k average pooling results;
S8: Move down to row k+1 of pooling matrix A and repeat steps S2 to S7 until all sub-images of pooling matrix A have been traversed, obtaining the k × k average pooling result matrix of the whole pooling matrix A.
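The steps above can be emulated in scalar form as follows. This is a reference sketch only: the inter-VPE shuffle/add reduction of steps S3 to S5 is replaced by plain Python sums, and the function name is an assumption.

```python
# Scalar reference sketch of steps S2-S8, assuming non-overlapping k x k
# windows (sizeX == stride) and M, N multiples of k. The real method maps
# the horizontal sums to shuffle/add steps across VPEs; here they are
# emulated with plain sums.

def average_pool(A, k):
    M, N = len(A), len(A[0])
    assert M % k == 0 and N % k == 0
    C = []
    for r in range(0, M, k):            # S8: move down k rows at a time
        # S2: column sums of the k rows starting at row r
        col_sums = [sum(A[r + i][j] for i in range(k)) for j in range(N)]
        # S3-S5: reduce each group of k column sums to one value
        row_sums = [sum(col_sums[c:c + k]) for c in range(0, N, k)]
        # S6-S7: multiply by the immediate 1/k**2
        C.append([s / (k * k) for s in row_sums])
    return C

A = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
print(average_pool(A, 2))  # [[3.5, 5.5], [11.5, 13.5]]
```

Each inner step corresponds to one stage of the patent's pipeline; only the data movement differs on real hardware.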
As a further improvement of the present invention: the detailed process of step S3 is:
S301: If k is even, the first k/2 elements of each of the p/k groups of elements in step S2 are put together, and the last k/2 elements of each group are put together;
S302: If k is odd, take k = 3 and p = 12 as an example, i.e., the pooling window size is 3 × 3 and the number of vector processing elements is 12.
As a further improvement of the present invention: the calculation formula for one average pooling result c0,0 in step S8 is c0,0 = (1/k²) · Σ(i=0..k-1) Σ(j=0..k-1) ai,j, where c0,0 is the first element in the average pooling result matrix, k is the size of the pooling window (in convolutional neural networks the pooling window is a square matrix), and ai,j is an element of the pooling matrix A that needs to undergo average pooling.
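As a hypothetical numeric check of this formula for a 3 × 3 window (the values are chosen purely for illustration):

```python
# Numeric check of c(0,0) = (1/k^2) * sum of a(i,j) over a k x k window.
k = 3
a = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
c00 = (1 / k**2) * sum(a[i][j] for i in range(k) for j in range(k))
print(c00)  # 5.0 (the sum 45 divided by 9)
```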
As a further improvement of the present invention: define the size of the pooling window as sizeX and the horizontal or vertical displacement between two adjacent pooling windows as stride; the pooling windows do not overlap in the average pooling operation, i.e., sizeX = stride.
As a further improvement of the present invention: the pooling matrix A and the pooling window of size k are both square matrices.
Compared with the prior art, the advantage of the present invention is: the SIMD-based average pooling parallel processing method for vector processors of the present invention is simple to implement, low in cost, and easy to operate. It uses the vector processor's shuffle network in place of reduction operations to complete the fast rearrangement of data between VPEs, and a single pass of the operation can compute p/k average pooling results simultaneously. It can therefore fully exploit the parallel computing performance of all processing elements of the vector processor, shorten the computation time of average pooling in convolutional neural networks, offer good reliability, and reduce hardware power consumption.
Brief description of the drawings
Fig. 1 is a schematic flow diagram of the present invention.
Fig. 2 is a schematic diagram of the general structure of a vector processor.
Fig. 3 is a schematic diagram of the average pooling process of the present invention in a concrete application example.
Fig. 4 is a schematic diagram of the calculation procedure in a specific embodiment of the present invention in which the pooling window k is 2.
Fig. 5 is a schematic diagram of the non-overlapping pooling windows in the average pooling operation of the present invention in a concrete application example.
Fig. 6 is a schematic diagram of the average pooling calculation flow of the present invention for a 3 × 3 pooling window in a concrete application example.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and specific embodiments.
As shown in Fig. 1, a SIMD-based average pooling parallel processing method for vector processors of the present invention, whose steps are:
S1: Let the pooling matrix that needs to undergo the pooling operation after the convolution operation be A, with size M × N; the size of the pooling window is k × k, with M > k and N > k; the number of vector processing elements is p; and M, N, and p are all integer multiples of k. In actual convolutional neural network models, the pooling matrix A and the pooling window of size k are generally square matrices;
S2: According to the pooling window size k, take the first k rows of pooling matrix A and perform the corresponding accumulation operations, obtaining the column sums of the first k rows;
S3: Configure the shuffle pattern and shuffle;
S4: Add the corresponding results obtained in step S3;
S5: Repeat steps S3 and S4 until the values of each group of elements have been reduced into p/k VPEs;
S6: Use the vector VMOVI instruction to assign the immediate 1/k² to a vector register, and multiply this vector register element-wise with the accumulated sums from step S5;
S7: Obtain the final result vector of p/k average pooling results;
S8: Move down to row k+1 of pooling matrix A and repeat steps S2 to S7 until all sub-images of pooling matrix A have been traversed; the k × k average pooling result matrix of the whole pooling matrix A can thus be obtained.
The present invention is mainly applicable to vector processors; the general structure of a vector processor is shown in Fig. 2. In a concrete application example, the detailed process of the above step S3 is:
S301: If k is even, the first k/2 elements of each of the p/k groups of elements in step S2 are put together, and the last k/2 elements of each group are put together;
S302: If k is odd, take k = 3 and p = 12 as an example, i.e., the pooling window size is 3 × 3 and the number of vector processing elements is 12; the calculation process is shown in Fig. 6.
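The odd-k case can be emulated as follows for k = 3 and p = 12; the shuffle network is replaced here by plain Python grouping, and the column-sum values are stand-ins.

```python
# Emulation of the odd-k case (k = 3, p = 12): the 12 column sums form
# p/k = 4 groups of 3; each group is reduced to one lane and the sum is
# scaled by the immediate 1/k**2 = 1/9.
p, k = 12, 3
col_sums = list(range(1, 13))  # stand-in column sums held in 12 lanes
groups = [col_sums[g:g + k] for g in range(0, p, k)]
result = [sum(g) / k**2 for g in groups]
print(result)  # p/k = 4 average pooling results
```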
The calculation formula for one average pooling result c0,0 in step S8 is c0,0 = (1/k²) · Σ(i=0..k-1) Σ(j=0..k-1) ai,j, where c0,0 is the first element in the average pooling result matrix, k is the size of the pooling window (in convolutional neural networks the pooling window is generally a square matrix), and ai,j is an element of the pooling matrix A that needs to undergo average pooling; the average pooling flow is shown schematically in Fig. 3.
As shown in Fig. 5, define the size of the pooling window as sizeX and the horizontal or vertical displacement between two adjacent pooling windows as stride; the pooling windows do not overlap in the average pooling operation, i.e., sizeX = stride.
As shown in Fig. 4, in a concrete application example of the present invention the pooling window k is 2, and the detailed steps are:
S101: Let the pooling matrix that needs to undergo the pooling operation after the convolution operation be A, with size 16 × 16; the pooling size is 2 × 2; and the number p of vector processing elements is 16;
S102: According to the pooling size k = 2, take the first 2 rows of pooling matrix A and perform the corresponding accumulation operations, obtaining the column sums of the first 2 rows;
S103: Configure the shuffle pattern so that the first element of each of the 8 groups of elements in step S102 is put together and the last element of each group is put together;
S104: Add the corresponding results obtained in step S103, until the values of each group of elements have been reduced into one VPE;
S105: Use the vector VMOVI instruction to assign the immediate 1/4 to a vector register, and multiply this register element-wise with the accumulated sums from step S104;
S106: Obtain the final result vector of 8 average pooling results;
S107: Move down to row 3 of pooling matrix A and repeat steps S102 to S106 until all sub-images of pooling matrix A have been traversed; the 2 × 2 average pooling result matrix of the whole pooling matrix A can thus be calculated.
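Steps S102 to S106 for the first two rows can be emulated lane-by-lane as follows; the row data are stand-ins, and the shuffle is modeled with list slicing rather than actual VPE hardware.

```python
# Emulation of steps S102-S106 for k = 2, p = 16: the 16-lane vector of
# column sums is split by a shuffle into even-indexed and odd-indexed
# lanes, the two halves are added, and the 8 sums are scaled by the
# immediate 1/4.
p, k = 16, 2
row0 = list(range(16))            # stand-in data for row 0 of A
row1 = list(range(16, 32))        # stand-in data for row 1 of A

col_sums = [a + b for a, b in zip(row0, row1)]     # S102: vertical add
left  = col_sums[0::2]                             # S103: shuffle pattern,
right = col_sums[1::2]                             #       pair up elements
sums  = [x + y for x, y in zip(left, right)]       # S104: reduce to 8 VPEs
result = [s * (1 / k**2) for s in sums]            # S105: multiply by 1/4
print(result)  # 8 average pooling results (S106)
```

One such pass yields all 8 window averages for a row band at once, which is the source of the method's parallel speedup.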
The above are only preferred embodiments of the present invention, and the protection scope of the present invention is not limited to the above embodiments; all technical schemes that fall under the idea of the present invention belong to the protection scope of the present invention. It should be pointed out that, for those of ordinary skill in the art, several improvements and modifications made without departing from the principles of the present invention should also be regarded as falling within the protection scope of the present invention.

Claims (5)

1. A SIMD-based average pooling parallel processing method for vector processors, characterized in that its steps are:
S1: Let the pooling matrix that needs to undergo the pooling operation after the convolution operation be A, with size M × N; the size of the pooling window is k × k, with M > k and N > k; the number of vector processing elements is p; and M, N, and p are all integer multiples of k;
S2: According to the pooling window size k, take the first k rows of pooling matrix A and perform the corresponding accumulation operations, obtaining the column sums of the first k rows;
S3: Configure the shuffle pattern and shuffle;
S4: Add the corresponding results obtained in step S3;
S5: Repeat steps S3 and S4 until the values of each group of elements have been reduced into p/k VPEs;
S6: Use the vector VMOVI instruction to assign the immediate 1/k² to a vector register, and multiply this vector register element-wise with the accumulated sums from step S5;
S7: Obtain the final result vector of p/k average pooling results;
S8: Move down to row k+1 of pooling matrix A and repeat steps S2 to S7 until all sub-images of pooling matrix A have been traversed, obtaining the k × k average pooling result matrix of the whole pooling matrix A.
2. The SIMD-based average pooling parallel processing method for vector processors according to claim 1, characterized in that the detailed process of step S3 is:
S301: If k is even, the first k/2 elements of each of the p/k groups of elements in step S2 are put together, and the last k/2 elements of each group are put together;
S302: If k is odd, take k = 3 and p = 12 as an example, i.e., the pooling window size is 3 × 3 and the number of vector processing elements is 12.
3. The SIMD-based average pooling parallel processing method for vector processors according to claim 1, characterized in that the calculation formula for one average pooling result c0,0 in step S8 is c0,0 = (1/k²) · Σ(i=0..k-1) Σ(j=0..k-1) ai,j, where c0,0 is the first element in the average pooling result matrix, k is the size of the pooling window (in convolutional neural networks the pooling window is a square matrix), and ai,j is an element of the pooling matrix A that needs to undergo average pooling.
4. The SIMD-based average pooling parallel processing method for vector processors according to claim 1, 2 or 3, characterized in that the size of the pooling window is defined as sizeX and the horizontal or vertical displacement between two adjacent pooling windows as stride, and the pooling windows do not overlap in the average pooling operation, i.e., sizeX = stride.
5. The SIMD-based average pooling parallel processing method for vector processors according to claim 1, 2 or 3, characterized in that the pooling matrix A and the pooling window of size k are both square matrices.
CN201710202133.0A 2017-03-30 2017-03-30 SIMD-based average pooling parallel processing method for vector processors Pending CN106991473A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710202133.0A CN106991473A (en) 2017-03-30 2017-03-30 SIMD-based average pooling parallel processing method for vector processors

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710202133.0A CN106991473A (en) 2017-03-30 2017-03-30 SIMD-based average pooling parallel processing method for vector processors

Publications (1)

Publication Number Publication Date
CN106991473A true CN106991473A (en) 2017-07-28

Family

ID=59413094

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710202133.0A Pending CN106991473A (en) 2017-03-30 2017-03-30 SIMD-based average pooling parallel processing method for vector processors

Country Status (1)

Country Link
CN (1) CN106991473A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108205702A (en) * 2017-12-29 2018-06-26 中国人民解放军国防科技大学 Parallel processing method for multi-input multi-output matrix convolution
CN108205703A (en) * 2017-12-29 2018-06-26 中国人民解放军国防科技大学 Multi-input multi-output matrix average value pooling vectorization implementation method
CN109726803A (en) * 2019-01-10 2019-05-07 广州小狗机器人技术有限公司 Pooling method, image processing method and device
CN109754359A (en) * 2017-11-01 2019-05-14 腾讯科技(深圳)有限公司 A pooling processing method and system applied to convolutional neural networks
CN110096309A (en) * 2018-11-14 2019-08-06 上海寒武纪信息科技有限公司 Operation method, device, computer equipment and storage medium
CN110163333A (en) * 2018-01-10 2019-08-23 成都信息工程大学 The parallel optimization method of convolutional neural networks
CN110232665A (en) * 2019-06-13 2019-09-13 Oppo广东移动通信有限公司 Max pooling method, apparatus, computer device and storage medium
CN110399977A (en) * 2018-04-25 2019-11-01 华为技术有限公司 Pooling operation device
CN111213125A (en) * 2017-09-08 2020-05-29 甲骨文国际公司 Efficient direct convolution using SIMD instructions
CN112651489A (en) * 2020-12-22 2021-04-13 龙芯中科(合肥)技术有限公司 Operation processing method, operation processing device and storage medium

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111213125B (en) * 2017-09-08 2023-11-07 甲骨文国际公司 Efficient direct convolution using SIMD instructions
CN111213125A (en) * 2017-09-08 2020-05-29 甲骨文国际公司 Efficient direct convolution using SIMD instructions
US11537857B2 (en) 2017-11-01 2022-12-27 Tencent Technology (Shenzhen) Company Limited Pooling processing method and system applied to convolutional neural network
CN109754359A (en) * 2017-11-01 2019-05-14 腾讯科技(深圳)有限公司 A pooling processing method and system applied to convolutional neural networks
US11734554B2 (en) 2017-11-01 2023-08-22 Tencent Technology (Shenzhen) Company Limited Pooling processing method and system applied to convolutional neural network
CN108205703A (en) * 2017-12-29 2018-06-26 中国人民解放军国防科技大学 Multi-input multi-output matrix average value pooling vectorization implementation method
CN108205702A (en) * 2017-12-29 2018-06-26 中国人民解放军国防科技大学 Parallel processing method for multi-input multi-output matrix convolution
CN108205702B (en) * 2017-12-29 2020-12-01 中国人民解放军国防科技大学 Parallel processing method for multi-input multi-output matrix convolution
CN108205703B (en) * 2017-12-29 2021-01-12 中国人民解放军国防科技大学 Multi-input multi-output matrix average value pooling vectorization implementation method
CN110163333A (en) * 2018-01-10 2019-08-23 成都信息工程大学 The parallel optimization method of convolutional neural networks
CN110399977A (en) * 2018-04-25 2019-11-01 华为技术有限公司 Pooling operation device
CN110096309A (en) * 2018-11-14 2019-08-06 上海寒武纪信息科技有限公司 Operation method, device, computer equipment and storage medium
CN109726803B (en) * 2019-01-10 2021-06-29 广州小狗机器人技术有限公司 Pooling method, image processing method and device
CN109726803A (en) * 2019-01-10 2019-05-07 广州小狗机器人技术有限公司 Pooling method, image processing method and device
CN110232665B (en) * 2019-06-13 2021-08-20 Oppo广东移动通信有限公司 Maximum pooling method and device, computer equipment and storage medium
CN110232665A (en) * 2019-06-13 2019-09-13 Oppo广东移动通信有限公司 Max pooling method, apparatus, computer device and storage medium
CN112651489A (en) * 2020-12-22 2021-04-13 龙芯中科(合肥)技术有限公司 Operation processing method, operation processing device and storage medium

Similar Documents

Publication Publication Date Title
CN106991473A (en) SIMD-based average pooling parallel processing method for vector processors
US11775836B2 (en) Hand pose estimation
JP6771018B2 (en) Improved performance of 2D array processor
US20200285446A1 (en) Arithmetic device for neural network, chip, equipment and related method
US10846591B2 (en) Configurable and programmable multi-core architecture with a specialized instruction set for embedded application based on neural networks
CN108491359B (en) Submatrix operation device and method
CN107438860B (en) Architecture for high performance power efficient programmable image processing
CN106991472A (en) A vectorized implementation method fusing the ReLU activation function and max pooling
US20210019594A1 (en) Convolutional neural network accelerating device and method
AU2017338783A1 (en) Efficient data layouts for convolutional neural networks
JP2018185847A (en) Two dimensional shift array for image processor
CN107563951B (en) Statistical operations on a two-dimensional image processor
CN108268931A (en) The methods, devices and systems of data processing
CN110188869B (en) Method and system for integrated circuit accelerated calculation based on convolutional neural network algorithm
KR102610842B1 (en) Processing element and operating method thereof in neural network
KR20070039490A (en) A bit serial processing element for a simd array processor
US20210201120A1 (en) Inference apparatus, convolution operation execution method, and program
JP7171883B2 (en) efficient convolutional engine
CN114781629B (en) Hardware accelerator of convolutional neural network based on parallel multiplexing and parallel multiplexing method
EP4071619A1 (en) Address generation method, related device and storage medium
CN112784973A (en) Convolution operation circuit, device and method
CN109447239B (en) Embedded convolutional neural network acceleration method based on ARM
CN113254391B (en) Neural network accelerator convolution calculation and data loading parallel method and device
CN113313252B (en) Depth separable convolution implementation method based on pulse array
WO2020038462A1 (en) Tongue segmentation device and method employing deep learning, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170728