CN110471883A - Artificial intelligence accelerator with a cyclic-symmetric decorrelation architecture
- Publication number
- CN110471883A (application CN201910642363.8A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
The invention discloses an artificial intelligence accelerator with a cyclic-symmetric decorrelation architecture. The accelerator comprises a cyclic-symmetric decorrelation architecture integrated in a microchip. The architecture comprises N input channels, N-1 layers of decorrelation processing units, and 2N output channels, where N is an integer and N ≥ 3. The decorrelation processing units of adjacent layers are staggered in sequence, and each layer comprises N decorrelation unit building blocks arranged in the horizontal direction. Each building block has a left input, a right input, a left output, and a right output. The architecture is either a two-dimensional planar architecture or a three-dimensional cylindrical architecture. The invention improves computational efficiency, increases the spatial flexibility of the hardware implementation, effectively eliminates data broadcast lines, and can process all possible permutations of the input data vector simultaneously.
Description
Technical field
The present invention relates to the field of artificial intelligence, and in particular to an artificial intelligence accelerator with a cyclic-symmetric decorrelation architecture.
Background
Artificial intelligence (AI) is a key technology field that is reshaping the global economy. For more than half a century, a large amount of R&D funding has been invested in the field. Naturally, it has achieved a number of mathematical and technical breakthroughs, many of which are already widely used. Many innovations in AI are therefore likely to stem from ideas for improving existing AI computation and the related hardware design concepts. Of course, any significant AI innovation must remain consistent with all the relevant mathematical concepts and theories established over the past century or more.
The development and progress of AI is currently transforming every industry. Over the past few decades, developments in AI, wireless communication, signal processing, and related fields have in fact complemented one another. This is possible because the mathematical theory and computation they involve are very similar.
The background and rationale of AI accelerators are as follows. The term "AI accelerator" typically refers to a microchip design dedicated to speeding up AI computation tasks. Such accelerators usually perform their specific tasks in ways that are not feasible for the conventional central processing units (CPUs) of most desktop and laptop computers. In general, therefore, an AI accelerator provides better performance and higher power efficiency for certain specialized tasks. More broadly, the term may also cover necessary, large-scale "AI preprocessing" that must be completed at very high speed. Many AI tasks can indeed be executed in a massively parallel fashion.
"Decorrelation" is important and very common because it is a necessary and critical computation in a wide range of scientific and engineering problems.
The mathematical background of AI is quite complex and essentially involves a full century of development in linear algebra. Only a very brief introduction can be given here, by reviewing a few key milestones. Linear algebra has become an essential tool of machine learning, and machine learning is currently the most important and most popular subfield of AI. In linear algebra, singular value decomposition (SVD) provides a mathematical framework within which machine learning can solve a range of engineering problems and general optimization tasks. Many regard SVD as a "highlight" of linear algebra over the last hundred years.
Singular value decomposition can be carried out in several ways in practice. One widely accepted method is the "QR algorithm", devised in the late 1950s by John G. F. Francis and Vera N. Kublanovskaya. It is built by applying the basic "QR decomposition" iteratively. The basic QR decomposition concept has played an important role in the history of SVD algorithms, but it is distinct from the "QR algorithm" invented by Francis and Kublanovskaya.
Historically, there are four well-known QR decomposition algorithms: the Gram-Schmidt process, Givens rotations, the Householder transformation, and the modified Gram-Schmidt algorithm. Over the past few decades, these four algorithms and their respective advantages and disadvantages have been studied in depth by the linear algebra community. The modified Gram-Schmidt algorithm is generally regarded as the most structured, intuitive, and numerically stable method for making a given set of input signal channels "mutually orthogonal" (conceptually equivalent to performing a complete set of "decorrelation" operations). A set of input signal channels is usually represented by a set of column vectors of a matrix.
Fig. 1 is a schematic diagram of the building block of the modified Gram-Schmidt architecture, and Fig. 2 shows the corresponding overall architecture. The building block labeled "GS" (short for Gram-Schmidt) in Fig. 1 and Fig. 2 is a simple decorrelation processor. As shown in Fig. 1, this decorrelation processor (the Gram-Schmidt building block) has two input channels and one output channel. The formula given with Fig. 1 appears only as an image in the original and is not reproduced here. In that formula, m is the decorrelation level, i and j are channel indices, k indexes a specific data sample in each output channel data sequence, K is the total number of data samples in a given output channel data sequence, and * denotes the complex conjugate.
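As an illustration, a two-input, one-output decorrelation block of the kind used in modified Gram-Schmidt might be sketched in Python as follows. Because the formula of Fig. 1 is not reproduced in this text, the weight used below is the standard modified Gram-Schmidt projection coefficient, which is an assumption rather than the patent's own expression. The names `inner` and `gs_block` are hypothetical.

```python
def inner(a, b):
    """Inner product sum_k conj(a[k]) * b[k] over the K samples."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def gs_block(x_i, x_j):
    """Two-input, one-output block: remove from x_i its component along x_j.

    Assumed form (standard modified Gram-Schmidt, not taken from Fig. 1):
        out[k] = x_i[k] - w * x_j[k],  with  w = <x_j, x_i> / <x_j, x_j>
    The reference channel x_j must have non-zero energy.
    """
    w = inner(x_j, x_i) / inner(x_j, x_j)
    return [a - w * b for a, b in zip(x_i, x_j)]


# Two short complex-valued channels with K = 4 samples each.
x_i = [1 + 2j, 0.5 - 1j, -2 + 0j, 3 + 1j]
x_j = [2 - 1j, 1 + 1j, 0 + 1j, -1 + 0.5j]

out = gs_block(x_i, x_j)

# After decorrelation, the output is orthogonal to the reference channel,
# so this residual correlation is zero up to rounding error.
residual = abs(inner(x_j, out))
```

The block consumes two channels but emits only one, which is the asymmetry that the drawbacks listed next refer to.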
However popular the modified Gram-Schmidt algorithm is, and however useful its modular processing architecture, it still has certain drawbacks: (1) it has no symmetrical processing architecture, which impairs computational efficiency; (2) its hardware implementation is spatially inflexible; (3) it usually requires long data communication lines, so it cannot benefit from connecting adjacent building blocks throughout the architecture with the shortest possible links. In addition, the prior art cannot process all possible permutations of the input data vector simultaneously.
Summary of the invention
The technical problem to be solved by the present invention is, in view of the above drawbacks of the prior art, to provide an AI accelerator with a cyclic-symmetric decorrelation architecture that improves computational efficiency, increases the spatial flexibility of the hardware implementation, effectively eliminates data broadcast lines, and can process all possible permutations of the input data vector simultaneously.
The technical solution adopted by the present invention is to construct an AI accelerator with a cyclic-symmetric decorrelation architecture, comprising a cyclic-symmetric decorrelation architecture integrated in a microchip. The architecture comprises N input channels, N-1 layers of decorrelation processing units, and 2N output channels, where N is an integer and N ≥ 3. The decorrelation processing units of adjacent layers are staggered in sequence; each layer comprises N decorrelation unit building blocks arranged in the horizontal direction; and each building block has a left input, a right input, a left output, and a right output. The architecture is a two-dimensional planar architecture or a three-dimensional cylindrical architecture.
In the AI accelerator with a cyclic-symmetric decorrelation architecture of the present invention, the left input, right input, left output, and right output of each decorrelation unit building block carry signals whose symbols appear only as images in the original and are not reproduced here. In those symbols, m is the decorrelation level, i and j are channel indices, k indexes a specific data sample in each output channel data sequence, K is the total number of data samples in a given output channel data sequence, and * denotes the complex conjugate.
In the AI accelerator of the present invention, the decorrelation processing unit of the current layer is offset to the right by a set distance relative to the decorrelation processing unit of the previous layer.
In the AI accelerator of the present invention, in the decorrelation processing unit of the first layer, the left input of each decorrelation unit building block is connected to the nearest input channel on its left, and the right input is connected to the nearest input channel on its right. The left input of the building block at the start of the layer is also connected to the right input of the building block at the end of the layer.
In the AI accelerator of the present invention, in the decorrelation processing units of the second layer to the (N-2)-th layer, the left input of each decorrelation unit building block in the current layer is connected to the right output of the nearest building block on the left in the previous layer, and the right input is connected to the left output of the nearest building block on the right in the previous layer. The left output of the building block at the start of the layer is connected to the right input of the building block at the end of the next layer, and its right output is connected to the left input of the nearest building block on the right in the next layer. The left output of each remaining building block in the current layer is connected to the right input of the nearest building block on the left in the next layer.
In the AI accelerator of the present invention, in the decorrelation processing unit of the (N-1)-th layer, the left output and the right output of each decorrelation unit building block both serve as output channels.
In the AI accelerator of the present invention, the N input channels are connected to a preprocessing stage and the 2N output channels are connected to a post-processing stage. The preprocessing stage is connected to the post-processing stage either directly or through a stage for dynamic processing of intermediate outputs.
In the AI accelerator of the present invention, the total number of cylinders required to process all possible permutations simultaneously is given by:
(N-1)!/2
where N is the number of input channels and N ≥ 3.
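One way to see where this count comes from is sketched below. It assumes, as the 4-channel example in the detailed description suggests, that a single cylinder covers the N cyclic rotations of one channel ordering in both directions, that is, 2N of the N! permutations.

```latex
\[
  \text{cylinders}(N) \;=\; \frac{N!}{2N} \;=\; \frac{(N-1)!}{2}, \qquad N \ge 3,
\]
\[
  \text{cylinders}(3) = \frac{2!}{2} = 1, \qquad
  \text{cylinders}(4) = \frac{3!}{2} = 3, \qquad
  \text{cylinders}(5) = \frac{4!}{2} = 12 .
\]
```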
Implementing the AI accelerator with a cyclic-symmetric decorrelation architecture of the present invention has the following beneficial effects. The accelerator comprises a cyclic-symmetric decorrelation architecture integrated in a microchip, and the architecture comprises N input channels, N-1 layers of decorrelation processing units, and 2N output channels, where N is an integer and N ≥ 3. The decorrelation processing units of adjacent layers are staggered in sequence; each layer comprises N decorrelation unit building blocks arranged in the horizontal direction; and each building block has a left input, a right input, a left output, and a right output. The architecture is a two-dimensional planar architecture or a three-dimensional cylindrical architecture. The invention changes the two-input, single-output building block of the modified Gram-Schmidt algorithm into a two-input, two-output design whose main purpose is to exploit the symmetry of the building block. With this new building block, maximizing parallelism across the whole computing architecture becomes more manageable, and the scalability, modularity, full parallelism, and clear structural regularity of the architecture are easy to observe. The invention removes the data broadcast lines from the basic architecture of the modified Gram-Schmidt orthogonalization algorithm without affecting the mathematical goal of producing a set of mutually orthogonal output vectors. The invention therefore improves computational efficiency, increases the spatial flexibility of the hardware implementation, effectively eliminates data broadcast lines, and can process all possible permutations of the input data vector simultaneously.
Brief description of the drawings
To explain the embodiments of the invention or the prior-art solutions more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of the Gram-Schmidt building block described in the Background;
Fig. 2 is the overall architecture diagram corresponding to the Gram-Schmidt building block;
Fig. 3 is a schematic structural diagram of the decorrelation unit building block in one embodiment of the AI accelerator with a cyclic-symmetric decorrelation architecture according to the invention;
Fig. 4 is a schematic structural diagram of the cyclic-symmetric decorrelation architecture with 3 input channels in the embodiment;
Fig. 5 is a schematic structural diagram of the cyclic-symmetric decorrelation architecture with 4 input channels in the embodiment;
Fig. 6 is a diagram of the relationship among preprocessing, post-processing, and the dynamic processing of intermediate outputs in the embodiment;
Fig. 7 is a schematic diagram of the "rolling-up" operation applied to Fig. 5 in the embodiment;
Fig. 8 is a schematic diagram of the total number of cylinders required to process all possible permutations simultaneously in the embodiment.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.
In this embodiment, the AI accelerator with a cyclic-symmetric decorrelation architecture comprises a cyclic-symmetric decorrelation architecture integrated in a microchip. The architecture comprises N input channels, N-1 layers of decorrelation processing units, and 2N output channels, where N is an integer and N ≥ 3. Fig. 4 is a schematic structural diagram of the architecture with 3 input channels in this embodiment, and Fig. 5 is a schematic structural diagram of the architecture with 4 input channels. Figs. 4 and 5 show 3 and 4 input channels only as examples; in practice, the number of input channels can be changed to suit the specific requirements.
The decorrelation processing units of adjacent layers are staggered in sequence, and each layer comprises N decorrelation unit building blocks arranged in the horizontal direction. Each building block has a left input, a right input, a left output, and a right output. The cyclic-symmetric decorrelation architecture is a two-dimensional planar architecture or a three-dimensional cylindrical architecture. The invention improves computational efficiency, increases the spatial flexibility of the hardware implementation, effectively eliminates data broadcast lines, and can process all possible permutations of the input data vector simultaneously.
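The structural counts stated above (N inputs, N-1 layers, N building blocks per layer, 2N outputs) can be tabulated with a small Python helper. This is only a bookkeeping sketch; the function name `architecture_size` and the dictionary keys are hypothetical.

```python
def architecture_size(n):
    """Return the structural counts of an n-input architecture.

    Follows the description: n input channels, n - 1 layers,
    n building blocks per layer, and 2n output channels (n >= 3).
    """
    if n < 3:
        raise ValueError("the description requires N >= 3")
    layers = n - 1
    return {
        "inputs": n,
        "layers": layers,
        "blocks_per_layer": n,
        "total_blocks": n * layers,
        "outputs": 2 * n,
    }


size_fig4 = architecture_size(3)  # the 3-input case of Fig. 4
size_fig5 = architecture_size(4)  # the 4-input case of Fig. 5
```

For the 4-input case this gives 3 layers of 4 blocks and 8 output channels, matching the counts stated for Fig. 5.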
Fig. 3 is a schematic structural diagram of the decorrelation unit building block in this embodiment. The left input, right input, left output, and right output of the building block carry signals whose symbols appear only as images in the original and are not reproduced here. In those symbols, m is the decorrelation level, i and j are channel indices, k indexes a specific data sample in each output channel data sequence, K is the total number of data samples in a given output channel data sequence, and * denotes the complex conjugate.
The invention changes the two-input, single-output building block of the modified Gram-Schmidt algorithm into a two-input, two-output design. The main purpose of the new design is to exploit the symmetry of the building block. With this new building block, maximizing parallelism across the whole computing architecture becomes more manageable.
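A two-input, two-output block might be sketched in Python as below. The signal expressions for Fig. 3 are not reproduced in this text, so the form used here is an assumption: each output is one input decorrelated against the other, using the standard Gram-Schmidt projection. Which physical output (left or right) carries which signal is also an assumption, and the names `inner` and `dual_block` are hypothetical.

```python
def inner(a, b):
    """Inner product sum_k conj(a[k]) * b[k] over the K samples."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def dual_block(left_in, right_in):
    """Symmetric two-input, two-output decorrelation block (assumed form).

    left_out  = left_in  decorrelated against right_in
    right_out = right_in decorrelated against left_in
    Both inputs must have non-zero energy.
    """
    e_left = inner(left_in, left_in)
    e_right = inner(right_in, right_in)
    cross = inner(right_in, left_in)  # <right, left>, computed once

    w_left = cross / e_right             # weight for the left output
    w_right = cross.conjugate() / e_left  # <left, right> / <left, left>

    left_out = [l - w_left * r for l, r in zip(left_in, right_in)]
    right_out = [r - w_right * l for l, r in zip(left_in, right_in)]
    return left_out, right_out


a = [1 + 1j, 2 - 1j, 0.5 + 0j, -1 + 2j]
b = [0 + 1j, 1 + 1j, -2 + 0.5j, 1 - 1j]
left_out, right_out = dual_block(a, b)

# Each output is orthogonal to the input it was decorrelated against.
check_left = abs(inner(b, left_out))
check_right = abs(inner(a, right_out))
```

In this sketch the cross-correlation term is computed once and reused for both outputs, which is one plausible way a symmetric block could save work compared with two independent single-output blocks.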
The basic and key concept of "decorrelation" is usually associated with signal processing and is closely related to the concept of "orthogonalization" in linear algebra, which has been well known for more than a century. Using the "decorrelation" function as a building block to construct a massively parallel, distributed, and "cyclically symmetric" computing architecture is an important breakthrough in AI and machine learning. For simplicity, Fig. 4 shows the case with only 3 input channels, and Fig. 5 shows the case with 4 input channels. Comparing Fig. 4 with Fig. 5 makes it easy to observe the scalability, modularity, full parallelism, and clear structural regularity of the cyclic-symmetric decorrelation architecture.
The purpose of using the two-input, two-output building block of Fig. 3 (rather than the original two-input, single-output building block of Fig. 1) is to obtain the most efficient communication and data flow between "nearest neighbors" (that is, adjacent building blocks). In the architecture of Figs. 4 and 5, a smooth, structured data flow is achieved without introducing data broadcast lines. This is both an important design goal and an important performance goal for the 3-input and 4-input cases alike.
Besides scalability, parallelism, and structural regularity, another important aspect of the invention is an effective, structured monitoring mechanism that automatically performs real-time signal processing adjustments based on intermediate output data, so that data can be extracted from the architecture very conveniently. Fig. 6 shows the relationship among preprocessing, post-processing, and the dynamic processing of intermediate outputs in this embodiment. In Fig. 6, the N input channels are connected to the preprocessing stage and the 2N output channels are connected to the post-processing stage; the preprocessing stage is connected to the post-processing stage either directly or through the dynamic processing of intermediate outputs. Fig. 6 is an extended version of Fig. 5 that also explicitly shows preprocessing, post-processing, and the Dynamic Processing of Intermediate Outputs (DPIO).
Preprocessing and post-processing are familiar to most readers. The concepts and tasks involved in DPIO, however, are less well known and depend heavily on the requirements of the given signal processing problem. DPIO is vital for many advanced signal processing problems, including interference-suppression problems in radar and wireless communication.
As Figs. 4 and 5 show, the decorrelation processing unit of the current layer is offset to the right by a set distance relative to the decorrelation processing unit of the previous layer. In the decorrelation processing unit of the first layer, the left input of each decorrelation unit building block is connected to the nearest input channel on its left, and the right input is connected to the nearest input channel on its right. The left input of the building block at the start of the layer is also connected to the right input of the building block at the end of the layer.
In the decorrelation processing units of the second layer to the (N-2)-th layer, the left input of each decorrelation unit building block in the current layer is connected to the right output of the nearest building block on the left in the previous layer, and the right input is connected to the left output of the nearest building block on the right in the previous layer. The left output of the building block at the start of the layer is connected to the right input of the building block at the end of the next layer, and its right output is connected to the left input of the nearest building block on the right in the next layer. The left output of each remaining building block in the current layer is connected to the right input of the nearest building block on the left in the next layer.
In the decorrelation processing unit of the (N-1)-th layer, the left output and the right output of each decorrelation unit building block both serve as output channels.
In Fig. 5 there are 4 input channels, which produce 8 output channels. To understand the whole implementation, it suffices to examine the processing sequence associated with any one output channel. Here the output channel X_4^3(k; 3,2,1) is used as an example to explain the processing sequence:
The superscript "3" indicates that this output channel has undergone 3 levels of "decorrelation" processing. (Note: the superscript "0" on the input channel symbols in Fig. 5 indicates that, at the initial input stage, no "decorrelation" processing has yet been performed.)
The index "k" in parentheses denotes a specific data sample in each output channel data sequence. "k" runs from 1 to K, where K is the total number of data samples in a given output channel data sequence.
The subscript "4" and the remaining symbols in X_4^3(k; 3,2,1) indicate that this output channel is produced by completing all possible and necessary "decorrelation" operations between the 4th input channel and the other input channels.
Because the invention concerns an "N-input, 2N-output" processing architecture, each input channel corresponds to two output channels. In this embodiment, X_4^3(k; 3,2,1) and X_4^3(k; 1,2,3) both correspond to, that is, both evolve from, the same 4th input channel. In theory, these two output channels produce the same output data sequence. Note: although their theoretical output values are identical, in practice they may not produce identical output data sequences because of system noise and other factors.
In this embodiment, the main difference between X_4^3(k; 3,2,1) and X_4^3(k; 1,2,3) arises from the different order in which the "decorrelation" steps are executed. That is, the generation of X_4^3(k; 3,2,1) begins with the decorrelation of the 4th and 3rd input channels, whereas the generation of X_4^3(k; 1,2,3) begins with the decorrelation of the 4th and 1st input channels, and so on. In other words, the sequence of numbers in parentheses gives the order in which the decorrelation steps occur. This numbering scheme therefore makes every decorrelation step in the entire processing architecture clear. Understanding and exploiting the decorrelation order is very important in certain signal processing scenarios. In addition, the numbering and indexing scheme can support engineering development of applications that use or relate to this processing architecture.
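The claim that the two outputs derived from the 4th input channel agree in theory can be checked numerically with a short Python sketch. It uses a standard modified Gram-Schmidt chain rather than the patent's exact wiring, which is an assumption: at each level, every remaining channel is decorrelated against the current reference channel. The helper names `inner`, `decorrelate`, and `output_channel`, and the random test data, are hypothetical.

```python
import random


def inner(a, b):
    """Inner product sum_k conj(a[k]) * b[k] over the K samples."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def decorrelate(x, ref):
    """Remove from x its component along ref (standard Gram-Schmidt step)."""
    w = inner(ref, x) / inner(ref, ref)
    return [xi - w * ri for xi, ri in zip(x, ref)]


def output_channel(channels, target, order):
    """Decorrelate channel `target` against the channels listed in `order`.

    channels: dict mapping channel index -> list of complex samples.
    order:    e.g. (3, 2, 1) for the label (k; 3,2,1).
    """
    work = {i: list(channels[i]) for i in (target, *order)}
    remaining = list(order)
    while remaining:
        ref = work.pop(remaining.pop(0))  # reference channel for this level
        for i in work:                    # target and not-yet-used references
            work[i] = decorrelate(work[i], ref)
    return work[target]


random.seed(0)
K = 32
channels = {
    i: [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(K)]
    for i in range(1, 5)
}

out_321 = output_channel(channels, 4, (3, 2, 1))
out_123 = output_channel(channels, 4, (1, 2, 3))

# Largest sample-wise difference between the two orderings.
max_diff = max(abs(p - q) for p, q in zip(out_321, out_123))
```

With exact arithmetic the two sequences coincide; in floating point `max_diff` is limited only by rounding error, a small-scale analogue of the note above that noise can make the two outputs differ in practice.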
From the above explanation of the output channels X_4^3(k; 3,2,1) and X_4^3(k; 1,2,3), it can be seen that the unique ability to generate two output channels from one input channel derives from the symmetrical building block design and the well-structured system design. The new processing architecture thus differs greatly from the modified Gram-Schmidt architecture.
The above is a detailed explanation of the steps of the massively parallel "decorrelation" process (from one layer of the architecture to the next). It should be emphasized that the dynamic processing of intermediate outputs in Fig. 6 is in practice very efficient, because all intermediate "decorrelation" outputs generated in the processing architecture are highly regular and structured.
Having explained the computation and its steps, we now turn to the task of processing all possible permutations of a given set of input channels (that is, of a given input data vector). First, the total number of permutations for a system with N input channels is N!. For the 3-input-channel case of Fig. 4, the total is 3! = 6; for the 4-channel case of Fig. 5, it is 4! = 24; and so on.
For illustration, all possible permutations for the 4-input-channel case of Fig. 5 are listed below:
First group:
(X1,X2,X3,X4)(X2,X3,X4,X1)(X3,X4,X1,X2)(X4,X1,X2,X3)
(X4,X3,X2,X1)(X3,X2,X1,X4)(X2,X1,X4,X3)(X1,X4,X3,X2)
Second group:
(X2,X1,X3,X4)(X1,X3,X4,X2)(X3,X4,X2,X1)(X4,X2,X1,X3)
(X4,X3,X1,X2)(X3,X1,X2,X4)(X1,X2,X4,X3)(X2,X4,X3,X1)
Third group:
(X3,X2,X4,X1)(X2,X4,X1,X3)(X4,X1,X3,X2)(X1,X3,X2,X4)
(X1,X4,X2,X3)(X4,X2,X3,X1)(X2,X3,X1,X4)(X3,X1,X4,X2)
Before explaining the three groups above in detail and why the permutations are grouped this way, consider Fig. 7, which shows the result of applying a "rolling" operation to Fig. 5. Rolling the seemingly flat two-dimensional architecture into a three-dimensional cylinder is possible because of the symmetry discovered in the course of developing this invention.
The cylindrical structure of Fig. 7 serves as a visualization tool that makes the logic behind the first group of permutations easy to see. The first group starts from the permutation (X1,X2,X3,X4). Treating it as a cyclic arrangement, the next permutation naturally starts from X2, i.e. (X2,X3,X4,X1), and so on in order. The process is then repeated in the reverse direction, starting from X4, which gives the sequence (X4,X3,X2,X1), (X3,X2,X1,X4), and so on. This makes clear how the 8 permutations of the first group are identified from the cylindrical structure for 4 input channels.
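The reading of one cylinder described above can be sketched in a few lines of Python. This is only a minimal illustration under the assumption that a cylinder contributes the N rotations of a starting permutation followed by the N rotations of its reversal; the function name `cylinder_arrangements` is hypothetical and chosen here for illustration.

```python
def cylinder_arrangements(seed):
    """Return the 2N arrangements read off one cylinder.

    Assumption for this sketch: the cylinder yields the N cyclic rotations
    of `seed`, followed by the N cyclic rotations of the reversed `seed`.
    """
    seed = tuple(seed)
    n = len(seed)
    # Forward direction: start at each channel in turn and go around once.
    forward = [seed[i:] + seed[:i] for i in range(n)]
    # Reverse direction: the same walk on the reversed ordering.
    rev = seed[::-1]
    backward = [rev[i:] + rev[:i] for i in range(n)]
    return forward + backward


first_group = cylinder_arrangements(("X1", "X2", "X3", "X4"))
for arrangement in first_group:
    print(arrangement)
```

For the seed (X1,X2,X3,X4) the sketch produces eight distinct arrangements, beginning with the forward rotations and continuing with the reversed ones starting from (X4,X3,X2,X1).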
With 4 input channels there are 24 permutations in total. Therefore, processing all 24 permutations simultaneously requires 3 cylinders. The three groups listed above, with 8 permutations each, illustrate this particular case, in which 3 cylinders are needed.
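The grouping of the 24 permutations into 3 cylinders can be checked with a short Python sketch. It assumes, as an illustration only, that two permutations belong to the same cylinder when one is a rotation of the other or of its reversal; the helper names are hypothetical, and the order in which groups are found (lexicographic seeds) is a choice of this sketch rather than something the text prescribes.

```python
from itertools import permutations


def cylinder_arrangements(seed):
    """All rotations of `seed` and of its reversal (one cylinder)."""
    seed = tuple(seed)
    n = len(seed)
    rev = seed[::-1]
    return ([seed[i:] + seed[:i] for i in range(n)]
            + [rev[i:] + rev[:i] for i in range(n)])


def partition_into_cylinders(channels):
    """Split all permutations of `channels` into disjoint cylinder groups."""
    remaining = set(permutations(channels))
    groups = []
    # Walk the permutations in a fixed order so the result is deterministic.
    for perm in permutations(channels):
        if perm in remaining:
            group = cylinder_arrangements(perm)
            groups.append(group)
            remaining -= set(group)
    return groups


groups = partition_into_cylinders(("X1", "X2", "X3", "X4"))
for index, group in enumerate(groups, start=1):
    print(f"Group {index}: {len(group)} arrangements")
```

Each group found this way is disjoint from the others, and together the groups cover every permutation exactly once.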
Starting from the basic formula N! for the total number of permutations of a system with N input channels, and using the explanation above together with the Cyclic Symmetry property, the exact formula for the number of cylinders needed to process all possible permutations simultaneously is:
(N-1)!/2
where N is the number of input channels and N ≥ 3.
Thus, for N = 3, the number of cylinders required to process all possible permutations simultaneously is 1;
for N = 4, it is 3;
and for N = 5, it is 12; and so on.
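The formula follows from a short counting argument, sketched here under the assumption (consistent with the 4-channel listing) that each cylinder accounts for the N rotations of a permutation in each of the two reading directions, which gives 2N distinct permutations per cylinder when N ≥ 3:

```latex
\[
  \text{number of cylinders}
  \;=\; \frac{\text{total permutations}}{\text{permutations per cylinder}}
  \;=\; \frac{N!}{2N}
  \;=\; \frac{(N-1)!}{2},
  \qquad N \ge 3 .
\]
```

The restriction N ≥ 3 matters because for fewer channels the reversed reading coincides with a rotation, so a cylinder would no longer contribute 2N distinct permutations.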
Fig. 8 summarizes the above graphically.
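The cylinder count can also be verified numerically by brute force. The following Python sketch is an illustration only: it assumes that a cylinder covers all rotations of a permutation and of its reversal, and the function name `count_cylinders` is hypothetical.

```python
from itertools import permutations
from math import factorial


def count_cylinders(n):
    """Count cylinders by enumerating all permutations of n channels."""
    seen = set()
    cylinders = 0
    for perm in permutations(range(1, n + 1)):
        if perm in seen:
            continue  # already covered by an earlier cylinder
        cylinders += 1
        rev = perm[::-1]
        for i in range(n):
            # Mark every rotation in both reading directions as covered.
            seen.add(perm[i:] + perm[:i])
            seen.add(rev[i:] + rev[:i])
    return cylinders


for n in range(3, 8):
    # The enumerated count should agree with the closed form (N-1)!/2.
    assert count_cylinders(n) == factorial(n - 1) // 2
    print(n, count_cylinders(n))
```

The enumeration grows as N!, so the sketch is only practical for small N; it is meant as a consistency check on the closed-form expression, not as a way to schedule the hardware.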
It must be stressed that the above calculation of the number of cylinders, and the associated implementation details, refer specifically to the "simultaneous processing" case, that is, the case in which all possible permutations are computed in parallel. In many practical situations or special applications this is of course not a requirement. However, the present invention aims to provide an innovative and "complete" solution. Individual applications or solutions can naturally be simplified or customized as circumstances require.
All major artificial intelligence hardware companies are currently working on optimization strategies and on the efficiency trade-offs among CPUs, GPUs, FPGAs, ASICs, and the like. Clearly, no single hardware technology suits every task and application scenario, but it is equally certain that the role of artificial intelligence accelerators is becoming increasingly important. Consistent with the definition given by Wikipedia, an artificial intelligence accelerator is generally understood as a microprocessor, a computing system, or a microchip designed as a hardware accelerator for artificial intelligence applications. In view of the features and advantages described above, the present invention is a significant innovative contribution to the field of artificial intelligence accelerators. Note that an artificial intelligence accelerator usually does not work alone, but together with other devices and/or subsystems of a given system, including but not limited to CPUs, GPUs, FPGAs, and ASICs. Note also that the innovative algorithm of the invention and its associated scalable distributed architecture can be implemented in hardware, in software, or in a combination of both in a given application.
The present invention advances existing artificial intelligence algorithms and parallel computing architectures. It can be applied to: (1) generally improving the effectiveness of search engine technology; (2) supporting real-time analysis and monitoring of large networks, such as nationwide or regional power-consumption analysis; (3) accelerating the development and deployment of machine learning (a key subset of artificial intelligence) across industries, including wireless communications; (4) supporting cooperative associations and governments in analyzing critical data and disseminating the results in near real time. Such analysis output may include scientific results, data related to international crisis situations, and so on.
In short, the central task addressed by the present invention is to implement "decorrelation", a key and basic function, in a massively parallel manner. The new algorithm and the associated parallel computing system are aptly named an "artificial intelligence accelerator". In addition, the phrase "Cyclic Symmetry" is used to describe accurately the most critical symmetry property of this innovative computing architecture.
This innovation, which can process all possible permutations of the input data vector simultaneously and efficiently, has an important and subtle influence on the development of practical artificial intelligence solutions. Developing such solutions usually involves certain important engineering and mathematical considerations, such as improving the convergence rate, providing computational redundancy to minimize potential information loss, and monitoring and tracking intermediate output data. A key breakthrough of the invention is that the resulting parallel processing architecture is fully pipelined, modular, scalable, and structurally efficient, and wastes no "processor real estate", which is a critical consideration in the design of any artificial intelligence accelerator microchip.
Owing to the flexibility, scalability, and parallel processing of the invention, various obvious modifications and refinements become possible, such as using feedback loops to rearrange wiring positions, or compressing the entire architecture into a more compact physical structure better suited to certain processing scenarios. Such modifications and variations may be proposed by those experienced in signal processing and applied mathematics. It should therefore be understood that all the points and observations above are intended to cover such modifications and variations as may be proposed in the future, which is the true core purpose of the invention.
The foregoing describes merely preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.
Claims (8)
1. An artificial intelligence accelerator with a Cyclic Symmetry decorrelation framework, characterized by comprising a Cyclic Symmetry decorrelation framework integrated in a microchip, the Cyclic Symmetry decorrelation framework comprising N input channels, N-1 layers of decorrelation processing units, and 2N output channels, where N is an integer and N ≥ 3; the decorrelation processing units of adjacent layers are staggered in sequence; each layer of decorrelation processing units comprises N decorrelation unit building blocks arranged in the horizontal direction; each decorrelation unit building block has a left input terminal, a right input terminal, a left output terminal, and a right output terminal; and the Cyclic Symmetry decorrelation framework is a two-dimensional planar architecture or a three-dimensional cylindrical architecture.
2. The artificial intelligence accelerator with a Cyclic Symmetry decorrelation framework according to claim 1, characterized in that the left input terminal of the decorrelation unit building block receives [expression not reproduced], the right input terminal receives [expression not reproduced], the left output terminal outputs [expression not reproduced], and the right output terminal outputs [expression not reproduced], where m is the decorrelation level, i and j are channel indices, k indexes a specific data sample in each output channel data sequence, K is the total number of data samples in a given output channel data sequence, and * denotes the complex conjugate.
3. The artificial intelligence accelerator with a Cyclic Symmetry decorrelation framework according to claim 2, characterized in that the current layer of decorrelation processing units is offset to the right by a set distance relative to the previous layer of decorrelation processing units.
4. The artificial intelligence accelerator with a Cyclic Symmetry decorrelation framework according to claim 3, characterized in that, in the first layer of decorrelation processing units, the left input terminal of each decorrelation unit building block is connected to the nearest input channel on its left and the right input terminal is connected to the nearest input channel on its right, and the left input terminal of the decorrelation unit building block at the start of the layer is also connected to the right input terminal of the decorrelation unit building block at the end of the layer.
5. The artificial intelligence accelerator with a Cyclic Symmetry decorrelation framework according to claim 4, characterized in that, in the second to (N-2)-th layers of decorrelation processing units, the left input terminal of each decorrelation unit building block of the current layer is connected to the right output terminal of the nearest decorrelation unit building block on the left in the previous layer, and the right input terminal is connected to the left output terminal of the nearest decorrelation unit building block on the right in the previous layer; the left output terminal of the decorrelation unit building block at the start of the current layer is connected to the right input terminal of the decorrelation unit building block at the end of the next layer, and its right output terminal is connected to the left input terminal of the nearest decorrelation unit building block on the right in the next layer; and the left output terminal of each remaining decorrelation unit building block in the current layer is connected to the right input terminal of the nearest decorrelation unit building block on the left in the next layer.
6. The artificial intelligence accelerator with a Cyclic Symmetry decorrelation framework according to claim 5, characterized in that, in the (N-1)-th layer of decorrelation processing units, the left output terminal and the right output terminal of each decorrelation unit building block serve as output channels.
7. The artificial intelligence accelerator with a Cyclic Symmetry decorrelation framework according to any one of claims 1 to 6, characterized in that the N input channels are connected to a preprocessing process and the 2N output channels are connected to a post-processing process, the preprocessing process being connected to the post-processing process either directly or via an active intermediate-output processing process.
8. The artificial intelligence accelerator with a Cyclic Symmetry decorrelation framework according to any one of claims 1 to 6, characterized in that the number of cylinders required to process all possible permutations simultaneously is given by:
(N-1)!/2
where N is the number of input channels and N ≥ 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910642363.8A CN110471883A (en) | 2019-07-16 | 2019-07-16 | Artificial intelligence accelerator with Cyclic Symmetry decorrelation framework |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910642363.8A CN110471883A (en) | 2019-07-16 | 2019-07-16 | Artificial intelligence accelerator with Cyclic Symmetry decorrelation framework |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110471883A (en) | 2019-11-19 |
Family
ID=68508776
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910642363.8A Pending CN110471883A (en) | 2019-07-16 | 2019-07-16 | Artificial intelligence accelerator with Cyclic Symmetry decorrelation framework |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110471883A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112926730A (en) * | 2019-12-06 | 2021-06-08 | 三星电子株式会社 | Method and apparatus for processing data |
WO2022121756A1 (en) * | 2020-12-08 | 2022-06-16 | Huawei Technologies Co.,Ltd. | System, method and apparatus for intelligent caching |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4941117A (en) * | 1988-09-07 | 1990-07-10 | General Electric Company | Multiple channel adaptive correlation/decorrelation processor |
CN109525288A (en) * | 2018-11-28 | 2019-03-26 | 广州市高峰科技有限公司 | For wirelessly communicating the parallel processing architecture of decorrelation operation |
CN109725937A (en) * | 2018-11-28 | 2019-05-07 | 广州市高峰科技有限公司 | Parallel processing architecture and IC chip for machine learning decorrelation operation |
- 2019-07-16: CN application CN201910642363.8A (patent/CN110471883A/en), status: active, Pending
Non-Patent Citations (1)
Title |
---|
王斌等: "自适应天线阵正交预处理器的研制", 《通信学报》 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Meng et al. | Population-based incremental learning algorithm for a serial colored traveling salesman problem | |
Wang et al. | A multi-order distributed HOSVD with its incremental computing for big services in cyber-physical-social systems | |
Kim et al. | SBV-Cut: Vertex-cut based graph partitioning using structural balance vertices | |
CN110471883A (en) | Artificial intelligence accelerator with Cyclic Symmetry decorrelation framework | |
Burgholzer et al. | Improved DD-based equivalence checking of quantum circuits | |
CN105678401A (en) | Global optimization method based on strategy adaptability differential evolution | |
CN103942753A (en) | Multi-dimensional quantum colored image geometric transformation design and achieving method | |
Liu et al. | A surrogate-assisted two-stage differential evolution for expensive constrained optimization | |
CN101364245B (en) | Electromagnetic environment prediction system for multipole database | |
Xu et al. | E$^2$DNet: An Ensembling Deep Neural Network for Solving Nonconvex Economic Dispatch in Smart Grid | |
Wang et al. | A matrix approach for the static correction problem of asynchronous sequential machines | |
CN109725937A (en) | Parallel processing architecture and IC chip for machine learning decorrelation operation | |
Chen et al. | Automatic and exact symmetry recognition of structures exhibiting high-order symmetries | |
Praba et al. | Semiring on roughsets | |
Dong et al. | Research on modeling method of power system network security risk assessment based on object-oriented Bayesian network | |
He et al. | Ensemble learning for wind profile prediction with missing values | |
Krasnobayev et al. | Information Security of the National Economy Based on an Effective Data Control Method | |
Liu et al. | A set-based discrete differential evolution algorithm | |
Gu et al. | Fourier Circuits in Neural Networks: Unlocking the Potential of Large Language Models in Mathematical Reasoning and Modular Arithmetic | |
CN110361702A (en) | The processing method of Radar jam signal | |
Shang et al. | Circularly searching core nodes based label propagation algorithm for community detection | |
Pan et al. | Further remark on P systems with active membranes and two polarizations | |
Dieudonné et al. | Swing words to make circle formation quiescent | |
CN103761379B (en) | A kind of earth observation satellite multidisciplinary optimization system based on envelope Conjugate Search Algorithm system | |
Su et al. | The attack efficiency of PageRank and HITS algorithms on complex networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191119 |