CN110471883A - Artificial intelligence accelerator with Cyclic Symmetry decorrelation framework - Google Patents


Publication number
CN110471883A
CN110471883A CN201910642363.8A CN201910642363A CN110471883A CN 110471883 A CN110471883 A CN 110471883A CN 201910642363 A CN201910642363 A CN 201910642363A CN 110471883 A CN110471883 A CN 110471883A
Authority
CN
China
Prior art keywords
decorrelation
framework
structure block
input terminal
artificial intelligence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910642363.8A
Other languages
Chinese (zh)
Inventor
袁闻峰
陈玉镇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Gaofeng Technology Co Ltd
Original Assignee
Guangzhou Gaofeng Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Gaofeng Technology Co Ltd
Priority to CN201910642363.8A
Publication of CN110471883A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Radar Systems Or Details Thereof (AREA)

Abstract

The invention discloses an artificial intelligence accelerator with a Cyclic Symmetry decorrelation framework. The accelerator comprises a Cyclic Symmetry decorrelation framework integrated in a microchip. The framework comprises N input channels, N-1 layers of decorrelation processing units, and 2N output channels, where N is an integer and N >= 3. The decorrelation processing units of adjacent layers are staggered in sequence, and each layer comprises N decorrelation unit structure blocks arranged in the horizontal direction. Each decorrelation unit structure block has a left input terminal, a right input terminal, a left output terminal, and a right output terminal. The framework is either a two-dimensional planar framework or a three-dimensional cylindrical framework. The invention improves computational efficiency, increases the spatial flexibility of the hardware implementation, eliminates data broadcast lines, and can process all possible permutations of the input data vector simultaneously.

Description

Artificial intelligence accelerator with Cyclic Symmetry decorrelation framework
Technical field
The present invention relates to the field of artificial intelligence, and in particular to an artificial intelligence accelerator with a Cyclic Symmetry decorrelation framework.
Background art
Artificial intelligence (AI) is a key technology field that is reshaping the global economy. For more than half a century, a large amount of research and development funding has been invested in this field. Naturally, it has achieved a number of mathematical and technical breakthroughs, many of which have been widely applied. Many potential innovations in artificial intelligence are therefore likely to derive from ideas for improving existing artificial intelligence computation and the related hardware design concepts. Of course, any substantial innovation in artificial intelligence must remain consistent with all the relevant mathematical concepts and theories established over the past century or more.
The development and progress of artificial intelligence is currently transforming all trades and industries. Over the past few decades, development in artificial intelligence, wireless communication, signal processing, and related fields has in fact been mutually reinforcing. This has been possible because the mathematical theory and computation involved are very similar.
The background of, and basic rationale for, artificial intelligence accelerators is as follows. The term "artificial intelligence accelerator" typically refers to a microchip design specifically intended to speed up artificial intelligence computation tasks. These accelerators usually execute their particular tasks in a way that is infeasible for the conventional central processing units (CPUs) of most desktop and laptop computers. In general, therefore, an artificial intelligence accelerator provides better performance and higher power efficiency for completing certain specialized tasks. In a broader sense, the term may also refer to necessary, large-scale "artificial intelligence preprocessing" that must be completed at very high speed. Many artificial intelligence tasks can indeed be executed in a massively parallel manner.
"Decorrelation" is important and very common because it is a necessary and key computation in a wide range of scientific and engineering problems.
The mathematical background of artificial intelligence is quite complicated and essentially involves a whole century of development in linear algebra. Only a very brief background can be given here, by reviewing some key development milestones. Linear algebra has become an essential tool for machine learning, and machine learning is currently the most important and most popular subfield of artificial intelligence. In linear algebra, singular value decomposition (SVD) represents a mathematical framework under which machine learning solves a range of engineering problems and general optimization tasks. Many regard singular value decomposition as a "highlight" of linear algebra over the past hundred years.
Singular value decomposition can be carried out in several different ways in practice. One generally accepted implementation method is the "QR algorithm", which was developed independently in the late 1950s by John G. F. Francis and Vera N. Kublanovskaya. It applies iteration on top of the basic concept of "QR decomposition". The basic QR decomposition concept holds an important place in the history of singular value decomposition algorithms, but it is distinct from the "QR algorithm" invented by Francis and Kublanovskaya.
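The relationship between QR decomposition and the iterative "QR algorithm" can be illustrated with a minimal sketch. It assumes the simplest unshifted form of the iteration (factor A = QR, then replace A with RQ) and obtains singular values as the square roots of the eigenvalues of the symmetric matrix A^T A. The function names and the example matrix are hypothetical and chosen only for illustration; production SVD routines use considerably more refined variants.

```python
import math

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def qr_decompose(a):
    """QR decomposition of a square real matrix (Gram-Schmidt on its columns)."""
    cols = transpose(a)
    n = len(cols)
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = list(cols[j])
        for i, q in enumerate(q_cols):
            # Remove the component of column j along the i-th orthonormal column.
            r[i][j] = sum(qi * vi for qi, vi in zip(q, v))
            v = [vi - r[i][j] * qi for vi, qi in zip(v, q)]
        r[j][j] = math.sqrt(sum(vi * vi for vi in v))
        q_cols.append([vi / r[j][j] for vi in v])
    return transpose(q_cols), r

def qr_algorithm_eigenvalues(a, iterations=200):
    """Unshifted QR iteration: repeatedly factor A = QR and set A = RQ."""
    for _ in range(iterations):
        q, r = qr_decompose(a)
        a = matmul(r, q)
    return [a[i][i] for i in range(len(a))]

# Hypothetical 4x3 example matrix.
A = [[3.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 1.0], [1.0, 0.0, 2.0]]
gram = matmul(transpose(A), A)  # symmetric 3x3 matrix A^T A
singular_values = [math.sqrt(ev) for ev in qr_algorithm_eigenvalues(gram)]
```

The sketch keeps the two ideas separate: `qr_decompose` is a single decomposition, while `qr_algorithm_eigenvalues` is the iteration built on top of it.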
Historically, there are four well-known QR decomposition algorithms: the Gram-Schmidt process, Givens rotations, the Householder transformation, and the modified Gram-Schmidt algorithm. Over the past few decades, these four QR decomposition algorithms and their relative advantages and disadvantages have been studied in depth by the linear algebra community. The modified Gram-Schmidt algorithm is generally regarded as the most structured, intuitive, and numerically stable method for making a given set of input signal channels mutually orthogonal (conceptually equivalent to performing a complete set of "decorrelation" operations). A set of input signal channels is usually represented by a set of column vectors of a matrix.
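A minimal sketch of this mutual orthogonalization is given below. It assumes the textbook form of the modified Gram-Schmidt procedure applied to complex-valued channel sequences, with the sample inner product used as the correlation measure; the function names and example data are hypothetical, and linearly independent channels are assumed so that no division by zero occurs.

```python
def inner(u, v):
    """Sample inner product: sum over k of conj(u[k]) * v[k]."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def modified_gram_schmidt(channels):
    """Return mutually orthogonal sequences derived from `channels`."""
    work = [list(ch) for ch in channels]
    for j in range(len(work)):
        # Once channel j is final, remove its component from every later channel.
        for i in range(j + 1, len(work)):
            w = inner(work[j], work[i]) / inner(work[j], work[j])
            work[i] = [x - w * y for x, y in zip(work[i], work[j])]
    return work

# Three hypothetical input channels with K = 4 samples each.
channels = [
    [1 + 1j, 2, 0, 1j],
    [0, 1, 1 - 1j, 2],
    [3, 1j, 1, 1],
]
orthogonal = modified_gram_schmidt(channels)
```

Each pass of the inner loop is one pairwise "decorrelation": the later channel leaves the loop with no remaining component along channel j.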
Fig. 1 is a schematic diagram of the structure block of the modified Gram-Schmidt framework, and Fig. 2 is the corresponding overall architecture diagram. The structure block labeled "GS" (the abbreviation of Gram-Schmidt) in Fig. 1 and Fig. 2 is a simple decorrelation processor. As shown in Fig. 1, this decorrelation processor (the Gram-Schmidt structure block) has two input channels and one output channel. In Fig. 1, [formula not reproduced]
where m is the decorrelation level, i and j are channel indices, k denotes a specific data sample in each output channel data sequence, K is the total number of data samples in a given output channel data sequence, and * denotes the complex conjugate.
However popular the modified Gram-Schmidt algorithm may be, and however useful its modular processing framework, it still has certain drawbacks: (1) it lacks a symmetric processing framework, which reduces computational efficiency; (2) the spatial layout of the hardware implementation is inflexible; (3) it usually requires long data communication lines, which means it cannot gain the advantage of connecting adjacent structure blocks throughout the framework with the shortest possible connections. In addition, the prior art cannot process all possible permutations of the input data vector simultaneously.
Summary of the invention
The technical problem to be solved by the present invention is, in view of the above drawbacks of the prior art, to provide an artificial intelligence accelerator with a Cyclic Symmetry decorrelation framework that improves computational efficiency, increases the spatial flexibility of the hardware implementation, eliminates data broadcast lines, and can process all possible permutations of the input data vector simultaneously.
The technical solution adopted by the present invention to solve this technical problem is to construct an artificial intelligence accelerator with a Cyclic Symmetry decorrelation framework, comprising a Cyclic Symmetry decorrelation framework integrated in a microchip. The framework comprises N input channels, N-1 layers of decorrelation processing units, and 2N output channels, where N is an integer and N >= 3. The decorrelation processing units of adjacent layers are staggered in sequence, and each layer comprises N decorrelation unit structure blocks arranged in the horizontal direction. Each decorrelation unit structure block has a left input terminal, a right input terminal, a left output terminal, and a right output terminal. The framework is either a two-dimensional planar framework or a three-dimensional cylindrical framework.
In the artificial intelligence accelerator with a Cyclic Symmetry decorrelation framework of the present invention, the left input terminal of the decorrelation unit structure block receives [formula not reproduced], the right input terminal receives [formula not reproduced], the left output terminal outputs [formula not reproduced], and the right output terminal outputs [formula not reproduced],
where m is the decorrelation level, i and j are channel indices, k denotes a specific data sample in each output channel data sequence, K is the total number of data samples in a given output channel data sequence, and * denotes the complex conjugate.
In the artificial intelligence accelerator with a Cyclic Symmetry decorrelation framework of the present invention, the decorrelation processing unit of the current layer is offset to the right by a set distance relative to the decorrelation processing unit of the layer above.
In the artificial intelligence accelerator with a Cyclic Symmetry decorrelation framework of the present invention, in the decorrelation processing unit of the first layer, the left input terminal of each decorrelation unit structure block is connected to the nearest input channel on its left, and the right input terminal is connected to the nearest input channel on its right; the left input terminal of the decorrelation unit structure block at the start of the layer is additionally connected to the right input terminal of the decorrelation unit structure block at the end of the layer.
In the artificial intelligence accelerator with a Cyclic Symmetry decorrelation framework of the present invention, in the decorrelation processing units of the second layer to the (N-2)-th layer, the left input terminal of each decorrelation unit structure block of the current layer is connected to the right output terminal of the nearest decorrelation unit structure block on the left in the layer above, and the right input terminal is connected to the left output terminal of the nearest decorrelation unit structure block on the right in the layer above. The left output terminal of the decorrelation unit structure block at the start of the current layer is connected to the right input terminal of the decorrelation unit structure block at the end of the next layer, and its right output terminal is connected to the left input terminal of the nearest decorrelation unit structure block on the right in the next layer. The left output terminal of each remaining decorrelation unit structure block in the current layer is connected to the right input terminal of the nearest decorrelation unit structure block on the left in the next layer.
In the artificial intelligence accelerator with a Cyclic Symmetry decorrelation framework of the present invention, in the decorrelation processing unit of the (N-1)-th layer, the left output terminal and the right output terminal of each decorrelation unit structure block serve as output channels.
In the artificial intelligence accelerator with a Cyclic Symmetry decorrelation framework of the present invention, the N input channels are connected to a preprocessing process and the 2N output channels are connected to a post-processing process; the preprocessing process is connected to the post-processing process either directly or via a dynamic processing of intermediate outputs (DPIO) process.
In the artificial intelligence accelerator with a Cyclic Symmetry decorrelation framework of the present invention, the formula for the total number of cylinders required to process all possible permutations simultaneously is:
(N-1)!/2
where N is the number of input channels and N >= 3.
Implementing the artificial intelligence accelerator with a Cyclic Symmetry decorrelation framework of the present invention has the following beneficial effects. The accelerator comprises a Cyclic Symmetry decorrelation framework integrated in a microchip. The framework comprises N input channels, N-1 layers of decorrelation processing units, and 2N output channels, where N is an integer and N >= 3. The decorrelation processing units of adjacent layers are staggered in sequence, each layer comprises N decorrelation unit structure blocks arranged in the horizontal direction, and each decorrelation unit structure block has a left input terminal, a right input terminal, a left output terminal, and a right output terminal. The framework is either a two-dimensional planar framework or a three-dimensional cylindrical framework. The present invention changes the dual-input single-output structure block of the modified Gram-Schmidt algorithm into a dual-input dual-output design. The main purpose of this design is to exploit the symmetry of the structure block: once the new structure block is used, pursuing maximally parallel processing for the entire computing framework becomes more manageable, and the scalability, modularity, full parallelism, and clear structural regularity of the new framework are easy to observe. The present invention removes the data broadcast lines from the basic framework of the modified Gram-Schmidt orthogonalization algorithm without affecting the mathematical goal of generating a set of mutually orthogonal output vectors. The present invention therefore improves computational efficiency, increases the spatial flexibility of the hardware implementation, eliminates data broadcast lines, and can process all possible permutations of the input data vector simultaneously.
Brief description of the drawings
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of the Gram-Schmidt structure block in the background art;
Fig. 2 is the overall architecture diagram corresponding to the Gram-Schmidt structure block;
Fig. 3 is a schematic structural diagram of the decorrelation unit structure block in one embodiment of the artificial intelligence accelerator with a Cyclic Symmetry decorrelation framework of the present invention;
Fig. 4 is a schematic structural diagram of the Cyclic Symmetry decorrelation framework with 3 input channels in the embodiment;
Fig. 5 is a schematic structural diagram of the Cyclic Symmetry decorrelation framework with 4 input channels in the embodiment;
Fig. 6 is a diagram of the relationship among the preprocessing process, the post-processing process, and the dynamic processing of intermediate outputs (DPIO) process in the embodiment;
Fig. 7 is a schematic diagram of the "rolling" operation applied to Fig. 5 in the embodiment;
Fig. 8 is a schematic diagram of the total number of cylinders required to process all possible permutations simultaneously in the embodiment.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
In this embodiment of the artificial intelligence accelerator with a Cyclic Symmetry decorrelation framework, the accelerator comprises a Cyclic Symmetry decorrelation framework integrated in a microchip. The framework comprises N input channels, N-1 layers of decorrelation processing units, and 2N output channels, where N is an integer and N >= 3. Fig. 4 is a schematic structural diagram of the framework with 3 input channels in this embodiment, and Fig. 5 is a schematic structural diagram of the framework with 4 input channels. Fig. 4 and Fig. 5 show 3 and 4 input channels as examples; in practice, the number of input channels can be changed to suit specific requirements.
The decorrelation processing units of adjacent layers are staggered in sequence, and each layer comprises N decorrelation unit structure blocks arranged in the horizontal direction. Each decorrelation unit structure block has a left input terminal, a right input terminal, a left output terminal, and a right output terminal. The Cyclic Symmetry decorrelation framework is either a two-dimensional planar framework or a three-dimensional cylindrical framework. The present invention improves computational efficiency, increases the spatial flexibility of the hardware implementation, eliminates data broadcast lines, and can process all possible permutations of the input data vector simultaneously.
Fig. 3 is a schematic structural diagram of the decorrelation unit structure block in this embodiment. The left input terminal of the decorrelation unit structure block receives [formula not reproduced], the right input terminal receives [formula not reproduced], the left output terminal outputs [formula not reproduced], and the right output terminal outputs [formula not reproduced],
where m is the decorrelation level, i and j are channel indices, k denotes a specific data sample in each output channel data sequence, K is the total number of data samples in a given output channel data sequence, and * denotes the complex conjugate.
The present invention changes the dual-input single-output structure block of the modified Gram-Schmidt algorithm into a dual-input dual-output design. The main purpose of the new design is to exploit the symmetry of the structure block. Once this new structure block is used, pursuing maximally parallel processing for the entire computing framework becomes more manageable.
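The change from a single-output to a dual-output block can be sketched as follows. The block's exact formulas are not reproduced in this text, so the sketch assumes the standard Gram-Schmidt projection as the single-output operation and assumes that the symmetric block simply applies it in both directions. The assignment of results to the left and right output terminals is likewise an assumption, and all names are hypothetical.

```python
def inner(u, v):
    """Sample inner product: sum over k of conj(u[k]) * v[k]."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def decorrelate(target, reference):
    """Single-output operation: remove from `target` its component along `reference`."""
    w = inner(reference, target) / inner(reference, reference)
    return [t - w * r for t, r in zip(target, reference)]

def dual_output_block(left_in, right_in):
    """Assumed symmetric block: each input is decorrelated against the other.

    Assumed terminal convention: the right input, cleaned of the left input,
    leaves on the left output, and vice versa.
    """
    left_out = decorrelate(right_in, left_in)
    right_out = decorrelate(left_in, right_in)
    return left_out, right_out

# Two hypothetical channels with K = 4 samples each.
x_left = [1 + 2j, 0.5, -1j, 2]
x_right = [2, 1 - 1j, 1, 0.5j]
left_out, right_out = dual_output_block(x_left, x_right)
```

Under these assumptions the block performs two single-output operations on the same pair of inputs, which is what allows neighbouring blocks on both sides to be fed without a broadcast line.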
The basic and key concept of "decorrelation" is usually associated with signal processing and is also closely related to the concept of "orthogonalization" in linear algebra; this has been well known for more than a century. Using the "decorrelation" function as a structure block to build a massively parallel, distributed, and "cyclically symmetric" computing framework is an important breakthrough in the fields of artificial intelligence and machine learning. For brevity, Fig. 4 shows the case with only 3 input channels, and Fig. 5 shows the case with 4 input channels. Comparing Fig. 4 with Fig. 5 makes it easy to observe the scalability, modularity, full parallelism, and clear structural regularity of this Cyclic Symmetry decorrelation framework.
The purpose of using the dual-input dual-output structure block of Fig. 3 (rather than the original dual-input single-output structure block of Fig. 1) is to obtain the most efficient communication and data flow between "nearest neighbors" in the framework (such as adjacent structure blocks). In the new framework of Fig. 4 and Fig. 5, a smooth, structured data flow is achieved without introducing data broadcast lines. For both the 3-input-channel case and the 4-input-channel case, this is both an important design goal and an important performance goal.
Besides scalability, parallelism, and structural regularity, another important aspect of the present invention is an effective, structured monitoring mechanism that automatically makes real-time signal processing adjustments based on intermediate output data and allows data to be extracted from the framework very conveniently. Fig. 6 shows the relationship among the preprocessing process, the post-processing process, and the dynamic processing of intermediate outputs (DPIO) process in this embodiment. In Fig. 6, the N input channels are connected to the preprocessing process and the 2N output channels are connected to the post-processing process; the preprocessing process is connected to the post-processing process either directly or via the DPIO process. Fig. 6 is an extended version of Fig. 5 that also clearly shows the preprocessing, post-processing, and DPIO processes.
Preprocessing and post-processing are familiar to a general audience. The concepts and tasks involved in the dynamic processing of intermediate outputs (DPIO) are less well known and depend heavily on the requirements of the given signal processing problem. It can be said that DPIO is vital for many advanced signal processing problems (including anti-interference problems in the radar and wireless communication fields).
As can be seen from Fig. 4 and Fig. 5, the decorrelation processing unit of the current layer is offset to the right by a set distance relative to the decorrelation processing unit of the layer above. In the decorrelation processing unit of the first layer, the left input terminal of each decorrelation unit structure block is connected to the nearest input channel on its left, and the right input terminal is connected to the nearest input channel on its right; the left input terminal of the decorrelation unit structure block at the start of the layer is additionally connected to the right input terminal of the decorrelation unit structure block at the end of the layer.
In the decorrelation processing units of the second layer to the (N-2)-th layer, the left input terminal of each decorrelation unit structure block of the current layer is connected to the right output terminal of the nearest decorrelation unit structure block on the left in the layer above, and the right input terminal is connected to the left output terminal of the nearest decorrelation unit structure block on the right in the layer above. The left output terminal of the decorrelation unit structure block at the start of the current layer is connected to the right input terminal of the decorrelation unit structure block at the end of the next layer, and its right output terminal is connected to the left input terminal of the nearest decorrelation unit structure block on the right in the next layer. The left output terminal of each remaining decorrelation unit structure block in the current layer is connected to the right input terminal of the nearest decorrelation unit structure block on the left in the next layer.
In the decorrelation processing unit of the (N-1)-th layer, the left output terminal and the right output terminal of each decorrelation unit structure block serve as output channels.
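The layer-to-layer wiring described above can be simulated with a short sketch. It rests on several assumptions that are not confirmed by the text: blocks are indexed 0 to N-1 within each layer; block p of the first layer sits between input channels p and p+1 (wrapping around); block p of each later layer sits between blocks p and p+1 of the layer above; and each block applies the standard Gram-Schmidt projection in both directions, with the crossed terminal assignment shown in the code. All function names are hypothetical.

```python
import random

def inner(u, v):
    """Sample inner product: sum over k of conj(u[k]) * v[k]."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def decorrelate(target, reference):
    """Remove from `target` its component along `reference`."""
    w = inner(reference, target) / inner(reference, reference)
    return [t - w * r for t, r in zip(target, reference)]

def run_cylinder(inputs):
    """Pass N channels through N-1 staggered layers of N wrap-around blocks.

    Returns 2N pairs (channel_index, output_sequence).
    """
    n = len(inputs)
    # First layer: every input channel feeds the block on its left and on its right.
    left_ins = [inputs[p] for p in range(n)]
    right_ins = [inputs[(p + 1) % n] for p in range(n)]
    for _ in range(n - 1):
        left_outs = [decorrelate(right_ins[p], left_ins[p]) for p in range(n)]
        right_outs = [decorrelate(left_ins[p], right_ins[p]) for p in range(n)]
        # Next layer: left input comes from the right output of the upper-left
        # neighbour, right input from the left output of the upper-right
        # neighbour (the modulo implements the wrap-around).
        left_ins = right_outs
        right_ins = [left_outs[(p + 1) % n] for p in range(n)]
    outputs = []
    for p in range(n):
        outputs.append(((p + n - 1) % n, left_outs[p]))  # channel that entered from the right
        outputs.append((p, right_outs[p]))               # channel that entered from the left
    return outputs

# Four hypothetical input channels with K = 32 complex samples each.
rng = random.Random(0)
inputs = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(32)]
          for _ in range(4)]
outputs = run_cylinder(inputs)
```

With these assumptions, each of the 2N final outputs is one input channel with the components of all other channels removed, and every channel appears twice, once for each direction around the cylinder.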
In Fig. 5 there are 4 input channels, which generate 8 output channels. The entire implementation can be understood simply by examining the processing sequence associated with any specific output channel. Here the output channel X^3_4(k;3,2,1) is used as an example to illustrate the processing sequence:
The level index "3" (written here as a superscript) indicates that this specific output channel has passed through 3 levels of "decorrelation" processing. (Note: the level index "0" on the input channel symbols in Fig. 5 indicates that an input channel is at its initial stage and has not yet undergone any "decorrelation" processing.)
The index "k" in the parentheses denotes a specific data sample in each output channel data sequence. "k" runs from 1 to K, where K is the total number of data samples in a given output channel data sequence.
The channel index "4" (written here as a subscript) and the remaining symbols in X^3_4(k;3,2,1) indicate that this specific output channel is generated after the 4th input channel has completed all possible and necessary "decorrelation" operations with the other input channels.
Because the present invention develops an "N-input, 2N-output" processing framework, each input channel corresponds to two output channels. In this embodiment, X^3_4(k;3,2,1) and X^3_4(k;1,2,3) both correspond to the same input channel, the 4th; in other words, both evolve from this channel. In theory, these two output channels generate the same output data sequence. Note: although their theoretical output values are identical, they may not generate exactly the same output data sequences in practice because of system noise and other causes.
In this embodiment, the main difference between X^3_4(k;3,2,1) and X^3_4(k;1,2,3) arises from the different order in which they perform "decorrelation". That is, the generation of output channel X^3_4(k;3,2,1) starts with the "decorrelation" of the 4th and 3rd input channels, whereas the generation of output channel X^3_4(k;1,2,3) starts with the "decorrelation" of the 4th and 1st input channels, and so on. In other words, the order of the numbers in the parentheses represents the order in which "decorrelation" takes place. This numbering scheme therefore makes all the "decorrelation" steps in the entire processing framework clear. Understanding and exploiting the order of "decorrelation" is extremely important in specific signal processing scenarios. In addition, the numbering and index scheme described above can support engineering development of applications that use or involve this processing framework.
The above explanation of output channels X^3_4(k;3,2,1) and X^3_4(k;1,2,3) shows that the unique ability to generate two output channels from one specific input channel stems from the symmetric structure block design and the well-structured system design. It can be seen that the new processing framework differs greatly from the modified Gram-Schmidt framework.
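The statement that the two output channels of the 4th input channel agree in theory can be checked numerically with a small sketch. It assumes the standard Gram-Schmidt projection as the pairwise "decorrelation" step and applies the steps in the order given by the numbers in the parentheses; the function names and the random test data are hypothetical.

```python
import random

def inner(u, v):
    """Sample inner product: sum over k of conj(u[k]) * v[k]."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def decorrelate(target, reference):
    """Remove from `target` its component along `reference`."""
    w = inner(reference, target) / inner(reference, reference)
    return [t - w * r for t, r in zip(target, reference)]

def decorrelate_in_order(channels, target, order):
    """Decorrelate channel `target` against the channels listed in `order`.

    At each level the chosen reference is removed from every channel that is
    still in play, so later references are themselves already decorrelated.
    """
    residual = {idx: list(seq) for idx, seq in channels.items()}
    for ref in order:
        ref_seq = residual.pop(ref)
        residual = {idx: decorrelate(seq, ref_seq) for idx, seq in residual.items()}
    return residual[target]

# Four hypothetical input channels (numbered 1 to 4) with K = 16 samples each.
rng = random.Random(1)
channels = {i: [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(16)]
            for i in (1, 2, 3, 4)}
out_321 = decorrelate_in_order(channels, target=4, order=(3, 2, 1))
out_123 = decorrelate_in_order(channels, target=4, order=(1, 2, 3))
max_diff = max(abs(a - b) for a, b in zip(out_321, out_123))
```

In floating-point arithmetic `max_diff` is not exactly zero, but under these assumptions it stays at the level of rounding error, which mirrors the remark that the two channels are identical in theory while differing slightly in practice.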
The above is a detailed explanation of the steps involved in the massively parallel "decorrelation" process (from one layer of the framework to the next). It should be emphasized that the dynamic processing of intermediate outputs (DPIO) process in Fig. 6 is in fact very efficient, because all the intermediate "decorrelation" outputs generated in the entire processing framework are highly regular and structured.
Having explained the computation process and steps above, we can now turn to the task of processing all possible permutations of a given set of input channels (for example, of a given input data vector). First, the total number of permutations for a given system with N input channels is N!. Therefore, for the case of Fig. 4 with 3 input channels, the total number of permutations is 3! = 6, and for the case of Fig. 5 with 4 channels, the total number of permutations is 4! = 24, and so on.
For illustration, all possible permutations for the 4-input-channel case of Fig. 5 are listed below:
First group:
(X1,X2,X3,X4)(X2,X3,X4,X1)(X3,X4,X1,X2)(X4,X1,X2,X3)
(X4,X3,X2,X1)(X3,X2,X1,X4)(X2,X1,X4,X3)(X1,X4,X3,X2)
Second group:
(X2,X1,X3,X4)(X1,X3,X4,X2)(X3,X4,X2,X1)(X4,X2,X1,X3)
(X4,X3,X1,X2)(X3,X1,X2,X4)(X1,X2,X4,X3)(X2,X4,X3,X1)
Third group:
(X1,X3,X2,X4)(X3,X2,X4,X1)(X2,X4,X1,X3)(X4,X1,X3,X2)
(X4,X2,X3,X1)(X2,X3,X1,X4)(X3,X1,X4,X2)(X1,X4,X2,X3)
Before explaining in detail the three groups of permutations above and why they are grouped this way, first look at Fig. 7, which shows the "rolling" operation applied to Fig. 5. This "rolling" operation, which turns the seemingly flat two-dimensional planar framework into a three-dimensional cylinder, is possible because the symmetry of the framework was identified in the course of developing the present invention.
The cylindrical structure of Fig. 7 can serve as a visualization tool that makes the logic behind the first group of permutations listed above easy to see. The first group starts from the permutation (X1,X2,X3,X4), which can be regarded as a cyclic arrangement; naturally the next permutation starts from X2, i.e. (X2,X3,X4,X1), and so on in sequence. We can then repeat this process in the reverse direction, starting from X4, to generate the sequence (X4,X3,X2,X1), (X3,X2,X1,X4), and so on. This makes clear how the 8 permutations of the first group are identified for the cylindrical structure with 4 input channels.
With 4 input channels there are 24 permutations in total. Therefore, processing all 24 permutations simultaneously requires 3 cylinders. The three groups listed above, with 8 permutations each, correspond to the particular case in which 3 cylinders are needed.
Starting from the basic formula N! for the total number of permutations of a system with N input channels, and taking into account the above explanation and the cyclic symmetry, the exact formula for the total number of cylinders required to process all possible permutations simultaneously is:
(N-1)!/2
where N is the number of input channels and N ≥ 3.
Thus, when N = 3, the total number of cylinders required to process all possible permutations simultaneously is 1;
when N = 4, the total number of cylinders required is 3;
and when N = 5, the total number of cylinders required is 12; and so on.
Fig. 8 graphically summarizes the above situation.
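The cylinder count can be cross-checked by brute force. Each cylinder accounts for 2N arrangements (N rotations in each of the two reading directions), so N!/(2N) = (N-1)!/2 cylinders are needed. The Python sketch below is an illustration under that reading only; the function names `cylinders_required` and `group_by_cylinder` are hypothetical, and channels are represented by the integers 1..N.

```python
from itertools import permutations
from math import factorial


def cylinders_required(n):
    """Closed-form count (N-1)!/2, valid for N >= 3."""
    if n < 3:
        raise ValueError("the formula is stated for N >= 3")
    return factorial(n - 1) // 2


def group_by_cylinder(n):
    """Partition all N! permutations into groups that share one cylinder."""
    seen = set()
    groups = []
    for perm in permutations(range(1, n + 1)):
        if perm in seen:
            continue
        rev = perm[::-1]
        # All rotations of the permutation and of its reversal.
        group = {perm[i:] + perm[:i] for i in range(n)}
        group |= {rev[i:] + rev[:i] for i in range(n)}
        seen |= group
        groups.append(sorted(group))
    return groups


for n in (3, 4, 5):
    groups = group_by_cylinder(n)
    # Columns: N, N!, cylinders found by enumeration, cylinders by formula.
    print(n, factorial(n), len(groups), cylinders_required(n))
```

For N = 3, 4, and 5 the enumeration finds 1, 3, and 12 groups respectively, each of size 2N, in agreement with the formula.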
It must be emphasized that the above calculation of the total number of cylinders required, and the associated implementation details, refer specifically to the "simultaneous processing" case, i.e., the case in which all possible permutations are computed in parallel. Of course, in many practical situations or special applications this is not an actual requirement. However, the present invention aims to propose an innovative and "complete" solution. Individual applications or solutions can of course be simplified or customized as the case requires.
At present, all major artificial intelligence hardware companies are pursuing various optimization strategies and addressing efficiency-of-use issues related to the strengths and weaknesses of CPUs, GPUs, FPGAs, ASICs, and the like. Clearly, no single hardware technology suits all possible tasks and application scenarios, but it is equally certain that the role of artificial intelligence accelerators is becoming increasingly important. Consistent with the definition given by Wikipedia, an artificial intelligence accelerator is generally understood to be a microprocessor, a computing system, or even a microchip designed as a hardware accelerator for artificial intelligence applications. In view of the features and advantages described above, the present invention is a significant innovative contribution to the field of artificial intelligence accelerators. It should be noted that an artificial intelligence accelerator usually does not work alone, but works together with other devices and/or subsystems of a given system. These devices and subsystems include, but are not limited to, CPUs, GPUs, FPGAs, and ASICs. It should also be noted that the innovative algorithm of the present invention and its associated scalable distributed architecture can be implemented in a given application in hardware, in software, or in a combination of hardware and software.
The present invention represents a breakthrough that improves on existing artificial intelligence algorithms and parallel computing architectures. The present invention can be applied to: (1) generally improving the effectiveness of search engine technology; (2) supporting real-time analysis and monitoring of large networks, such as nationwide or regional power consumption analysis; (3) accelerating the development and deployment of machine learning (a key subset of artificial intelligence) applications across industries, including the field of wireless communication; (4) supporting cooperative associations and governments in analyzing critical data and disseminating the results in near real time. Such analysis output data may include scientific results, data related to international crisis situations, and so on.
In short, the central task addressed by the present invention is to implement the key, basic function of "decorrelation" in a massively parallel manner. The new algorithm of the present invention and the associated parallel computing system are aptly named an "artificial intelligence accelerator". In addition, the phrase "Cyclic Symmetry" is used to describe precisely the most critical symmetry feature of this innovative computing architecture.
This innovation, which can process all possible permutations of an input data vector simultaneously and efficiently, has important and subtle implications for the development of practical artificial intelligence solutions. Developing such solutions usually involves certain important engineering and mathematical concepts, such as improving convergence speed, providing computational redundancy to minimize potential information loss, and monitoring and tracking intermediate output data. A key breakthrough of the present invention is that the resulting parallel processing architecture is fully pipelined, modular, scalable, and structurally efficient, and wastes no "data processing real estate", which is a critical consideration in any artificial intelligence accelerator microchip design.
Owing to the flexibility, scalability, and parallel processing of the present invention, various obvious modifications and refinements become possible, such as using feedback loops to alter the wiring, or compressing the entire architecture into a more compact physical structure to make it better suited to certain processing scenarios. Such modifications and variations may be proposed by persons experienced in the fields of signal processing and applied mathematics. It should therefore be understood that all of the points and observations made above are intended to cover such modifications and variations as may be proposed in the future, and this is the true core purpose of the present invention.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (8)

1. An artificial intelligence accelerator with a Cyclic Symmetry decorrelation framework, characterized by comprising a Cyclic Symmetry decorrelation framework integrated in a microchip, the Cyclic Symmetry decorrelation framework comprising N input channels, N-1 layers of decorrelation processing units, and 2N output channels, where N is an integer and N ≥ 3; the decorrelation processing units of adjacent layers are staggered relative to one another; each layer of decorrelation processing units comprises N decorrelation unit structure blocks arranged in the horizontal direction; each decorrelation unit structure block has a left input terminal, a right input terminal, a left output end, and a right output end; and the Cyclic Symmetry decorrelation framework is a two-dimensional planar architecture or a three-dimensional cylinder architecture.
2. The artificial intelligence accelerator with a Cyclic Symmetry decorrelation framework according to claim 1, characterized in that the left input terminal of the decorrelation unit structure block inputs [formula image not reproduced], the right input terminal inputs [formula image not reproduced], the left output end outputs [formula image not reproduced], and the right output end outputs [formula image not reproduced],
where m is the decorrelation level, i and j are channel indices, k indexes a specific data sample in each output channel data sequence, K is the total number of data samples in a given output channel data sequence, and * denotes the complex conjugate.
3. The artificial intelligence accelerator with a Cyclic Symmetry decorrelation framework according to claim 2, characterized in that each layer of decorrelation processing units is offset to the right by a set distance relative to the layer above it.
4. The artificial intelligence accelerator with a Cyclic Symmetry decorrelation framework according to claim 3, characterized in that, in the first layer of decorrelation processing units, the left input terminal of each decorrelation unit structure block is connected to the nearest input channel on its left, and the right input terminal is connected to the nearest input channel on its right; the left input terminal of the decorrelation unit structure block at the beginning is also connected to the right input terminal of the decorrelation unit structure block at the end.
5. The artificial intelligence accelerator with a Cyclic Symmetry decorrelation framework according to claim 4, characterized in that, in the second to (N-2)th layers of decorrelation processing units, the left input terminal of each decorrelation unit structure block of the current layer is connected to the right output end of the nearest decorrelation unit structure block on the left in the previous layer, and the right input terminal is connected to the left output end of the nearest decorrelation unit structure block on the right in the previous layer; the left output end of the decorrelation unit structure block at the beginning is connected to the right input terminal of the decorrelation unit structure block at the end of the next layer, and its right output end is connected to the left input terminal of the nearest decorrelation unit structure block on the right in the next layer; and the left output end of each remaining decorrelation unit structure block in the current layer is connected to the right input terminal of the nearest decorrelation unit structure block on the left in the next layer.
6. The artificial intelligence accelerator with a Cyclic Symmetry decorrelation framework according to claim 5, characterized in that, in the (N-1)th layer of decorrelation processing units, the left output end and the right output end of each decorrelation unit structure block serve as output channels.
7. The artificial intelligence accelerator with a Cyclic Symmetry decorrelation framework according to any one of claims 1 to 6, characterized in that the N input channels are connected to a preprocessing process and the 2N output channels are connected to a post-processing process; the preprocessing process is connected to the post-processing process either directly or through a process that actively handles intermediate outputs.
8. The artificial intelligence accelerator with a Cyclic Symmetry decorrelation framework according to any one of claims 1 to 6, characterized in that the formula for the total number of cylinders required to process all possible permutations simultaneously is:
(N-1)!/2
where N is the number of input channels and N ≥ 3.
CN201910642363.8A 2019-07-16 2019-07-16 Artificial intelligence accelerator with Cyclic Symmetry decorrelation framework Pending CN110471883A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910642363.8A CN110471883A (en) 2019-07-16 2019-07-16 Artificial intelligence accelerator with Cyclic Symmetry decorrelation framework

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910642363.8A CN110471883A (en) 2019-07-16 2019-07-16 Artificial intelligence accelerator with Cyclic Symmetry decorrelation framework

Publications (1)

Publication Number Publication Date
CN110471883A true CN110471883A (en) 2019-11-19

Family

ID=68508776

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910642363.8A Pending CN110471883A (en) 2019-07-16 2019-07-16 Artificial intelligence accelerator with Cyclic Symmetry decorrelation framework

Country Status (1)

Country Link
CN (1) CN110471883A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112926730A (en) * 2019-12-06 2021-06-08 Samsung Electronics Co., Ltd. Method and apparatus for processing data
WO2022121756A1 (en) * 2020-12-08 2022-06-16 Huawei Technologies Co.,Ltd. System, method and apparatus for intelligent caching

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4941117A (en) * 1988-09-07 1990-07-10 General Electric Company Multiple channel adaptive correlation/decorrelation processor
CN109525288A (en) * 2018-11-28 2019-03-26 Guangzhou Gaofeng Technology Co Ltd For wirelessly communicating the parallel processing architecture of decorrelation operation
CN109725937A (en) * 2018-11-28 2019-05-07 Guangzhou Gaofeng Technology Co Ltd Parallel processing architecture and IC chip for machine learning decorrelation operation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
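Wang Bin et al.: "Development of an orthogonal preprocessor for adaptive antenna arrays" (自适应天线阵正交预处理器的研制), Journal on Communications (通信学报) *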

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191119