CN109725937A - Parallel processing architecture and IC chip for machine learning decorrelation operation - Google Patents
- Publication number
- CN109725937A (application CN201811435477.7A)
- Authority
- CN
- China
- Prior art keywords
- decorrelation
- parallel processing
- processing architecture
- output
- channel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a parallel processing architecture and an integrated-circuit (IC) chip for machine-learning decorrelation operations. The architecture comprises decorrelation units, each having two input channels and two output channels, which apply a decorrelation operation to the data vectors arriving on the input channels and emit the results on the output channels. The parallel processing architecture comprises N-1 layers of decorrelation units, each layer containing N units; adjacent layers are offset from one another and interconnected. The input of the first layer receives N input data vectors, each of which feeds two decorrelation units (adjacent units, or the first and last units of the layer); the output of the last layer provides 2N output data vectors, where N is an integer greater than 2. The invention maximizes the number of output channels that can be created; performs all necessary calculations in a pipelined fashion while exploiting the greatest possible degree of symmetry; and provides a practical, systematic method for cross-checking the accuracy and consistency of the final output data.
Description
Technical field
The present invention relates to a parallel processing architecture and an IC chip for machine-learning decorrelation operations.
Background technique
Artificial intelligence is transforming every industry in unprecedented ways. Its applications famously span an enormous range, from intelligent stock-trading software to the control systems of self-driving cars. "Artificial intelligence" and "machine learning" are two very popular terms that are often used interchangeably. However, under the common reading of machine learning, namely the view that "we can simply let machines touch the data and learn for themselves", machine learning is better regarded as one way of implementing artificial intelligence. Many mathematical concepts and algorithms are closely tied to machine learning, and singular value decomposition (SVD) from linear algebra is arguably the most popular and most important of them.
With the arrival of the big-data era, our ability to collect and acquire data keeps growing. These data are typically high-dimensional, large-scale, and complex. High dimensionality can seriously degrade the efficiency of data-mining algorithms, so "dimensionality reduction" has become a top priority of big-data mining and machine learning, and singular value decomposition has become the key tool of dimensionality reduction. The above is a simple, non-mathematical introduction to singular value decomposition from the perspective of information retrieval and data mining. Singular value decomposition also goes by other names, such as principal component analysis (PCA) and orthogonal functional analysis. Beyond being a highly useful and important mathematical tool for machine learning, its uses have extended to many other disciplines, including psychology and sociology, weather and atmospheric science, and astronomy.
" singular value decomposition " and the key concept " feature decomposition " (i.e. the calculating of characteristic value and feature vector) in mathematics are close
Cut phase is closed.(note: singular value relevant to " singular value decomposition " is actually the square root of characteristic value.) have at present it is several very well
Calculation method, " singular value decomposition " can be calculated.One of more known method is " iteration QR algorithm "
(IterativeQR Algorithm).(note: " iteration QR algorithm " refers to John G.F.Francis and Vera
N.Kublanovskaya in the 1950s end according to QR decompose global concept the mathematics mistake of independent invention using iteration
Journey.In addition, this is related to the iteration of some macroscopic aspects by the iteration QR algorithm of Francis and Kublanovskaya exploitation
Thought should not be obscured with following QR decomposition algorithms referred to.) there are four types of QR to decompose (QR Decomposition) calculation in history
Method, i.e. " classical Gram-Schmidt algorithm " (Classical Gram-Schmidt), " Givens rotation " (Givens
Rotation), " Householder transformation " (Householder Transformation) and " improvement Gram Schmidt calculation
Method " (Modified Gram Schmidt).In in the past few decades, linear algebra field to these four QR decomposition algorithms and
Its advantage and disadvantage conducts in-depth research." improvement Gram Schmidt algorithm " be generally considered to input signal channel (
Exactly given matrix column vector) execute " mutually orthogonal " process (being equivalent to the decorrelation operation for executing complete set) most
Structuring, the most intuitive and most stable of method of numerical value.And on the other hand, " improvement Gram Schmidt algorithm " still has certain lack
It falls into, such as causes to influence computational efficiency without symmetrical treatment framework, need using interminable tie line, it can not be by whole
In a processing framework using most short communication line connection adjacent processing units acquisition advantage etc..
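For reference, the modified Gram-Schmidt procedure discussed above can be sketched as follows. This is an illustrative implementation, not code from the patent; the function and variable names are ours.

```python
import math

def modified_gram_schmidt(cols):
    """Orthonormalize a list of column vectors (lists of floats) by
    modified Gram-Schmidt: as each new orthonormal direction is found,
    it is immediately removed from ALL remaining columns, which is
    numerically more stable than the classical variant."""
    cols = [list(c) for c in cols]  # work on copies
    q = []
    for i in range(len(cols)):
        norm = math.sqrt(sum(x * x for x in cols[i]))
        qi = [x / norm for x in cols[i]]
        q.append(qi)
        # deflate every remaining column against qi right away
        for j in range(i + 1, len(cols)):
            r = sum(a * b for a, b in zip(qi, cols[j]))
            cols[j] = [x - r * a for x, a in zip(cols[j], qi)]
    return q

# Example: orthonormalize two correlated 3-vectors
q = modified_gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]])
dot = sum(a * b for a, b in zip(q[0], q[1]))
print(abs(dot) < 1e-12)  # True: the outputs are mutually orthogonal
```

The immediate deflation of the remaining columns is exactly what distinguishes the modified variant from the classical one, and it is also the step whose sequential data dependence motivates the parallel architecture described below.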
Summary of the invention
The present invention proposes a parallel processing architecture and an IC chip for machine-learning decorrelation operations, solving the prior-art problems that the lack of a symmetric processing architecture hurts computational efficiency, that long interconnect lines are required, and that the advantage of connecting adjacent processing units with the shortest communication lines throughout the processing architecture cannot be obtained.
The technical scheme of the present invention is realized as follows:
A parallel processing architecture for machine-learning decorrelation operations, comprising:
a decorrelation unit with two input channels and two output channels, which applies a decorrelation operation to the data vectors arriving on the input channels and emits the results on the output channels;
the parallel processing architecture comprises N-1 layers of decorrelation units, each layer containing N decorrelation units; adjacent layers are offset from one another and interconnected;
the input of the first layer receives N input data vectors, each of which feeds two decorrelation units (adjacent units, or the first and last units of the layer);
the output of the last layer provides 2N output data vectors, where N is an integer greater than 2.
Preferably, the left input data vector of the decorrelation unit is X_i^(m-1)(k), the right input data vector is X_j^(m-1)(k), the left output data vector is X_i^m(k), and the right output data vector is X_j^m(k),
where m is the decorrelation level;
i, j are channel indices;
k indexes the individual data samples in each output-channel data sequence;
* denotes the complex conjugate;
K is the total number of data samples in each output-channel data sequence.
Preferably, each input data vector feeds the left input channel of one decorrelation unit and the right input channel of another decorrelation unit.
Preferably, the parallel processing architecture is a planar structure.
Preferably, the parallel processing architecture is a three-dimensional cylindrical structure.
An IC chip comprising any of the above parallel processing architectures for machine-learning decorrelation operations.
The beneficial effects of the present invention are:
1) the number of output channels that can be created is maximized;
2) all necessary calculations are performed in a pipelined fashion, exploiting and realizing the greatest possible degree of symmetry;
3) a practical, systematic method is provided for cross-checking the accuracy and consistency of the final output data.
Detailed description of the invention
To explain the embodiments of the invention or the technical solutions of the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Apparently, the drawings described below show only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of one embodiment of the decorrelation unit;
Fig. 2 is a schematic structural diagram of a parallel processing architecture for machine-learning decorrelation operations according to the present invention;
Fig. 3 is a functional block diagram of the pre-processing stage, the parallel processing architecture, and the post-processing stage;
Fig. 4 is a three-dimensional structural diagram of a parallel processing architecture for machine-learning decorrelation operations according to the present invention.
Specific embodiment
The technical solutions in the embodiments of the present invention will now be described clearly and completely with reference to the accompanying drawings. Apparently, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
As shown in Fig. 1, the invention proposes a parallel processing architecture for machine-learning decorrelation operations, comprising decorrelation units, each with two input channels and two output channels, which apply a decorrelation operation to the data vectors arriving on the input channels and emit the results on the output channels. In this embodiment, a decorrelation unit is called a Decorrelation Cell, abbreviated DC. The decorrelation unit is the theoretical foundation of the invention: its two input data vectors each have K data points and its two output data vectors each have K data points; in this embodiment, K is an integer equal to N.
The parallel processing architecture comprises N-1 layers of decorrelation units, each layer containing N decorrelation units; adjacent layers are offset from one another and interconnected.
The input of the first layer receives N input data vectors, each of which feeds two decorrelation units (adjacent units, or the first and last units of the layer).
The output of the last layer provides 2N output data vectors, where N is an integer greater than 2.
A common calculation formula for the decorrelation operation is given below; because of differences in data normalization and other factors, the decorrelation operation may vary slightly from application to application.
Suppose the left input data vector of a decorrelation unit is X_i^(m-1)(k) and the right input data vector is X_j^(m-1)(k), with left output data vector X_i^m(k) and right output data vector X_j^m(k). The decorrelation unit generates the right output data channel by processing the left input data channel, so as to remove the component correlated between the left and right input data channels; when processing ends, the final right output data channel is decorrelated from the right input data channel. Likewise, the left output data channel is generated by processing the right input data channel; in that process, the component correlated between the right and left input data channels is removed, and when processing ends the final left output data channel is also decorrelated from the left input data channel. The decorrelation operation of the decorrelation unit is equivalent to an orthogonalization operation.
Here, m is the decorrelation level;
i, j are channel indices;
k indexes the individual data samples in each output-channel data sequence;
* denotes the complex conjugate;
K is the total number of data samples in each output-channel data sequence.
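The projection arithmetic described in words above can be sketched as follows. This is an illustrative implementation: the patent's exact normalization may differ, and the helper names (`decorrelate_pair`, `project_out`) are ours.

```python
def decorrelate_pair(left, right):
    """One 2-in/2-out decorrelation cell. Each output is one input with
    the component correlated with the OTHER input projected out, using
    the complex conjugate and a sum over all K samples. Returns
    (left_out, right_out): left_out is decorrelated from the left
    input, right_out from the right input."""
    def project_out(target, ref):
        # correlation coefficient: sum_k target[k]*conj(ref[k]) / sum_k |ref[k]|^2
        num = sum(t * r.conjugate() for t, r in zip(target, ref))
        den = sum(abs(r) ** 2 for r in ref)
        c = num / den
        return [t - c * r for t, r in zip(target, ref)]

    return project_out(right, left), project_out(left, right)

# The outputs are orthogonal to the opposite-side inputs:
left = [1 + 0j, 2 + 0j, 3 + 0j]
right = [1 + 0j, 1 + 0j, 1 + 0j]
l_out, r_out = decorrelate_pair(left, right)
xcorr = sum(a * b.conjugate() for a, b in zip(l_out, left))
print(abs(xcorr) < 1e-12)  # True
```

Chaining such cells so that every channel is eventually projected against every other channel yields the full orthogonalization the architecture performs.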
In an embodiment of the invention, in the first layer of decorrelation units, each input data vector feeds the left input channel of one decorrelation unit and the right input channel of another decorrelation unit.
As shown in Fig. 2, the processing order of the invention is illustrated with the output channel X_4^3(k; 3,2,1):
The superscript "3" indicates that the output channel X_4^3(k; 3,2,1) has passed through three levels of decorrelation. (Note: the superscript "0" on the input-channel symbols in Fig. 2 marks the initial stage of an input channel, before any decorrelation operation has been performed.)
The argument "k" in parentheses indexes the individual data samples in each output-channel data sequence; "k" runs from 1 to K, where K is the total number of data samples in a given output-channel data sequence.
The subscript "4", together with the remaining symbols of X_4^3(k; 3,2,1), indicates that this particular output channel was generated by carrying out all possible and necessary decorrelation operations between the fourth input channel and the other input channels.
Since the invention concerns the development of an "N-input, 2N-output" processing architecture, each input channel corresponds to two output channels. In the present example, X_4^3(k; 3,2,1) and X_4^3(k; 1,2,3) both correspond to, in other words both evolve from, the same fourth input channel.
In theory these two output channels should produce the same output data sequence, but because of system noise and other causes they may not produce identical sequences. In this example, the main difference between X_4^3(k; 3,2,1) and X_4^3(k; 1,2,3) arises from the different orders in which they perform the decorrelation operations. That is, the generation of the output channel X_4^3(k; 3,2,1) starts from the decorrelation of the fourth input channel against the third, whereas the generation of X_4^3(k; 1,2,3) starts from the decorrelation of the fourth input channel against the first, and so on. In other words, the number sequence in parentheses records the order in which the decorrelation operations occur. This numbering scheme makes every decorrelation step in the entire processing architecture clear and unambiguous. Understanding and exploiting the order of the decorrelation operations is extremely important for specific signal-processing conditions and machine-learning applications. In addition, the numbering and coefficient scheme described above also supports the engineering development of applications that use or involve this processing-architecture invention.
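Under the assumption that each input channel's two descendant output channels are decorrelated against the remaining channels in descending and ascending cyclic order respectively (the exact layer-by-layer interconnect is fixed by Fig. 2, which we cannot see here), the label bookkeeping for the "N-input, 2N-output" ring can be sketched as:

```python
def output_labels(n):
    """For an n-input / 2n-output ring architecture, return, for each
    input channel i (1..n), the two operation-order labels of its two
    output channels: one decorrelating against cyclic neighbours in
    descending order, one in ascending order (n-1 layers, one
    neighbour per layer). Assumed wiring, for illustration only."""
    def wrap(c):
        return (c - 1) % n + 1  # map any integer onto channels 1..n

    labels = {}
    for i in range(1, n + 1):
        down = tuple(wrap(i - s) for s in range(1, n))  # e.g. 4 -> 3,2,1
        up = tuple(wrap(i + s) for s in range(1, n))    # e.g. 4 -> 1,2,3
        labels[i] = (down, up)
    return labels

# For N=4, channel 4 yields exactly the two labels discussed in the text:
lab = output_labels(4)
print(lab[4])  # ((3, 2, 1), (1, 2, 3))
```

Each of the 2N labels lists every other channel exactly once, which matches the statement that each output channel has undergone all necessary decorrelation operations after N-1 levels.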
As shown in Fig. 3, the input of the parallel processing architecture of the invention is preceded by a pre-processing stage, and its output is followed by a post-processing stage. The pre-processing stage concerns the preparation of the input data and its proper arrangement, while the post-processing stage concerns the conversion of the output data for a particular application. For example, singular value decomposition and QR decomposition are closely related to the present invention; their relationship manifests in the output-data conversion tasks performed by the post-processing stage for particular applications.
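One way a post-processing stage could relate the architecture's orthogonalized outputs to a QR factorization is sketched below. This is an assumption-laden illustration, not the patent's method: `recover_r` and the sample vectors are ours, and we assume the decorrelation pipeline has produced orthonormal columns.

```python
def recover_r(q_cols, a_cols):
    """Post-processing sketch: if decorrelation yields orthonormal
    columns q_cols spanning the same space as the original columns
    a_cols, then R[i][j] = <q_i, a_j> is the triangular factor with
    A = Q R (the inner products above the diagonal vanish)."""
    return [[sum(qv * av for qv, av in zip(q, a)) for a in a_cols]
            for q in q_cols]

# Orthonormal pair q1, q2 and columns a1 = 2*q1, a2 = q1 + 3*q2:
q_cols = [[0.6, 0.8], [-0.8, 0.6]]
a_cols = [[1.2, 1.6], [-1.8, 2.6]]
r = recover_r(q_cols, a_cols)
# r is numerically [[2.0, 1.0], [0.0, 3.0]]: upper triangular, A = Q R
```

In this reading, the parallel architecture supplies Q and the (cheap) post-processing inner products supply R, which is what connects the invention to QR-based algorithms such as the iterative QR algorithm for SVD.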
As shown in Fig. 2, the parallel processing architecture of the present invention may be a planar structure.
As shown in Fig. 4, the parallel processing architecture of the present invention may also be a three-dimensional cylindrical structure.
The parallel processing architecture of the invention can be converted between its two-dimensional and three-dimensional forms, allowing flexible deployment.
The invention also provides an IC chip comprising any of the above parallel processing architectures for machine-learning decorrelation operations. This flexible transformation between the 2D and 3D forms offers rich possibilities for minimizing chip area and for other chip-design optimization tasks, because the 3D cylindrical structure can be flattened, rolled, and scaled in different directions.
The present invention describes in detail a parallel, modular, and flexibly scalable computing architecture for processing the data and signals in a wide variety of artificial-intelligence and machine-learning applications. Unlike traditional data- or signal-processing applications, which usually involve multiple input channels but a single output channel, the present invention involves multiple input channels and multiple output channels. Many advanced and diverse applications, such as adaptive pulse-Doppler processing for real-time radar signal processing and the improvement of magnetoencephalography signal quality, require exactly this kind of "multiple-input, multiple-output" processing architecture. This novel parallel computing architecture is expected to provide clear guidance for next-generation artificial-intelligence chip design and to accelerate theoretical and application innovation in the field of machine learning.
The beneficial effects of the present invention are:
1) the number of output channels that can be created is maximized;
2) all necessary calculations are performed in a pipelined fashion, exploiting and realizing the greatest possible degree of symmetry;
3) a practical, systematic method is provided for cross-checking the accuracy and consistency of the final output data.
The technical solution above discloses the improvements of the invention; technical content not disclosed in detail can be realized by those skilled in the art through the prior art.
The above are merely preferred embodiments of the present invention and are not intended to limit the invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the invention shall be included in the protection scope of the present invention.
Claims (6)
1. A parallel processing architecture for machine-learning decorrelation operations, characterized by comprising:
a decorrelation unit with two input channels and two output channels, which applies a decorrelation operation to the data vectors arriving on the input channels and emits the results on the output channels;
the parallel processing architecture comprises N-1 layers of decorrelation units, each layer containing N decorrelation units; adjacent layers are offset from one another and interconnected;
the input of the first layer receives N input data vectors, each of which feeds two decorrelation units (adjacent units, or the first and last units of the layer);
the output of the last layer provides 2N output data vectors, where N is an integer greater than 2.
2. The parallel processing architecture for machine-learning decorrelation operations according to claim 1, characterized in that: the left input data vector of the decorrelation unit is X_i^(m-1)(k), the right input data vector is X_j^(m-1)(k), the left output data vector is X_i^m(k), and the right output data vector is X_j^m(k),
where m is the decorrelation level;
i, j are channel indices;
k indexes the individual data samples in each output-channel data sequence;
* denotes the complex conjugate;
K is the total number of data samples in each output-channel data sequence.
3. The parallel processing architecture for machine-learning decorrelation operations according to claim 1, characterized in that: each input data vector feeds the left input channel of one decorrelation unit and the right input channel of another decorrelation unit.
4. The parallel processing architecture for machine-learning decorrelation operations according to claim 1, characterized in that: the parallel processing architecture is a planar structure.
5. The parallel processing architecture for machine-learning decorrelation operations according to claim 1, characterized in that: the parallel processing architecture is a three-dimensional cylindrical structure.
6. An IC chip, characterized by comprising the parallel processing architecture for machine-learning decorrelation operations according to any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811435477.7A CN109725937A (en) | 2018-11-28 | 2018-11-28 | Parallel processing architecture and IC chip for machine learning decorrelation operation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109725937A | 2019-05-07
Family
ID=66295877
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811435477.7A Pending CN109725937A (en) | 2018-11-28 | 2018-11-28 | Parallel processing architecture and IC chip for machine learning decorrelation operation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109725937A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110361701A (en) * | 2019-07-16 | 2019-10-22 | 广州市高峰科技有限公司 | Radar electronic anti-clutter device based on side-lobe blanking and side-lobe cancellation
CN110361702A (en) * | 2019-07-16 | 2019-10-22 | 广州市高峰科技有限公司 | Processing method for radar jamming signals
CN110471883A (en) * | 2019-07-16 | 2019-11-19 | 广州市高峰科技有限公司 | Artificial intelligence accelerator with cyclically symmetric decorrelation architecture
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20190507 |