CN110837891B - Self-organizing mapping method and system based on SIMD (Single instruction multiple data) architecture - Google Patents


Info

Publication number
CN110837891B
CN110837891B (application CN201911014330.5A)
Authority
CN
China
Prior art keywords
neurons
neuron
winning
weight
input
Prior art date
Legal status: Active
Application number
CN201911014330.5A
Other languages
Chinese (zh)
Other versions
CN110837891A (en)
Inventor
李丽
张衡
傅玉祥
黄延
何国强
何书专
Current Assignee
Nanjing University
Original Assignee
Nanjing University
Priority date
Filing date
Publication date
Application filed by Nanjing University
Priority to CN201911014330.5A
Publication of CN110837891A
Application granted
Publication of CN110837891B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods


Abstract

The invention relates to a self-organizing mapping method and system based on a SIMD architecture. The system comprises: a distance calculation module, which finds the best-matching competition-layer neuron by computing the Manhattan distance between the input vector and the weight vector of each competition-layer neuron and taking the minimum; a learning rate and field radius calculation module, which updates the learning rate and the field radius through shift operations; a cooperation module, in which the neuron that wins the competition is not excited alone: a field centered on the winning neuron is determined, and the neurons inside it are excited collectively; and a weight updating module, which updates the weights of the neurons within the field radius of the winning neuron and stores the calculated weights back to the on-chip SRAM. The method supports arbitrary sample classification and arbitrary feature counts, and meets requirements of low complexity and high precision.

Description

Self-organizing mapping method and system based on SIMD (Single instruction multiple data) architecture
Technical Field
The invention relates to a hardware implementation of an unsupervised neural network algorithm, in particular to a hardware implementation of a self-organizing map algorithm based on a SIMD architecture, which can represent input data in a low-dimensional space and can also be used in applications such as data visualization and clustering.
Background
The self-organizing map (SOM) algorithm is an unsupervised learning algorithm for clustering and high-dimensional visualization; it is an artificial neural network developed by simulating how the human brain processes signals. The SOM algorithm was proposed by Teuvo Kohonen in 1981 and has become the most widely applied self-organizing neural network method; its WTA (Winner Takes All) competition mechanism reflects the most fundamental feature of self-organizing map learning.
Owing to characteristics such as topology preservation, probability-distribution preservation, teacher-free learning, and visualization, the SOM algorithm achieves a good clustering effect and has received wide attention. Application and research results on the SOM algorithm keep emerging, and it is widely applied in information-processing fields such as speech recognition, image processing, classification and clustering, combinatorial optimization (for example, the TSP problem), and data analysis and prediction.
Compared with other neural networks, the self-organizing neural network has a structure and learning rules of its own. Structurally, it is a two-layer network formed by an input layer and a competition layer; the neurons between the two layers are bidirectionally connected, and there is no hidden layer. The competition layer can be a one-dimensional, two-dimensional, or three-dimensional array, and the input layer has one node per sample feature.
In the SOM network model, each weight coefficient vector can be regarded as an internal representation of the input vector in the neural network, or it is a mapping of the input vector, and the purpose of the self-organizing function of the SOM network model is to make the neural network converge on a representation form by adjusting the weight coefficients. In this representation, each neuron is only specifically matched or sensitive to a certain input pattern. In other words, the purpose of the SOM is to allow the morphological representation of the weight coefficients of the neurons to indirectly mimic the input pattern.
Disclosure of Invention
The purpose of the invention is as follows: to provide a self-organizing mapping method based on a SIMD architecture that achieves the good clustering behavior of the self-organizing map algorithm under limited resources and improves the training and decision speed of the algorithm.
The technical scheme is as follows: a self-organizing mapping method based on SIMD architecture includes the following steps:
S1: storing the initialization weight W and the input neurons into a storage unit according to a rule, and initializing the field (neighborhood) function h, the learning rate, and the maximum training times of the network;
S2: taking one sample from the total samples as the network input, and taking the corresponding input features and weights from the storage unit;
S3: calculating the Manhattan distances from the input neuron to all competition-layer neurons and determining the position coordinates of the winning neuron from the minimum distance; with d(i, j) denoting the distance from the input X to the competition-layer neuron at coordinates (i, j), the winning neuron c is determined by
c = argmin_(i,j) d(i, j), where d(i, j) = Σ_k |x_k - w_(i,j),k|;
S4: taking the input and weights out of the memory unit and adjusting the network weights of the winning neuron and of the neurons within its field, the weight formula being W_i(n+1) = W_i(n) + η(n)h_(j,i)(n)(X - W_i(n)); the updated weight is stored back to the storage unit, overwriting the weight before updating;
S5: while not all learning samples have been processed, repeating steps S2-S5; otherwise executing S6;
S6: adjusting the learning-rate function and the field function; the learning rate is
η(n) = η0 / 2^n,
where η0 is the initial learning rate, and the field radius is
σ(n) = σ0 / 2^n,
where σ0 is the initial field radius (both decays are implemented as one right shift per iteration);
S7: if the current training times do not reach the set maximum training times, returning to S2, otherwise, terminating the loop and ending the learning process.
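For concreteness, the following is a minimal NumPy sketch of steps S1-S7. It is an illustrative software model rather than the hardware pipeline: the function name som_train, the two-dimensional competition grid, the binary neighborhood, and the halving decay schedule are assumptions that mirror the shift-based updates in S6.

```python
import numpy as np

def som_train(samples, grid_h, grid_w, eta0=0.5, sigma0=4.0, max_epochs=8, rng=None):
    """Minimal software model of steps S1-S7 (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    n_features = samples.shape[1]
    W = rng.random((grid_h, grid_w, n_features))   # S1: random initial weights
    eta, sigma = eta0, sigma0
    ii, jj = np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij")
    for epoch in range(max_epochs):                # S7: stop at max training times
        for x in samples:                          # S2: one sample as network input
            d = np.abs(W - x).sum(axis=2)          # S3: Manhattan distances
            ci, cj = np.unravel_index(np.argmin(d), d.shape)     # winning neuron
            h = ((ii - ci) ** 2 + (jj - cj) ** 2) <= sigma ** 2  # binary field h
            W += eta * h[:, :, None] * (x - W)     # S4: weight update; S5 loops
        eta, sigma = eta / 2, sigma / 2            # S6: shift-style halving decay
    return W

# e.g. som_train(np.random.rand(100, 3), 8, 8) clusters 100 three-feature
# samples onto an 8 x 8 competition grid.
```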
In a further embodiment: the data storage unit comprises 8N subunits, of which 4N subunits store the input neuron data, randomly generated weights are ping-pong stored in another 2N subunits, and the winning neurons obtained by training are stored in the remaining 2N subunits, where the winning neurons of the next training pass overwrite the previous winning-neuron data.
In a further embodiment: the whole input sample set is traversed in D batches and the network is trained K times in total, so the weights require D×K ping-pong operations (for example, D = 4 and K = 100 give 400 ping-pong operations).
In a further embodiment: with fixed computing and storage resources, the same set of resources is used, and one controller controls a plurality of processors that each execute the same operation, realizing spatially parallel operation.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method as described above are implemented when the computer program is executed by the processor.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method described above.
A SIMD architecture based self-organizing map system comprising:
the data storage module comprises a plurality of storage units with preset depths, and each storage unit comprises a plurality of subunits which are respectively used for storing input neuron data, randomly generated weights and trained winning neuron data;
the distance calculation module is used for taking the data of each input neuron and the corresponding weights out of the storage unit, calculating the Manhattan distance between the input neuron and the weight vector of each competition-layer neuron, finding the minimum distance by a bubble-sorting pass to obtain the coordinates of the winning neuron, and finally storing the coordinates into the storage module;
the learning rate and field radius calculation module is used for calculating the learning rate and the field radius by using shift operation and adopting two registers for storage, and the calculation result is written back to the original register again each time;
the weight updating module and the cooperation module are used for taking out the neuron data and the weights from the storage unit and carrying out updating operation, and the updated weights are stored in the storage unit according to ping-pong storage.
In a further embodiment, the data storage unit comprises 8N subunits, of which 4N subunits store the input neuron data, randomly generated weights are ping-pong stored in another 2N subunits, and the winning neurons obtained by training are stored in the remaining 2N subunits, where the winning neurons of the next training pass overwrite the previous winning-neuron data.
The invention has the following advantages:
First, both the learning-rate and field-radius calculations use shift operations, greatly reducing operation resources and operation time.
Second, the learning rate and the field radius converge gradually with the training times, which improves the clustering effect.
Third, the Manhattan distance between the input neuron and each competition-layer neuron is calculated and the winning neuron is selected by a bubble-sorting pass, saving operation resources and improving calculation speed.
Fourth, the inference phase and the training process share the same set of resources, and multi-way parallel computation improves the utilization of operation and storage resources.
In conclusion, the invention effectively saves operation resources and improves calculation speed; it has broad application prospects and good application value in different settings.
Drawings
Fig. 1 is the hardware implementation architecture diagram of the self-organizing map algorithm in the present invention.
Fig. 2 is the network structure diagram of the self-organizing map algorithm.
Fig. 3 is a schematic diagram of the data arrangement in the present invention.
Fig. 4 is a schematic diagram of the hardware modules in the present invention.
Fig. 5 is a schematic diagram of the computing-unit design according to the present invention.
Fig. 6 is a flowchart of the specific weight-updating process in the present invention.
Fig. 7 is a flowchart of the hardware implementation of the self-organizing map algorithm of the present invention.
Detailed Description
A SOM typically involves the following three phases:
Competition phase: the selection of the winning neuron. Following the WTA (Winner Takes All) principle, the discriminant function value of each competition-layer neuron is calculated for the input vector, the distances are compared, and the neuron with the minimum distance becomes the competition winner.
Cooperation phase (also called the coordination phase): the neuron that wins the competition is not excited alone; a neighborhood centered on the winning neuron is first determined, and all neurons within the field radius of the winning neuron are excited together according to a certain rule.
Adaptation phase: the weights of all neurons in the neighborhood of the winning neuron are adjusted appropriately to increase their discriminant function values for the input pattern; that is, the weight vectors of the output-layer neurons are made to track the input vectors.
The SOM network can map an input pattern of any dimension into a one-dimensional, two-dimensional, or three-dimensional graph on the output layer while keeping its topological structure unchanged; through repeated learning of the input patterns, the weight-vector space tends toward the probability distribution of the inputs, a property known as probability retentivity. The competition-layer neurons compete for the opportunity to respond to the input pattern, and the weights of the winning neuron are adjusted in the direction that favors its winning again: taking the winning neuron as the center, excitatory lateral feedback is shown to near neighbors and inhibitory lateral feedback to far neighbors, so near neighbors stimulate one another while far neighbors inhibit one another. Here, near neighbors are the competition-layer neurons within a field radius (which shrinks as training proceeds) around the signaling neuron; far neighbors are the competition-layer neurons outside that radius. Neurons even farther than the far neighbors exhibit a weak excitatory effect. This interaction is known as the "Mexican hat" because its curve resembles the hat worn by Mexicans.
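This lateral-interaction profile is classically modeled as a difference of Gaussians. The sketch below is illustrative only: the amplitudes and widths are assumed values, and the hardware described later replaces this profile with the simpler binary neighborhood h that is 1 inside the field radius and 0 outside.

```python
import numpy as np

def mexican_hat(d, a_e=2.0, sigma_e=1.0, a_i=1.0, sigma_i=3.0):
    """Difference-of-Gaussians lateral interaction over grid distance d:
    excitatory near the winner, inhibitory for far neighbours, and fading
    toward zero at large distance (amplitudes/widths are assumed values)."""
    excite = a_e * np.exp(-d ** 2 / (2 * sigma_e ** 2))
    inhibit = a_i * np.exp(-d ** 2 / (2 * sigma_i ** 2))
    return excite - inhibit

# mexican_hat(np.array([0.0, 3.0, 10.0])) -> roughly [1.0, -0.58, -0.004]
```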
The learning steps of the SOM algorithm are as follows:
1. and initializing, assigning a network weight to a random value between [0 and 1], and setting an initial value of a learning rate, a neighborhood radius and a total learning frequency T.
2. Sampling, randomly selecting an input mode from the data set, normalizing and inputting the input mode into the neural network.
3. And competing, calculating the distances between all the neurons and the input mode, and finding out the winning neuron corresponding to the input mode.
4. And cooperatively determining a neighborhood range according to a neighborhood function of the neural network.
5. And self-adapting, and updating the weight of the neuron in the neighborhood.
6. And judging whether all samples are input into the neural network, if so, jumping to 7, and otherwise, jumping to 2.
7. The learning rate and neighborhood functions are updated.
8. And stopping the condition, judging whether the iteration number n exceeds T, if so, finishing the algorithm, and otherwise, jumping to 2.
In a hardware implementation of the self-organizing map algorithm based on the SIMD architecture, the acceleration core comprises a distance calculation module, a learning rate and field radius calculation module, a cooperation module, a weight updating module, and a data storage module.
In this design framework, with fixed computing and storage resources, the inference phase and the training phase of the algorithm share the same set of resources, and multi-way parallel computing utilizes the computing resources as fully as possible. Both the inference process and the training process adopt a fully pipelined design, and the competition layer is supported as a one-dimensional or two-dimensional graph.
The on-chip SRAM storage resources are divided into 8N banks, each of depth M: 4N banks are allocated to store the data of all input neurons; the randomly generated weights are ping-pong accessed in another 2N banks; and the winning neurons obtained by each training pass are stored in the remaining 2N banks, where each pass overwrites the previous winning-neuron data. With the weights stored in 2N banks, the whole input sample set is traversed in D passes of calculation; with the network trained K times in total, the weights require D×K ping-pong operations. The distance calculation module takes the data of each input neuron and the corresponding weight out of the banks, calculates the Manhattan distance between the input neuron and the weight vector of each competition-layer neuron, finds the minimum distance by a bubble-sorting pass to obtain the coordinates of the winning neuron, and finally stores the coordinates into a bank.
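The ping-pong scheme can be pictured with a small software model. The class below is a toy sketch, not the SRAM design itself (the class name and flat-list banks are assumptions): reads are served from one half of the banks while writes go to the other half, and one swap corresponds to one ping-pong operation, giving D×K swaps overall.

```python
class PingPongBanks:
    """Toy model of the weight banks: reads come from one half while writes
    go to the other; swap() flips the roles, i.e. one ping-pong operation.
    For D batches and K training passes this swap happens D x K times."""

    def __init__(self, n_banks, depth):
        self.banks = [[0.0] * depth for _ in range(2 * n_banks)]
        self.n = n_banks
        self.read_half = 0           # 0 -> read banks[0:n], write banks[n:2n]

    def read(self, bank, addr):
        return self.banks[self.read_half * self.n + bank][addr]

    def write(self, bank, addr, value):
        self.banks[(1 - self.read_half) * self.n + bank][addr] = value

    def swap(self):                  # one ping-pong operation
        self.read_half = 1 - self.read_half
```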
The learning rate and the field radius computed by this module decrease as the training times increase; both are computed with shift operations, held in two registers, and each calculation result is written back to its original register.
The weight updating module takes the input neurons and the weights out of the banks and applies W_i(n+1) = W_i(n) + η(n)h_(j,i)(n)(X - W_i(n)); following the cooperation-module flow, h_(j,i)(n) = 1 within the field radius and h_(j,i)(n) = 0 outside it, and the updated weights are stored in the banks in ping-pong fashion.
The cooperation module calculates the Euclidean distances between the winning neuron and the other competition-layer neurons, and the competition-layer neurons within the field radius of the winning neuron are excited together according to a certain rule.
According to one aspect of the invention, a hardware implementation of a self-organizing map algorithm based on a SIMD architecture is provided, the acceleration core of the invention comprising: the device comprises a data storage module, a distance calculation module, a learning rate and field radius calculation module, a weight value updating module and a cooperation module.
With fixed computing and storage resources, the inference phase and the training phase of the algorithm share the same set of resources; the SIMD architecture uses one controller to drive a plurality of processors that each execute the same operation on different data, realizing spatially parallel operation.
Specifically, the on-chip data storage module configures the storage resources as 32N banks, each of depth M: 16N banks are allocated to store the data of all input neurons; the randomly generated weights are ping-pong accessed in another 8N banks; and the winning neurons obtained by each training pass are stored in the remaining 8N banks, with each pass overwriting the previous winning-neuron data. With the weights stored in 8N banks, the whole input sample set is traversed in D batches; with the network trained K times in total, the weights require D×K ping-pong operations.
The distance calculation module determines the winner from the Manhattan distance between the input and each competition-layer neuron: the competition neuron with the minimum Manhattan distance is the winning neuron. The Manhattan distance is obtained with adders alone, by forming the absolute differences and accumulating them, and a comparator then judges the sizes; with the storage resources configured as 32N banks, 8N-way parallel calculation can be performed, and the winning neuron is obtained by a bubble-sorting pass.
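As a software reference for this datapath, here is a sketch using only absolute differences, additions, and comparisons (the function name and list-based storage are illustrative assumptions):

```python
def find_winner(x, weights):
    """Winner search using only the operations the datapath needs: absolute
    differences and adds for the Manhattan distance, then a compare-and-keep
    running minimum in place of a full sort (illustrative sketch)."""
    best_idx, best_dist = 0, None
    for idx, w in enumerate(weights):              # weights: one vector per neuron
        dist = 0
        for xk, wk in zip(x, w):
            diff = xk - wk
            dist += diff if diff >= 0 else -diff   # |xk - wk| from subtract+select
        if best_dist is None or dist < best_dist:  # comparator keeps the smaller
            best_idx, best_dist = idx, dist
    return best_idx, best_dist

# e.g. find_winner([1, 2], [[0, 0], [1, 1], [3, 3]]) returns (1, 1).
```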
The learning rate and the field radius computed by this module decrease with the training times: the learning rate is
η(n) = η0 / 2^n,
where η0 is the initial learning rate, and the field radius is
σ(n) = σ0 / 2^n,
where σ0 is the initial field radius.
Both the learning rate and the field radius are calculated with shift operations: one right shift is performed per iteration, the two values are held in two registers, and each calculation result is written back to its original register.
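In fixed-point terms, one right shift halves the stored value, which is what makes the decay η(n) = η0 / 2^n essentially free in hardware. A small sketch follows; the Q-format bit widths are assumed for illustration:

```python
def shift_decay(eta_q, sigma_q):
    """One iteration's decay: a single right shift halves both fixed-point
    values, so eta(n) = eta0 >> n (illustrative sketch)."""
    return eta_q >> 1, sigma_q >> 1

eta_q = int(0.5 * (1 << 15))   # learning rate 0.5 in an assumed Q0.15 format
sigma_q = 4 << 4               # field radius 4.0 in an assumed Q4.4 format
for n in range(3):
    eta_q, sigma_q = shift_decay(eta_q, sigma_q)
    print(eta_q / (1 << 15), sigma_q / (1 << 4))  # 0.25 2.0 / 0.125 1.0 / 0.0625 0.5
```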
The weight updating module takes the input neurons and the weights out of the banks and applies W_i(n+1) = W_i(n) + η(n)h_(j,i)(n)(X - W_i(n)) following the cooperation-module flow; the updated weights are stored in the banks in ping-pong fashion, and with the storage resources configured as 32N banks, 16N-way parallel calculation is realized.
The cooperation module calculates the Euclidean distance between the winning neuron and the other competition-layer neurons. The square root can be omitted by squaring the field radius instead and comparing squared quantities, which reduces logic resources and operation time; with the storage resources configured as 32N banks, 8N-way parallel calculation is realized, and the competition-layer neurons within the field radius of the winning neuron are excited together according to a certain rule.
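The sqrt-free membership test then reduces to integer multiplies, adds, and one comparison. A one-function sketch (names are illustrative):

```python
def in_neighborhood(ci, cj, i, j, sigma_sq):
    """Sqrt-free field test: compare the squared Euclidean distance between
    grid coordinates (i, j) and the winner (ci, cj) against the squared
    field radius sigma_sq (illustrative sketch)."""
    return (i - ci) ** 2 + (j - cj) ** 2 <= sigma_sq

# e.g. with the winner at (3, 3) and sigma_sq = 4, neuron (4, 4) is inside
# (distance^2 = 2 <= 4) while neuron (6, 3) is outside (distance^2 = 9).
```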
The hardware architecture shown in fig. 1 is an example of the present invention, and the example implements the construction of a self-organizing map algorithm, including a control module, a distance calculation module, a learning rate and domain radius calculation module, a cooperation module, a weight value update module, and the transportation and storage of data.
Fig. 2 is the network structure diagram of the self-organizing map algorithm: each circle of the input layer represents a feature and each circle of the competition layer represents a class, though the number of actual classes may be smaller than the number of competition-layer circles. Each input has a weight to every competition-layer neuron.
Fig. 3 is a schematic diagram of data arrangement, where the storage resource is set to 32N banks, each bank has a depth of M, 16N banks are allocated to store data of all input neurons, and a randomly generated weight is used for ping-pong access in another 8N banks.
Fig. 4 is a schematic diagram of hardware modules, and the process is as follows:
S1: after receiving the start signal, the address generation module generates the source-data addresses, and the controller reads the source data accordingly;
S2: the training unit performs training; the updated weights are obtained after multiple iterations, and the controller stores the trained weights to the SRAM;
S3: the inference unit classifies the source data according to the trained weights;
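As a software reference for the inference unit of S3, here is a sketch that reuses weights of the same shape as those produced by the som_train sketch above (names are illustrative; the distance metric matches training):

```python
import numpy as np

def som_classify(samples, W):
    """Inference pass: label each sample with the (i, j) grid coordinates of
    its winning neuron under trained weights W of shape (grid_h, grid_w,
    n_features), using the same Manhattan distance as training (sketch)."""
    labels = []
    for x in samples:
        d = np.abs(W - x).sum(axis=2)    # distance to every competition neuron
        labels.append(np.unravel_index(np.argmin(d), d.shape))
    return labels
```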
the process of the design schematic diagram of the computing unit of the collaboration module shown in FIG. 5 is as follows:
s1: calculating the domain radius of the winning neuron, and updating the domain radius through a shifting operation;
s2: giving the fixed coordinate position of the neuron of the competition layer, and calculating the square of the Euclidean distance between the winning neuron and the neuron of the competition layer through 8 paths in parallel;
s3: comparing the square of Euclidean distance between the winning neuron and the neuron in the competition layer with the square of the radius of the domain of the winning neuron, wherein the neuron updates the weight according to the rule within the radius of the domain;
the specific process of updating the weights shown in fig. 6 is as follows:
s1: taking out input neurons and weights from the bank for weight updating;
s2: obtaining a competition layer neuron needing to update the weight value through a cooperation module according to Wi(n+1)=Wi(n)+η(n)hj,i(n)(X-Wi(n)) updating the weights, wherein the weight updating is performed through 16 paths of parallel calculation;
s3: the updated weight is stored in the bank again, and the weight before updating is covered;
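The 16-way parallelism of S2 can be pictured as applying the masked update over the weight rows in lane-sized chunks. The sketch below is a software analogy only; the lane-count parameter and names are assumptions:

```python
import numpy as np

def update_weights_simd(W, x, eta, mask, lanes=16):
    """Apply W <- W + eta * h * (x - W) over the flattened weight rows in
    lane-sized chunks, as a software analogy of a 16-way datapath (sketch).
    mask is the binary field h, one flag per competition-layer neuron."""
    Wf = W.reshape(-1, W.shape[-1])              # one row per neuron
    flags = mask.reshape(-1)
    for start in range(0, Wf.shape[0], lanes):   # one "instruction" per chunk
        sel = flags[start:start + lanes]         # lanes where h = 1
        rows = Wf[start:start + lanes]
        rows[sel] += eta * (x - rows[sel])       # masked multiply-accumulate
    return W
```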
example 1
Through the above description of the modules, the hardware implementation flow of the self-organizing map algorithm shown in Fig. 7 is as follows:
S1: storing the initialization weight W and the input neurons into the SRAM according to a rule, and initializing the field h, the learning rate, and the maximum training times of the network;
S2: taking one sample from the total samples as the network input, and taking the corresponding input features and weights from the SRAM;
S3: calculating the Manhattan distances from the input neuron to all competition-layer neurons and determining the position coordinates of the winning neuron from the minimum distance; with d(i, j) denoting the distance from the input X to the competition-layer neuron at coordinates (i, j), the winning neuron c is given by c = argmin_(i,j) d(i, j), where d(i, j) = Σ_k |x_k - w_(i,j),k|;
S4: taking the input and weights out of the SRAM and adjusting the network weights of the winning neuron and of the neurons within its field, the weight formula being W_i(n+1) = W_i(n) + η(n)h_(j,i)(n)(X - W_i(n)); the updated weight is stored back into the bank, overwriting the weight before updating;
S5: while not all learning samples have been processed, repeating steps S2-S5; otherwise executing S6;
S6: adjusting the learning-rate function and the field function; the learning rate is
η(n) = η0 / 2^n,
where η0 is the initial learning rate, and the field radius is
σ(n) = σ0 / 2^n,
where σ0 is the initial field radius;
S7: if the current training times do not reach the set maximum training times, returning to S2, otherwise, terminating the loop and ending the learning process.
The self-organizing map algorithm based on the SIMD architecture supports arbitrary sample classification and arbitrary sample sizes, reduces the source-data computation of traditional hardware implementations, and balances computing and storage resources to maximize multi-way parallelism, effectively saving operation resources and improving calculation speed. As a typical clustering algorithm in the machine-learning field, it has broad application prospects and good application value in different settings.
Although preferred embodiments of the present invention have been described in detail, the present invention is not limited to the details of these embodiments; various equivalent modifications can be made within the technical spirit of the present invention, and all such modifications fall within the scope of the present invention.

Claims (5)

1. A self-organizing mapping method based on a SIMD architecture, characterized in that, with fixed computing and storage resources, the same set of resources is used and one controller controls a plurality of processors that each execute the same operation, realizing spatially parallel operation; the method comprises the following steps:
S1: storing the initialization weight W and the input neurons into a storage unit according to a rule, and initializing the field h, the learning rate, and the maximum training times of the network; the storage unit comprises 8N subunits, of which 4N subunits store the input neuron data, randomly generated weights are ping-pong stored in another 2N subunits, and the winning neurons obtained by training are stored in the remaining 2N subunits, where the winning neurons of the next training pass overwrite the previous winning-neuron data;
S2: taking one sample from the total samples as the network input, and taking the corresponding input features and weights from the storage unit;
S3: calculating the Manhattan distances from the input neuron to all competition-layer neurons and determining the position coordinates of the winning neuron from the minimum distance; with d(i, j) denoting the distance to the competition-layer neuron at coordinates (i, j), the winning neuron c is determined by
c = argmin_(i,j) d(i, j), where d(i, j) = Σ_k |x_k - w_(i,j),k|;
S4: taking the input and weights out of the storage unit and adjusting the network weights of the winning neuron and of the neurons within its field according to
W_i(n+1) = W_i(n) + η(n)h_(j,i)(n)(X - W_i(n));
the updated weight is stored in the storage unit again, overwriting the weight before updating;
S5: while not all learning samples have been processed, repeating steps S2-S5; otherwise executing S6;
S6: adjusting the learning-rate function and the field function; the learning rate is
η(n) = η0 / 2^n,
wherein η0 is the initial learning rate; the field radius is
σ(n) = σ0 / 2^n,
wherein σ0 is the initial field radius;
S7: if the current training times do not reach the set maximum training times, returning to the step S2, otherwise, terminating the circulation and ending the learning process.
2. The SIMD architecture based self-organizing map method according to claim 1, characterized in that the whole input sample set is traversed in D batches, the network is trained K times in total, and the weights require D×K ping-pong operations.
3. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 2 are implemented by the processor when executing the computer program.
4. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 2.
5. A SIMD architecture based self-organizing map system comprising:
the data storage module comprises a plurality of storage units with preset depths; each storage unit comprises 8N subunits, of which 4N subunits store the input neuron data, randomly generated weights are ping-pong stored in another 2N subunits, and the winning neurons obtained by training are stored in the remaining 2N subunits, where the winning neurons of the next training pass overwrite the previous winning-neuron data;
the distance calculation module is used for taking the data of each input neuron and the corresponding weights out of the storage unit, calculating the Manhattan distance between the input neuron and the weight vector of each competition-layer neuron, finding the minimum distance by a bubble-sorting pass to obtain the coordinates of the winning neuron, and finally storing the coordinates into the storage module;
the learning rate and field radius calculation module is used for calculating the learning rate and the field radius by using shift operation and adopting two registers for storage, and the calculation result is written back to the original register again each time;
the weight updating module and the cooperation module are used for taking out the neuron data and the weights from the storage unit and carrying out updating operation, and the updated weights are stored in the storage unit according to ping-pong storage.
CN201911014330.5A 2019-10-23 2019-10-23 Self-organizing mapping method and system based on SIMD (Single instruction multiple data) architecture Active CN110837891B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911014330.5A CN110837891B (en) 2019-10-23 2019-10-23 Self-organizing mapping method and system based on SIMD (Single instruction multiple data) architecture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911014330.5A CN110837891B (en) 2019-10-23 2019-10-23 Self-organizing mapping method and system based on SIMD (Single instruction multiple data) architecture

Publications (2)

Publication Number Publication Date
CN110837891A CN110837891A (en) 2020-02-25
CN110837891B true CN110837891B (en) 2022-05-17

Family

ID=69575598

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911014330.5A Active CN110837891B (en) 2019-10-23 2019-10-23 Self-organizing mapping method and system based on SIMD (Single instruction multiple data) architecture

Country Status (1)

Country Link
CN (1) CN110837891B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112015854B (en) * 2020-07-17 2023-07-18 河海大学常州校区 Heterogeneous data attribute association method based on self-organizing mapping neural network
CN113705858B (en) * 2021-08-02 2023-07-11 西安交通大学 Shortest path planning method, system, equipment and storage medium for multiple target areas
CN114637720B (en) * 2021-12-13 2024-04-30 西安电子科技大学 On-chip optical network mapping method based on growth type annular SOM neural network


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9449257B2 (en) * 2012-12-04 2016-09-20 Institute Of Semiconductors, Chinese Academy Of Sciences Dynamically reconstructable multistage parallel single instruction multiple data array processing system
US10732621B2 (en) * 2016-05-09 2020-08-04 Strong Force Iot Portfolio 2016, Llc Methods and systems for process adaptation in an internet of things downstream oil and gas environment
CN106529060B (en) * 2016-11-15 2021-09-03 中国电力科学研究院有限公司 Load sequence modeling method and system
CN108170147B (en) * 2017-12-31 2020-10-16 南京邮电大学 Unmanned aerial vehicle task planning method based on self-organizing neural network

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009099008A (en) * 2007-10-18 2009-05-07 Seiko Epson Corp Parallel arithmetic unit and parallel arithmetic method
CN103019656A (en) * 2012-12-04 2013-04-03 中国科学院半导体研究所 Dynamically reconfigurable multi-stage parallel single instruction multiple data array processing system
CN104823482A (en) * 2013-09-27 2015-08-05 华为技术有限公司 Switching method for terminal, base station and system
CN108734274A (en) * 2017-04-24 2018-11-02 英特尔公司 Calculation optimization mechanism for deep neural network
CN108805798A (en) * 2017-05-05 2018-11-13 英特尔公司 Fine granularity for deep learning frame calculates communication and executes

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
A Pipelined Speculative SIMD Architecture for SOM ANN; O. Hammami et al.; Proceedings of the International Conference on Neural Networks (ICNN'97); 2002-08-06; 985-990 *
Multi-port SRAM with Multi-bank for Self-organizing Maps Neural Network; Fengwei An; 14th IEEE International Conference on Solid-State and Integrated Circuit Technology (ICSICT); 2018-12-06; 1-3 *
Parallel FPGA Implementation of Self-Organizing Maps; K. Ben Khalifa et al.; The 16th International Conference on Microelectronics, 2004 (ICM 2004); 2005-05-31; 709-712 *
High-dimensional data visualization based on the SOM algorithm (基于SOM算法的高维数据可视化); Shi Lihong (石丽红); China Master's Theses Full-text Database, Information Science and Technology; 2014-02-15; no. 2014(02); I138-606 *
Visualizing self-organizing map networks with Norm Matrix (用Norm Matrix实现自组织映射网络的可视化); Guo Jingfeng (郭景峰) et al.; Journal of Chinese Computer Systems (小型微型计算机系统); 2013-11-15; vol. 34, no. 11; 2630-2634 *

Also Published As

Publication number Publication date
CN110837891A (en) 2020-02-25


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant