CN101183405A - Microcosm algorithm hardware platform realizing method based on FPGA - Google Patents


Info

Publication number
CN101183405A
Authority
CN
China
Prior art keywords
module
individual
fitness
microcosm
search
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2007101884194A
Other languages
Chinese (zh)
Other versions
CN100530202C (en)
Inventor
杜海峰
张进华
庄健
杨斌
陈永森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University
Priority to CNB2007101884194A
Publication of CN101183405A
Application granted
Publication of CN100530202C
Expired - Fee Related
Anticipated expiration

Abstract

The invention discloses a hardware system for the small-world algorithm based on a field programmable gate array (FPGA). Following the principle of the small-world algorithm, the system comprises an individual optimization layer and a global optimization layer connected by individual information modules. The individual optimization layer comprises a random number generator, a position recombination module, a fitness calculation module, and a comparison module connected in series; it performs the neighborhood search, the fitness calculation, and the selection for each individual. The global optimization layer consists of an individual fitness comparison module followed by a system scheduling module. The individual fitness comparison module selects the best individual among all individuals and passes it to the system scheduling module, which judges whether the termination condition is met and whether to launch the next generation of search. The advantages of the invention are that it is easy to implement in hardware and highly parallel, it effectively avoids the local extremum problem, it avoids deceptive problems to some extent by keeping the solution space diverse, and it offers a higher convergence rate and better stability than the prior art.

Description

FPGA-based hardware platform implementation method for the small-world (microcosm) algorithm
Technical field
The invention belongs to the field of applied artificial-intelligence optimization algorithms and relates specifically to a hardware platform implementation method for the small-world optimization algorithm on an FPGA.
Background art
Scientific research and engineering practice frequently require solving optimization problems, which calls for a variety of optimization methods. Current methods fall into two main groups: classical optimization methods and heuristic optimization algorithms. Classical methods include linear programming, the gradient method, Newton's method, and interior-point methods. Heuristic algorithms form the other major branch and include genetic algorithms, artificial neural networks, ant colony algorithms, particle swarm optimization, and simulated annealing. Classical algorithms place strict requirements on the properties of the objective function and often need derivative information, which limits their application. Heuristic algorithms make few demands on the function and need only function values, so they are widely used in information science, computer science, systems science, and engineering practice. A notable drawback of heuristic algorithms, however, is their heavy computational load. Although computer hardware performance keeps improving in line with Moore's law, the computational complexity of optimization problems is growing rapidly as well, so the heuristic solution of some problems cannot be completed in an acceptable time. In applications that require real-time processing, general-purpose computing platforms still struggle to handle the massive computation that heuristic algorithms involve. Moreover, many situations require the computation to run on an embedded system, for which a general-purpose computer platform is clearly unsuitable. Designing application-specific integrated circuits to implement heuristic algorithms has therefore become a focus of current research.
Research on hardware implementations of function optimization based on application-specific integrated circuits currently concentrates on genetic algorithms and artificial neural networks, with a small amount of work on hardware implementations of ant colony and particle swarm algorithms. Hardware implementation significantly reduces the running time of these algorithms, gives better real-time performance and higher system integration, and suits practical engineering use. These algorithms are complex to design, however, and most of them suffer from local extrema, which severely restricts their application and adoption.
The small-world optimization algorithm, derived from the small-world phenomenon in sociology, is simple and efficient. It not only improves convergence speed but also preserves the diversity of individuals in the solution space, is not easily trapped in local extrema, and to some extent overcomes deceptive problems. At present, however, the algorithm is rarely applied, its computational load is still large, and its implementations run mainly on PCs, which hinders higher system integration.
Summary of the invention
Classical optimization methods place strict requirements on function properties, and existing hardware optimization platforms based on application-specific integrated circuits are complex to design and suffer from local extremum problems that are hard to solve. To overcome these shortcomings, the object of the invention is to provide an implementation method for a new optimization platform based on a large-scale programmable logic device. The method greatly improves system integration; the algorithm is simple and efficient to implement, highly parallel, and easy to realize in parallel hardware; and by preserving the diversity of the solution space it overcomes the local extremum problem more effectively.
To achieve this object, the invention adopts the following technical solution:
A hardware system for the small-world algorithm based on a field programmable gate array, characterized in that the hardware system comprises:
An individual optimization layer and a global optimization layer built on a hardware platform constructed from a field programmable gate array. The two layers are connected by an individual information layer, which contains a plurality of individual information modules. Together they constitute the hardware system of the small-world algorithm, wherein:
The individual optimization layer comprises a random number generator, a long/short-range position recombination module, a fitness calculation module, and a comparison module. The random number generator and the long/short-range position recombination module perform the neighborhood search and obtain a neighbor position. The neighbor position is then sent to the fitness calculation module to compute the neighbor's fitness, and the neighbor's fitness is sent to the comparison module, where it is compared with the fitness of the current individual and the better one is selected.
The global optimization layer comprises an individual fitness comparison module and a system scheduling module connected to each other. The individual fitness comparison module selects the best individual from all individual information modules in the individual information layer and passes it to the system scheduling module, which makes the decision.
Based on the characteristics of the FPGA and the principle of the small-world algorithm, the invention uses a different development platform and optimization mechanism from existing methods and, compared with them, has the following features:
1. The development cycle is short and the development cost is low; the design can be adapted flexibly to the optimization problem at hand and is suitable for solving different optimization problems.
2. The small-world algorithm is fixed inside the FPGA, which greatly improves system integration and is well suited to volume production.
3. The parallelism of the small-world algorithm is combined well with the parallel computing capability of the FPGA, giving better real-time performance and making the system suitable for on-line real-time computation.
4. The stable structure and simple operations of the small-world algorithm are fully exploited, which makes development on the FPGA more concise and efficient.
5. The small-world algorithm preserves the diversity of individuals in the solution space, is not easily trapped in local extrema, overcomes deceptive problems to some extent, and improves convergence speed.
Brief description of the drawings
The invention is further described below with reference to the drawings and embodiments.
Fig. 1 is a block diagram of the system principle of the invention.
Fig. 2 is a structural block diagram of the embodiment.
Fig. 3 is a structural diagram of the LFSR.
Fig. 4 is a structural diagram of the position recombination module.
Fig. 5 is a structural diagram of the iterative CORDIC.
Fig. 6 is a structural diagram of the pipelined CORDIC.
Fig. 7 is a structural diagram of the comparator.
Fig. 8 shows the finite state machine structure of the scheduling module.
Detailed description of the embodiments
The invention applies the principle of the small-world algorithm to perform optimization on a hardware platform built from an FPGA (field programmable gate array). The scheme is as follows.
The small-world algorithm requires two steps, "individual optimization" and "global optimization". Individual optimization is completed in three steps, "neighborhood search", "fitness calculation", and "selection"; global optimization then selects the best among all individuals and performs overall scheduling. As shown in Fig. 1, a small-world algorithm implemented on an FPGA therefore needs the following two-layer structure:
1. The first layer carries out individual optimization and consists of the random number generator module, the position recombination module, the fitness calculation module, and the comparison module. The random number generator produces a random number, which is supplied to the position recombination module together with the position of the current individual (that is, the independent variable in the solution space). The position recombination module performs the neighborhood search and produces a new long-range or short-range position, referred to here as the "neighbor" position. The fitness calculation module computes the neighbor's fitness, and the comparison module compares it with the fitness of the original individual; if the neighbor is better, the current individual's information is updated with the neighbor's position and fitness.
2. The second layer carries out global optimization and consists of the individual fitness comparison module and the system scheduling module. The individual fitness comparison module selects the best individual among all individuals and passes it to the system scheduling module, which judges whether the termination condition has been reached, whether to start the next generation of search, and so on.
The bridge connecting the two layers is the "individual information" module, which mainly stores the position and fitness of each individual.
The system operates as follows:
1) The system scheduling module initializes the individuals: initial positions evenly distributed over the whole solution space are latched into the position fields of the individual information modules.
2) The system scheduling module selects the current individual from the n individuals in sequence and performs individual optimization in three steps: "neighborhood search", "fitness calculation", and "selection". The neighborhood search comes first: the random number generator and the position recombination module produce the neighbor position. A short-range or long-range search is chosen by comparing a random number with a threshold, so the threshold controls the probabilities of long-range and short-range search. Short-range search inverts some randomly chosen bits of the current individual; long-range search directly copies a segment of the random number that is as long as the current individual's position. The neighbor position is then sent to the fitness calculation module, and the resulting fitness is sent to the comparison module for selection against the fitness of the current individual. If the neighbor is better, the information in the individual information module is updated.
3) After all n individuals have completed selection, the individual fitness comparison module determines the best individual.
4) The system scheduling module judges whether the specified error has been reached and whether the maximum number of generations has been reached, and decides whether to start the next round of search or to exit.
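Steps 1) to 4) can be sketched as a short software model. This is only a behavioral illustration, not the patent's hardware description: the function name, its parameters, the single-bit flip used for short-range search, and the assumption that the optimum value is known (so that `target_error` can serve as the stop test) are all hypothetical.

```python
import random

def small_world_search(fitness, n_bits, n_individuals=20, p_long=0.25,
                       max_generations=200, target_error=0.0, seed=0):
    """Behavioral model of steps 1)-4); minimizes `fitness` over n_bits-bit codes."""
    rng = random.Random(seed)
    # Step 1: spread the initial positions evenly over the code space.
    step = (1 << n_bits) // n_individuals
    positions = [i * step + step // 2 for i in range(n_individuals)]
    scores = [fitness(p) for p in positions]
    best = 0
    for _ in range(max_generations):
        # Step 2: individual optimization, one individual after another.
        for i in range(n_individuals):
            if rng.random() < p_long:
                neighbor = rng.getrandbits(n_bits)        # long-range search
            else:
                # Short-range search; flipping exactly one bit is an assumption.
                neighbor = positions[i] ^ (1 << rng.randrange(n_bits))
            score = fitness(neighbor)
            if score < scores[i]:                         # selection: keep the better one
                positions[i], scores[i] = neighbor, score
        # Step 3: global comparison over all individuals.
        best = min(range(n_individuals), key=scores.__getitem__)
        # Step 4: stop when the error target is met (assumes the optimum is 0).
        if scores[best] <= target_error:
            break
    return positions[best], scores[best]
```

A toy call such as `small_world_search(lambda p: bin(p ^ 0xA5).count("1"), n_bits=8)` searches for the 8-bit pattern `0xA5` using Hamming distance as the fitness.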
In addition, studies show that fitness calculation takes up most of the time of the whole optimization process. Therefore, as soon as the fitness calculation module has taken the current position for evaluation, the position recombination for the next individual's neighbor can begin, so that "neighborhood search" and "fitness calculation" run in parallel and computational efficiency improves.
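The benefit of this overlap can be estimated with a simple timing model. The symbols are illustrative assumptions rather than quantities given in the patent: n is the number of individuals, T_s the time of one neighborhood search, and T_f the time of one fitness calculation, with T_f at least as long as T_s; the selection step is ignored.

```latex
T_{\text{serial}} = n\,(T_s + T_f), \qquad
T_{\text{overlap}} = T_s + n\,T_f \quad (T_f \ge T_s), \qquad
T_{\text{serial}} - T_{\text{overlap}} = (n - 1)\,T_s .
```

Under these assumptions only the first neighborhood search is exposed; every later one is hidden behind a fitness calculation.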
Each module of the hardware system is built with a hardware description language (Verilog HDL or VHDL). To take better advantage of the characteristics of the small-world algorithm and to speed up fitness calculation, this embodiment provides a design for each module. The design of each module in the specific implementation of this embodiment (shown in Fig. 2) is described in detail below.
In this embodiment the random number generator is designed as a linear feedback shift register (LFSR, see Fig. 3). Its characteristic polynomial is $p(x) = x^{49} + x^{9} + 1$, which produces a uniformly distributed pseudo-random sequence of length $2^{49} - 1$ and fully satisfies the pseudo-random quality needed for long computations. The hardware structure, shown in Fig. 3, uses 49 flip-flops, so a 32-bit binary pseudo-random number is available after each clock cycle.
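A software model of such a generator might look like the sketch below. It assumes a Fibonacci-style register implementing the recurrence a[n+49] = a[n+9] XOR a[n] for the polynomial above; the class name, the seed handling, and the choice of the low 32 flip-flops as the output word are assumptions, since the text does not say which register bits are read.

```python
class LFSR49:
    """Software model of a 49-stage Fibonacci LFSR for p(x) = x^49 + x^9 + 1."""
    WIDTH = 49
    MASK = (1 << WIDTH) - 1

    def __init__(self, seed=1):
        if seed & self.MASK == 0:
            raise ValueError("the all-zero state never changes")
        self.state = seed & self.MASK

    def step(self):
        """Advance one clock: a[n+49] = a[n+9] XOR a[n]; bit 0 holds the oldest bit a[n]."""
        new_bit = (self.state ^ (self.state >> 9)) & 1
        self.state = (self.state >> 1) | (new_bit << (self.WIDTH - 1))
        return self.state

    def word32(self):
        """Advance one clock and read 32 of the 49 flip-flops (assumed: the low 32)."""
        return self.step() & 0xFFFFFFFF
```

Words read on consecutive clocks share most of their bits after a one-position shift, so a design that needs independent words would read the register less often or tap it differently.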
The position recombination module produces the neighbor position of the current individual, as shown in Fig. 4. The size of the random number from the random number generator first decides whether short-range or long-range search is used. A q-bit random number and a threshold m are chosen to suit the application; let p be the decimal value of the q-bit random number. When p < m the short-range search strategy is used, otherwise the long-range search strategy is used. The ratio of long-range to short-range search probability is therefore $(2^q - m) : m$, and any desired ratio can be set by changing q and m. In short-range search, depending on certain bits of the random number, 1 to 3 arbitrary bits of the individual's position are inverted. In long-range search, the original position is replaced by another random sequence of the same length. This completes the long-range or short-range recombination of the position.
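The selection rule and the two search modes can be sketched as follows. The function names and the use of Python's `random` module in place of LFSR output bits are assumptions made for illustration.

```python
import random

def recombine(position, n_bits, rand_q, m, rng):
    """Return a neighbor of `position`.

    rand_q is the value p of the q-bit random number: p < m selects
    short-range search, anything else selects long-range search.
    """
    if rand_q < m:
        flips = rng.randint(1, 3)                  # invert 1 to 3 distinct bits
        for bit in rng.sample(range(n_bits), flips):
            position ^= 1 << bit
        return position
    return rng.getrandbits(n_bits)                 # replace with a fresh random code

def long_to_short_ratio(q, m):
    """Ratio of long-range to short-range search probability, (2^q - m) : m."""
    return (2 ** q - m, m)
```

For example, q = 8 and m = 192 give a ratio of 64 : 192, that is, a long-range probability of 25%, which matches the test parameter listed later in the document.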
Studies show that fitness calculation is the most time-consuming part of the basic small-world algorithm, so the main way to speed up the algorithm is to make function evaluation more efficient. This embodiment uses a pipelined coordinate rotation digital computer (CORDIC) to evaluate transcendental functions; its structure is shown in Fig. 5 and Fig. 6. The pipelined structure of Fig. 6 consumes about 23 times the FPGA area of the iterative structure of Fig. 5, but is correspondingly about 23 times faster. In addition, because 24-bit division takes a great deal of time, and in order to match the speed of the CORDIC, the divider in this embodiment also uses an 8-stage pipeline. This brings its maximum operating frequency to 80 MHz, roughly the same as the CORDIC module, which leaves ample margin for the computation and improves the stability of the design. By combining the above modules appropriately, together with the corresponding FPGA tool kit, fitness functions of various forms can be computed with greatly improved efficiency.
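For readers unfamiliar with CORDIC, the sketch below shows the standard rotation-mode iteration in fixed-point arithmetic, computing sine and cosine with shifts and additions only. It is a generic textbook form, not the circuit of Fig. 5 or Fig. 6; the 24 iterations follow the 24-stage pipeline named in claim 5, while the 28-bit fraction width and all identifiers are assumptions.

```python
import math

ITERATIONS = 24                      # one iteration per pipeline stage (claim 5)
FRAC_BITS = 28                       # fixed-point fraction width (assumed)
ONE = 1 << FRAC_BITS
# Precomputed rotation angles atan(2^-i) and the inverse CORDIC gain.
ANGLES = [round(math.atan(2.0 ** -i) * ONE) for i in range(ITERATIONS)]
GAIN = 1.0
for i in range(ITERATIONS):
    GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * i))
K_INV = round(ONE / GAIN)

def cordic_sin_cos(theta):
    """Rotation-mode CORDIC; theta in radians with |theta| <= pi/2."""
    x, y, z = K_INV, 0, round(theta * ONE)
    for i in range(ITERATIONS):
        if z >= 0:                   # rotate counter-clockwise
            x, y, z = x - (y >> i), y + (x >> i), z - ANGLES[i]
        else:                        # rotate clockwise
            x, y, z = x + (y >> i), y - (x >> i), z + ANGLES[i]
    return y / ONE, x / ONE          # (sin, cos)
```

An iterative implementation reuses one such stage over many clocks, whereas a pipelined one unrolls the loop into separate stages, which is the usual source of the area-for-speed trade the paragraph above describes.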
The comparison module consists of digital comparators; to increase its operating speed it is implemented with inserted pipeline stages, as shown in Fig. 7.
The individual information module mainly stores the position and fitness of an individual. Because the internal RAM of the FPGA is limited, flip-flops are used to store the data.
The system scheduling module uses a finite state machine (FSM) to control the timing of each module, as shown in Fig. 8; the termination condition is likewise judged with a digital comparator.
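The text does not list the states of the scheduling FSM, so the following sketch uses a hypothetical state set derived from the operating steps described earlier (initialize, search, evaluate, select, compare globally, check termination). State names and transition inputs are assumptions.

```python
from enum import Enum, auto

class State(Enum):
    INIT = auto()
    NEIGHBOR_SEARCH = auto()
    FITNESS = auto()
    SELECT = auto()
    GLOBAL_COMPARE = auto()
    CHECK_END = auto()
    DONE = auto()

def next_state(state, last_individual=False, terminate=False):
    """One transition of the hypothetical scheduling FSM."""
    if state is State.INIT:
        return State.NEIGHBOR_SEARCH
    if state is State.NEIGHBOR_SEARCH:
        return State.FITNESS
    if state is State.FITNESS:
        return State.SELECT
    if state is State.SELECT:
        # Move on to the global comparison only after the last individual.
        return State.GLOBAL_COMPARE if last_individual else State.NEIGHBOR_SEARCH
    if state is State.GLOBAL_COMPARE:
        return State.CHECK_END
    if state is State.CHECK_END:
        # Error target or generation limit reached: stop; otherwise next round.
        return State.DONE if terminate else State.NEIGHBOR_SEARCH
    return State.DONE
```

In hardware the `terminate` input would come from the digital comparator mentioned above and `last_individual` from an individual counter.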
Comparison results
This example is described in Verilog HDL and implemented on an Altera Cyclone II 2C35 FPGA. Two fairly typical functions were selected, the corresponding code was written, and on-line tests were run. Ten random tests were carried out to verify performance.
$f_1(x, y) = 1 + x \sin(4\pi x) - y \sin(4\pi y + \pi), \quad x, y \in [-1, 1]$ (5)
$f_2(x, y) = 100\,(x^2 + y)^2 + (1 - x)^2, \quad x, y \in [-2, 2]$ (6)
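For reference, the two test functions can be written directly from equations (5) and (6). The Python names are arbitrary, and floating-point arithmetic is used here instead of the fixed-point hardware arithmetic.

```python
import math

def f1(x, y):
    """Equation (5), defined for x, y in [-1, 1]."""
    return 1 + x * math.sin(4 * math.pi * x) - y * math.sin(4 * math.pi * y + math.pi)

def f2(x, y):
    """Equation (6) as printed, defined for x, y in [-2, 2]."""
    return 100 * (x ** 2 + y) ** 2 + (1 - x) ** 2
```

As printed, equation (6) evaluates to zero at (x, y) = (1, -1), since both squared terms vanish there.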
The test parameters are as follows:
Number of individuals: 20; code length: 48; long-range connection probability: 25%; neighborhood size: 1.
The system clock is 50 MHz, and computation stops once the error falls below $2^{-14}$. The average results over the 10 tests are given below.
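To make the parameters concrete, the sketch below shows one way a 48-bit code could map to the two variables: it assumes the code is split into two 24-bit fields (consistent with the 24-bit arithmetic units mentioned elsewhere, but not stated in the text) and scaled linearly onto the search interval. The function and constant names are hypothetical.

```python
def decode(code, lo, hi, bits_per_var=24):
    """Split a 2*bits_per_var-bit code into (x, y), each scaled onto [lo, hi]."""
    mask = (1 << bits_per_var) - 1
    scale = (hi - lo) / mask
    x = lo + ((code >> bits_per_var) & mask) * scale   # upper field
    y = lo + (code & mask) * scale                     # lower field
    return x, y

ERROR_THRESHOLD = 2.0 ** -14    # stop criterion from the text, about 6.1e-5
```

With this mapping each variable on [-1, 1] is resolved in steps of about 1.2e-7, well below the stop threshold.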
Table 1. Test results
Function   Optimal value    Average time, PC (ms)   Average time, hardware (ms)   Area (LEs)
f1         -2.2592          5596.9                  0.1098                        5178 (16%)
f2         1.9121×10^-4     608.7                   0.06588                       6708 (20%)
Note 1: The software implementation was written in Matlab 7.0 and run on a computer with a 3.06 GHz CPU and 1 GB of memory.
Note 2: The total area (number of logic elements) of the Cyclone II 2C35 is about 35,000.
The test results show that the accuracy of the hardware implementation is comparable to that of the computer software, and the hardware accuracy can be improved further by increasing the word length. In terms of computation speed, the hardware implementation is about four orders of magnitude faster than the software. Considering in addition the roughly 60-fold difference between the 50 MHz FPGA clock and the 3.06 GHz computer clock, the hardware implementation is clearly highly efficient, still has considerable room for speed improvement, and shows a clear advantage for real-time processing.

Claims (8)

1. A hardware system for the small-world algorithm based on a field programmable gate array, characterized in that the hardware system comprises:
An individual optimization layer and a global optimization layer built on a hardware platform constructed from a field programmable gate array, the two layers being connected by an individual information layer that contains a plurality of individual information modules, together constituting the hardware system of the small-world algorithm; wherein:
The individual optimization layer comprises a random number generator, a long/short-range position recombination module, a fitness calculation module, and a comparison module; the random number generator and the long/short-range position recombination module perform the neighborhood search and obtain a neighbor position, the neighbor position is then sent to the fitness calculation module to compute the neighbor's fitness, and the neighbor's fitness is sent to the comparison module, where it is compared with the fitness of the current individual and the better one is selected;
The global optimization layer comprises an individual fitness comparison module and a system scheduling module connected to each other; the individual fitness comparison module selects the best individual from all individual information modules in the individual information layer and passes it to the system scheduling module, which makes the decision.
2. The hardware system for the small-world algorithm based on a field programmable gate array according to claim 1, characterized in that the hardware system operates as follows:
A. The system scheduling module initializes the individuals: initial positions evenly distributed over the whole solution space are latched into the position fields of the individual information modules;
B. The system scheduling module selects the current individual from the plurality of individual information modules in sequence and performs individual optimization in three steps, "neighborhood search", "fitness calculation", and "selection"; the neighborhood search comes first, in which the random number generator and the long/short-range position recombination module produce the neighbor position; a short-range or long-range search is chosen by comparing a random number with a threshold, so that the threshold controls the probabilities of long-range and short-range search; short-range search inverts some randomly chosen bits of the current individual, and long-range search directly copies a segment of the random number that is as long as the current individual's position; the neighbor position is then sent to the fitness calculation module, and the resulting fitness is sent to the comparison module for selection against the fitness of the current individual; if the neighbor is better, the information in the individual information module is updated;
C. After all of the individual information modules have completed selection, the individual fitness comparison module determines the best individual;
D. The system scheduling module judges whether the specified error has been reached and whether the maximum number of generations has been reached, and decides whether to start the next round of search or to exit.
3. The hardware system for the small-world algorithm based on a field programmable gate array according to claim 1, characterized in that the random number generator is designed as a linear feedback shift register.
4. The hardware system for the small-world algorithm based on a field programmable gate array according to claim 1, characterized in that the position recombination module uses the size of a random number to choose between long-range and short-range search, wherein the neighbor for short-range search is produced by randomly inverting bits of the current individual, and long-range search is realized by copying a bit string of the corresponding length from the random number.
5. The hardware system for the small-world algorithm based on a field programmable gate array according to claim 1, characterized in that the fitness calculation module uses a 24-stage pipelined coordinate rotation digital computer (CORDIC) to compute trigonometric functions, square roots, and the like, and combines it with a 24-bit multiplier unit and an 8-stage pipelined divider unit to compute different fitness functions.
6. The hardware system for the small-world algorithm based on a field programmable gate array according to claim 1, characterized in that the comparison module consists of digital comparators and is implemented with inserted pipeline stages.
7. The hardware system for the small-world algorithm based on a field programmable gate array according to claim 1, characterized in that the individual information module mainly stores the position and fitness information of an individual and uses flip-flops to store the data.
8. The hardware system for the small-world algorithm based on a field programmable gate array according to claim 1, characterized in that the system scheduling module uses a finite state machine (FSM) to control the timing of each module.
CNB2007101884194A 2007-11-30 2007-11-30 Microcosm algorithm hardware platform realizing method based on FPGA Expired - Fee Related CN100530202C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2007101884194A CN100530202C (en) 2007-11-30 2007-11-30 Microcosm algorithm hardware platform realizing method based on FPGA

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2007101884194A CN100530202C (en) 2007-11-30 2007-11-30 Microcosm algorithm hardware platform realizing method based on FPGA

Publications (2)

Publication Number Publication Date
CN101183405A (en) 2008-05-21
CN100530202C (en) 2009-08-19

Family

ID=39448678

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2007101884194A Expired - Fee Related CN100530202C (en) 2007-11-30 2007-11-30 Microcosm algorithm hardware platform realizing method based on FPGA

Country Status (1)

Country Link
CN (1) CN100530202C (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102799412A (en) * 2012-07-09 2012-11-28 上海大学 CORDIC (coordinate rotation digital computer) accelerator based on parallel pipeline design
CN103810322A (en) * 2013-12-24 2014-05-21 西安电子科技大学 Integrated circuit layout method based on best fit heuristic sequence and organizational evolutionary algorithms
CN103810322B (en) * 2013-12-24 2017-01-25 西安电子科技大学 Integrated circuit layout method based on best fit heuristic sequence and organizational evolutionary algorithms
CN108011716A (en) * 2016-10-31 2018-05-08 航天信息股份有限公司 A kind of encryption apparatus and implementation method
CN108011716B (en) * 2016-10-31 2021-04-16 航天信息股份有限公司 Cipher device and implementation method

Also Published As

Publication number Publication date
CN100530202C (en) 2009-08-19

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20090819

Termination date: 20111130