WO2017124809A1 - Particle swarm optimization method and system based on gpu operation of mobile terminal - Google Patents

Particle swarm optimization method and system based on gpu operation of mobile terminal Download PDF

Info

Publication number
WO2017124809A1
WO2017124809A1 · PCT/CN2016/103892 · CN2016103892W
Authority
WO
WIPO (PCT)
Prior art keywords
particle
gpu
thread
speed
particles
Prior art date
Application number
PCT/CN2016/103892
Other languages
French (fr)
Chinese (zh)
Inventor
刘小东 (Liu Xiaodong)
Original Assignee
上海斐讯数据通信技术有限公司 (Shanghai Feixun Data Communication Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Feixun Data Communication Technology Co., Ltd. (上海斐讯数据通信技术有限公司)
Publication of WO2017124809A1 publication Critical patent/WO2017124809A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/004: Artificial life, i.e. computing arrangements simulating life
    • G06N3/006: Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00: Digital computers in general; Data processing equipment in general
    • G06F15/16: Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163: Interprocessor communication
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models

Definitions

  • the present invention relates to the field of computers, and more particularly to a particle swarm optimization method based on GPU operation of a mobile terminal.
  • GPU graphics processor
  • the technical problem to be solved by the present invention is to provide a particle swarm optimization method based on the operation of a GPU of a mobile terminal, which can improve the processing efficiency of the GPU, thereby improving the processing efficiency of the GPU cooperating CPU.
  • a particle swarm optimization method based on GPU operation of a mobile terminal comprising:
  • the GPU sends the optimal particle to the CPU.
  • the calculation model of the particle group is a star topology, in which each particle in the particle group is connected to each other.
  • the calculation model of the particle group is a ring topology in which each particle in the particle group is connected to its adjacent particles.
  • the velocity and position update equations of the particles in the particle group are as follows:
  • v_ij(t+1) = w*v_ij(t) + c1*r1*(p_ij(t) - x_ij(t)) + c2*r2*(p_gj(t) - x_ij(t))
  • x_ij(t+1) = x_ij(t) + v_ij(t+1)
  • where v_i(t) = (v_i1, v_i2, ..., v_id) is the velocity of the i-th particle at generation t, and v_ij(t) is its j-th dimension velocity;
  • w is the inertia weight, 0.2 to 0.9;
  • c1 is the cognitive coefficient;
  • c2 is the social coefficient;
  • r1 and r2 are random numbers uniformly distributed in [0, 1];
  • x_i(t) = (x_i1, x_i2, ..., x_id) is the position of the i-th particle at generation t;
  • p_i(t) = (p_i1, p_i2, ..., p_id) is the historical best position of the i-th particle;
  • p_g(t) = (p_g1, p_g2, ..., p_gd) is the historical best position of the group, and p_gj(t) is its j-th dimension.
  • the method operates based on a CUDA programming model comprising threads, thread blocks and thread grids; each thread has a unique thread number and a private register; each thread block has a shared memory, all threads in a thread block can access the shared memory, and at least two thread blocks form a thread grid.
  • the historical optimal particle and the global optimal particle are obtained by the CUDA programming model.
  • when the historical optimum is obtained on the CUDA programming model, one thread corresponds to one particle, and all threads call a preset function simultaneously.
  • updating the speed and position of the particles is completed by the CUDA programming model.
  • when updating position and speed, one thread corresponds to each dimension of a particle; data is read and processed according to the thread index.
  • the determination of the operating efficiency and convergence of the particle swarm optimization method comprises evaluation with one or more of the following seven benchmark functions (Sphere, Ackley, Schwefel, Levy, Griewank, Rastrigin and Rosenbrock, as listed in the description), where
  • D represents the dimension of the particle and the range of x_i represents the search space.
  • the step of updating the velocity and position of the particle according to the acquired historical optimal particle and the global optimal particle is entered.
  • the invention also discloses a particle swarm optimization system based on operation of a mobile terminal GPU, the system comprising:
  • An initialization module for initializing the position and velocity of particles in the particle group on the CPU side
  • a sending module for sending the position and speed of the particle to the GPU
  • the first update module updates the speed and position of the particle according to the acquired historical optimal particle and the global optimal particle;
  • a second update module that updates global optimal particles according to the speed and position of the updated particles
  • the determining module is configured to send the optimal particle to the CPU end if the current global optimal particle satisfies the iterative condition.
  • in the prior art, the GPU acts as a coprocessor that enhances the computing power of the CPU, but because the GPU's performance is limited by the algorithm structure, it is not exploited to the maximum.
  • in the present invention, the particle swarm optimization algorithm is used to improve the operation speed and efficiency of the GPU, greatly improving GPU performance and thereby greatly improving the ability of the GPU, as a coprocessor, to assist the CPU with computation, which raises the overall computing power and efficiency.
  • in particular, because of constraints such as volume and heat dissipation, the CPU used in a mobile terminal generally has low computational efficiency; applying the method to mobile terminals such as mobile phones and tablets can significantly improve the computing efficiency of the mobile terminal, greatly increase its operation speed and efficiency, and solve the problem of low operating efficiency and slow convergence on a conventional mobile-terminal CPU platform.
  • FIG. 1 is a flowchart of a particle swarm optimization method based on a GPU operation of a mobile terminal according to Embodiment 1 of the present invention
  • FIG. 2 is a flowchart of a particle swarm optimization method based on a GPU operation of a mobile terminal according to another embodiment of the present invention
  • FIG. 3 is a schematic diagram of a star topology and a ring topology according to an embodiment of the present invention
  • FIG. 4 is a schematic diagram of a CUDA programming model according to an embodiment of the present invention.
  • FIG. 5 is a flowchart of optimization of a PSO method according to an embodiment of the present invention.
  • FIG. 6 is a graph of the convergence of function f1 for the star-topology and ring-topology PSO methods on the CPU and on GPU+CPU according to an embodiment of the present invention;
  • FIG. 7 is a graph of the convergence of function f2 for the star-topology and ring-topology PSO methods on the CPU and on GPU+CPU according to an embodiment of the present invention;
  • FIG. 8 is a schematic diagram of a particle swarm optimization system based on a GPU operation of a mobile terminal according to Embodiment 2 of the present invention.
  • FIG. 9 is a schematic diagram of a particle swarm optimization system based on a mobile terminal GPU according to another embodiment of the present invention.
  • 201, initialization module; 202, sending module; 203, first update module; 204, second update module; 205, determination module; 206, output module.
  • A computer device, also referred to as a "computer" in this context, is an intelligent electronic device that can perform predetermined processing such as numerical and/or logical calculation by running predetermined programs or instructions; it may include a processor and a memory, with the processor executing instructions pre-stored in the memory to carry out the predetermined processing, or the predetermined processing may be carried out by hardware such as an ASIC, an FPGA or a DSP, or by a combination of the two.
  • Computer devices include, but are not limited to, servers, personal computers, notebook computers, tablets, smart phones, and the like.
  • the computer device includes a user device and a network device.
  • the user equipment or client includes, but is not limited to, a computer, a smart phone, a PDA, etc.;
  • the network device includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud-computing-based cloud consisting of a large number of computers or network servers, where cloud computing is a type of distributed computing: a super virtual computer composed of a group of loosely coupled computers.
  • the computer device can be operated separately to implement the present invention, and can also access the network and implement the present invention by interacting with other computer devices in the network.
  • the network in which the computer device is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN network, and the like.
  • the user equipment, the client, the network device, the network, and the like are merely examples, and other existing or future computer devices or networks may be applicable to the present invention, and should also be included in the scope of the present invention.
  • the method of the embodiment is mainly implemented by a mobile terminal or an electronic device; the mobile terminal or electronic device, such as a mobile phone or a tablet, may also have a network function and may communicate with other terminals, but is not limited to the above types.
  • the mobile terminal or the electronic device is only an example; other existing or future network devices and clients, if applicable to the present invention, are also included in the scope of protection of the present invention and are incorporated herein by reference. Further, the embodiments of the present invention are merely illustrative; embodiments may be freely combined with one another, and several embodiments may be combined, as long as the effects of the present invention can be achieved.
  • a particle swarm optimization method based on operation of a GPU of a mobile terminal includes:
  • in the prior art, the GPU acts as a coprocessor that enhances the computing power of the CPU, but because the GPU's performance is limited by the algorithm structure, it is not exploited to the maximum.
  • in this embodiment, the particle swarm optimization algorithm is used to improve the operation speed and efficiency of the GPU, greatly improving GPU performance and thereby greatly improving the ability of the GPU, as a coprocessor, to assist the CPU with computation, which raises the overall computing power and efficiency.
  • in particular, because of constraints such as volume and heat dissipation, the CPU used in a mobile terminal usually has low computational efficiency; applying the method to mobile terminals such as mobile phones and tablets can significantly improve the computing efficiency of the mobile terminal, greatly increase its operation speed and efficiency, and also solve the problem of low operating efficiency and slow convergence on a conventional mobile-terminal CPU platform.
  • a particle swarm optimization method based on GPU operation of a mobile terminal, the method comprising:
  • the present invention selects one or both of a star topology and a ring topology.
  • the left side of the figure is the star topology and the right side is the ring topology.
  • in the star PSO method, all the particles in the particle group are fully connected; that is, in the star topology each particle is connected to every other particle and can communicate with all particles except itself, so that the best solution of the entire particle group is shared.
  • in the ring topology, each particle in the particle group communicates with its n (n ≥ 1) neighbor particles; that is, each particle is connected to its adjacent particles, moves toward the best position found in its neighborhood to update its own position, and shares only the best solution within that neighborhood.
  • the existing PSO method tends to fall into local extrema; approaches to improving the PSO method include parameter adjustment, changing the search space, and hybridizing with other algorithms.
  • the performance of the PSO method depends strongly on the network structure, so a change of topology plays an important role in the convergence of the PSO algorithm.
  • in this embodiment, the PSO methods of the ring topology and the star topology are improved and combined with the parallelism of the PSO algorithm to solve the technical problem of low operating efficiency and slow convergence on a conventional mobile-terminal CPU platform.
  • the GPU significantly accelerates the convergence speed of the particle group.
  • the particle swarm optimization algorithm in this embodiment is based on a CUDA (Compute Unified Device Architecture) programming model
  • the CUDA programming model includes: a thread, a thread block, and Thread grid; each thread has a unique thread number and a private register; each thread block has a shared memory, all threads in each thread block can access shared memory, and at least two thread blocks form a thread grid.
  • the historical optimal particle and the global optimal particle are obtained by the CUDA programming model.
  • the historical optimal is obtained on the CUDA programming model, one thread corresponds to one particle, and each thread simultaneously calls a preset function.
  • the speed and position of the updated particle are completed by the CUDA programming model. When updating the position and speed, one thread corresponds to each dimension of the particle; the data is read and processed according to the thread index.
  • In the CUDA programming model, the CPU acts as the host, called the Host end, and the GPU acts as the coprocessor, called the Device end, which processes tasks by executing a large number of threads; the CPU and the GPU each have independent memory and their own compilers.
  • Together these form a complete CUDA programming model; kernel parallel processing is based on the single-instruction multiple-thread (SIMT) computing model.
  • CUDA uses three index vectors to organize threads into three different levels: threads, thread blocks (blocks) and thread grids (grids). The numbers of threads and blocks are set by the user; in this embodiment 16*16 thread blocks are started, and 16*16 threads are started in each block. Each thread has a unique thread number and a small but fast private register; each thread block has a shared memory, all threads in the block can quickly access the shared memory, and several blocks form a thread grid.
  • each particle exchanges information only when updating the global optimum, and the other steps are independent of each other.
  • when obtaining the historical best on CUDA, one thread corresponds to one particle, and all threads call the preset function simultaneously; when updating position and speed, one thread corresponds to each dimension of a particle; in both cases data is read and processed according to the thread index.
  • the host side initializes the position and velocity of the particles, copies the data from the CPU to the GPU, iterates over the device, and finally copies the optimal solution to the CPU output.
  • Although GPU performance is excellent, how much of it can be realized is largely limited by the algorithm structure.
  • When using CUDA, the data structures and the way memory is accessed have a great influence on GPU performance. Performance optimization is considered mainly from four aspects: 1) maximizing parallel execution; 2) optimizing memory usage to obtain the maximum memory bandwidth; 3) optimizing instruction usage to obtain the maximum instruction throughput; 4) setting the numbers of thread blocks and threads.
  • the velocity and position update equations of the standard PSO method are as follows:
  • v_ij(t+1) = w*v_ij(t) + c1*r1*(p_ij(t) - x_ij(t)) + c2*r2*(p_gj(t) - x_ij(t))    (1)
  • x_ij(t+1) = x_ij(t) + v_ij(t+1)    (2)
  • where v_ij(t) is the j-th dimension velocity of the i-th particle at generation t; w is the inertia weight, 0.2 to 0.9; c1 is the cognitive coefficient; c2 is the social coefficient; r1 and r2 are random numbers uniformly distributed in [0, 1]; x_i(t) = (x_i1, x_i2, ..., x_id) is the position of the i-th particle at generation t; p_i(t) = (p_i1, p_i2, ..., p_id) is the historical best position of the i-th particle; p_g(t) = (p_g1, p_g2, ..., p_gd) is the historical best position of the group.
  • the designed parallel PSO algorithm is tested and analyzed with the following commonly used benchmark functions, which makes it convenient to compare the running speed and operating efficiency of the various methods.
  • the operating efficiency and convergence of the particle swarm optimization method can be determined using one or more of the following seven functions, where D represents the dimension of the particle and the range of x_i represents the search space; this embodiment mainly compares computational efficiency and convergence with the serial PSO algorithm.
  • the running time of the star structure PSO algorithm on the CPU and CPU + GPU is shown in Table 1.
  • the running time of the ring structure PSO algorithm on the CPU and CPU + GPU is shown in Table 2.
  • Tables 1 and 2 above show the comparison of computational efficiency when the parameters of the two algorithms are set identically. Comparing the speed-up ratios in Tables 1 and 2, the star algorithm performs better than the ring algorithm: in CPU running time the difference between the two algorithms is not obvious, while on CPU+GPU the efficiency of the star algorithm is significantly higher than that of the ring algorithm.
  • Figures 6 and 7 depict the convergence curves of the PSO algorithms of the two topologies when solving f1 and f2 on the CPU and on the GPU+CPU, respectively, and show the comparison of their convergence.
  • the present embodiment discloses a particle swarm optimization system based on a GPU running on a mobile terminal, where the system is used to run the method in the first embodiment, and the system includes:
  • An initialization module 201 configured to initialize a position and a velocity of a particle in the particle group on the CPU side;
  • the sending module 202 is configured to send the position and speed of the particle to the GPU end;
  • the first update module 203 updates the speed and position of the particle according to the acquired historical optimal particle and the global optimal particle;
  • the second update module 204 updates the global optimal particle according to the updated particle speed and position
  • the determining module 205 is configured to send the optimal particle to the CPU end if the current global optimal particle satisfies the iterative condition.
  • in the prior art, the GPU acts as a coprocessor that enhances the computing power of the CPU, but because the GPU's performance is limited by the algorithm structure, it is not exploited to the maximum.
  • in this embodiment, the particle swarm optimization algorithm is used to improve the operation speed and efficiency of the GPU, greatly improving GPU performance and thereby greatly improving the ability of the GPU, as a coprocessor, to assist the CPU with computation, which raises the overall computing power and efficiency.
  • in particular, because of constraints such as volume and heat dissipation, the CPU used in a mobile terminal usually has low computational efficiency; applying the method to mobile terminals such as mobile phones and tablets can significantly improve the computing efficiency of the mobile terminal, greatly increase its operation speed and efficiency, and also solve the problem of low operating efficiency and slow convergence on a conventional mobile-terminal CPU platform.
  • the determining module 205 is further configured to: if the current global optimal particle does not satisfy the iterative condition, proceed to the first update module 203, according to the acquired historical optimal particle. And the global optimal particle updates the speed and position of the particle, and then proceeds to the second update module 204, the decision module 205 until the current global optimal particle satisfies the iteration condition.
  • the iteration condition may be that the speed of the particle reaches a preset threshold, or the position reaches a preset condition, or the speed and position reach a preset condition, and the like.
  • the system shown also includes an output module 206, configured to output the optimal particle that the GPU end has sent to the CPU end, so that the CPU end can output the result.
  • the method is as follows: a particle swarm optimization system based on the operation of a GPU of a mobile terminal, the system is used to run the method in the first embodiment, the system includes:
  • An initialization module 201 configured to initialize a position and a velocity of a particle in the particle group on the CPU side;
  • the sending module 202 is configured to send the position and speed of the particle to the GPU end;
  • the first update module 203 updates the speed and position of the particle according to the acquired historical optimal particle and the global optimal particle;
  • the second update module 204 updates the global optimal particle according to the updated particle speed and position
  • the determining module 205 is configured to determine whether the current global optimal particle satisfies an iteration condition, and if yes, the GPU sends the optimal particle to the CPU; otherwise, enters the first update module 203 to perform the next iteration;
  • the output module 206 outputs the optimal particle sent to the CPU end.
  • to improve the performance of the PSO method, the present invention selects one or both of a star topology and a ring topology, which may be understood with reference to FIG. 3.
  • the left side of the figure is the star topology and the right side is the ring topology.
  • all the particles in the particle group are all connected, that is, in the star topology, each particle in the particle group is connected to each other, and each particle can communicate with all particles except itself. To share the best solution for the entire particle swarm.
  • in the ring topology, each particle in the particle group communicates with its n (n ≥ 1) neighbor particles; that is, each particle is connected to its adjacent particles, moves toward the best position found in its neighborhood to update its own position, and shares only the best solution within that neighborhood.
  • the existing PSO method tends to fall into local extrema; approaches to improving the PSO method include parameter adjustment, changing the search space, and hybridizing with other algorithms.
  • the performance of the PSO method depends strongly on the network structure, so a change of topology plays an important role in the convergence of the PSO algorithm.
  • in this embodiment, the PSO methods of the ring topology and the star topology are improved and combined with the parallelism of the PSO algorithm to solve the technical problem of low operating efficiency and slow convergence on a conventional mobile-terminal CPU platform.
  • the GPU significantly accelerates the convergence speed of the particle group.
  • the particle swarm optimization algorithm in this embodiment is based on CUDA (Compute Unified Device Architecture) programming.
  • the CUDA programming model includes threads, thread blocks and thread grids; each thread has a unique thread number and a private register; each thread block has a shared memory, all threads in a thread block can access the shared memory, and at least two thread blocks form a thread grid.
  • the historical optimal particle and the global optimal particle are obtained by the CUDA programming model.
  • one thread corresponds to one particle, and each thread simultaneously calls a preset function.
  • the speed and position of the updated particle are completed by the CUDA programming model. When updating the position and speed, one thread corresponds to each dimension of the particle; the data is read and processed according to the thread index.
  • the CPU acts as the host, called the Host
  • the GPU acts as the coprocessor, called the Device end, and processes tasks by executing a large number of threads; the CPU and the GPU each have independent memory and their own compilers.
  • Together these form a complete CUDA programming model; kernel parallel processing is based on the single-instruction multiple-thread (SIMT) computing model.
  • CUDA uses three index vectors to organize threads into three different levels: threads, thread blocks (blocks) and thread grids (grids). The numbers of threads and blocks are set by the user; in this embodiment 16*16 thread blocks are started, and 16*16 threads are started in each block. Each thread has a unique thread number and a small but fast private register; each thread block has a shared memory, all threads in the block can quickly access the shared memory, and several blocks form a thread grid.
  • each particle exchanges information only when updating the global optimum, and the other steps are independent of each other.
  • when obtaining the historical best on CUDA, one thread corresponds to one particle, and all threads call the preset function simultaneously; when updating position and speed, one thread corresponds to each dimension of a particle; in both cases data is read and processed according to the thread index.
  • the host side initializes the position and velocity of the particles, copies the data from the CPU to the GPU, iterates over the device, and finally copies the optimal solution to the CPU output.
  • Although GPU performance is excellent, how much of it can be realized is largely limited by the algorithm structure.
  • When using CUDA, the data structures and the way memory is accessed have a great influence on GPU performance. Performance optimization is considered mainly from four aspects: 1) maximizing parallel execution; 2) optimizing memory usage to obtain the maximum memory bandwidth; 3) optimizing instruction usage to obtain the maximum instruction throughput; 4) setting the numbers of thread blocks and threads.
  • the velocity and position update equations of the standard PSO method are as follows:
  • v_ij(t+1) = w*v_ij(t) + c1*r1*(p_ij(t) - x_ij(t)) + c2*r2*(p_gj(t) - x_ij(t))    (1)
  • x_ij(t+1) = x_ij(t) + v_ij(t+1)    (2)
  • where v_ij(t) is the j-th dimension velocity of the i-th particle at generation t; w is the inertia weight, 0.2 to 0.9; c1 is the cognitive coefficient; c2 is the social coefficient; r1 and r2 are random numbers uniformly distributed in [0, 1]; x_i(t) = (x_i1, x_i2, ..., x_id) is the position of the i-th particle at generation t; p_i(t) = (p_i1, p_i2, ..., p_id) is the historical best position of the i-th particle; p_g(t) = (p_g1, p_g2, ..., p_gd) is the historical best position of the group.
  • the designed parallel PSO algorithm is tested and analyzed with the following commonly used benchmark functions, which makes it convenient to compare the running speed and operating efficiency of the various methods.
  • the operating efficiency and convergence of the particle swarm optimization method can be determined using one or more of the following seven functions, where D represents the dimension of the particle and the range of x_i represents the search space.
  • This embodiment mainly compares the computational efficiency and convergence with the serial PSO algorithm.
  • the specific functions are as follows:
  • the running time of the star structure PSO algorithm on the CPU and CPU + GPU is shown in Table 1.
  • the running time of the ring structure PSO algorithm on the CPU and CPU + GPU is shown in Table 2.
  • Tables 1 and 2 above show the comparison of computational efficiency when the parameters of the two algorithms are set identically. Comparing the speed-up ratios in Tables 1 and 2, the star algorithm performs better than the ring algorithm: in CPU running time the difference between the two algorithms is not obvious, while on CPU+GPU the efficiency of the star algorithm is significantly higher than that of the ring algorithm.
  • in summary, the optimization process of the ring- and star-topology PSO algorithms on the Compute Unified Device Architecture was compared with the computation time on the mobile phone CPU by solving several benchmark functions; the results show that the PSO methods of both topologies achieve higher computational efficiency than the CPU alone.
  • the GPU significantly accelerates the convergence speed of the particle swarm.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Telephone Function (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A particle swarm optimization method based on a GPU operation of a mobile terminal. The method comprises: initializing, at a CPU end, the position and speed of a particle in a particle swarm (S101); sending the position and speed of the particle to a GPU end (S102); updating the speed and position of the particle according to an acquired historical optimal particle and an overall optimal particle (S103); updating the overall optimal particle according to the updated speed and position of the particle (S104); and if the current overall optimal particle satisfies an iteration condition, the GPU end sending the optimal particle to the CPU end (S105). By means of a particle swarm optimization algorithm, the operation speed and operation efficiency of a GPU are improved, and the performance of the GPU is greatly improved such that the capability of the GPU, as a coprocessor, in assisting a CPU to perform an operation is greatly improved, and the whole operation capability and operation efficiency are improved.

Description

Particle swarm optimization method and system based on mobile terminal GPU operation

Technical field

The present invention relates to the field of computers, and more particularly to a particle swarm optimization method based on GPU operation of a mobile terminal.

Background

At present the core frequency of the CPU (central processing unit) has reached its limit, and the real-time processing of massive data far exceeds the load the CPU can bear; this development undoubtedly calls for new ways of solving the big-data problem. The GPU (graphics processing unit) was at first responsible only for graphics rendering; it was not until NVIDIA released the GeForce series of GPUs that GPUs began to be applied to general-purpose computing, and the introduction of CUDA (Compute Unified Device Architecture) allowed development to move quickly from the CPU to the GPU. The cooperative work of GPU and CPU has already been widely applied in fields such as oil exploration, astronomical computation, fluid dynamics simulation and biological computing, achieving speed-ups of several times or even dozens of times.

However, in existing mobile terminals the efficiency gain from the GPU as a coprocessor is limited, and its efficiency in assisting the CPU is not ideal; a more efficient operating method is therefore urgently needed.
Summary of the invention

The technical problem to be solved by the present invention is to provide a particle swarm optimization method based on the operation of the GPU of a mobile terminal, which can improve the processing efficiency of the GPU and thereby the processing efficiency of the GPU cooperating with the CPU.

The object of the present invention is achieved by the following technical solutions:

A particle swarm optimization method based on GPU operation of a mobile terminal, the method comprising:

initializing the position and velocity of the particles in the particle swarm on the CPU side;

sending the position and velocity of the particles to the GPU side;

updating the velocity and position of the particles according to the acquired historical optimal particle and global optimal particle;

updating the global optimal particle according to the updated velocity and position of the particles;

if the current global optimal particle satisfies the iteration condition, the GPU side sending the optimal particle to the CPU side.
Preferably, the calculation model of the particle swarm is a star topology, in which every particle in the swarm is connected to every other particle.

Preferably, the calculation model of the particle swarm is a ring topology, in which each particle in the swarm is connected to its adjacent particles.

Preferably, the velocity and position update equations of the particles in the particle swarm are as follows:

v_ij(t+1) = w*v_ij(t) + c1*r1*(p_ij(t) - x_ij(t)) + c2*r2*(p_gj(t) - x_ij(t))

x_ij(t+1) = x_ij(t) + v_ij(t+1)

where v_i(t) = (v_i1, v_i2, ..., v_id) is the velocity of the i-th particle at generation t, and v_ij(t) is its j-th dimension velocity; w is the inertia weight, 0.2 to 0.9; c1 is the cognitive coefficient and c2 is the social coefficient; r1 and r2 are random numbers uniformly distributed in [0, 1]; p_i(t) = (p_i1, p_i2, ..., p_id) is the historical best position of the i-th particle; x_i(t) = (x_i1, x_i2, ..., x_id) is the position of the i-th particle at generation t; p_g(t) = (p_g1, p_g2, ..., p_gd) is the historical best position of the swarm.
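For illustration only, the following is a minimal CUDA C sketch of how these update equations can be mapped onto the GPU with one thread per particle dimension, as the embodiments describe. It is not code from the patent; the array layout, parameter names, the clamping to a search range and the use of the cuRAND device API for r1 and r2 are assumptions.

```cuda
#include <curand_kernel.h>

// One thread per (particle, dimension) pair. x, v, pbest are laid out as
// [numParticles * dim] arrays; gbest holds the dim coordinates of the swarm best.
__global__ void updateVelocityPosition(float *x, float *v,
                                        const float *pbest, const float *gbest,
                                        curandState *rng,
                                        int numParticles, int dim,
                                        float w, float c1, float c2,
                                        float xMin, float xMax)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;   // global index over particle*dimension
    if (idx >= numParticles * dim) return;

    int j = idx % dim;                                 // dimension handled by this thread
    float r1 = curand_uniform(&rng[idx]);              // r1, r2 uniform in [0, 1]
    float r2 = curand_uniform(&rng[idx]);

    // v_ij(t+1) = w*v_ij(t) + c1*r1*(p_ij - x_ij) + c2*r2*(p_gj - x_ij)
    float vel = w * v[idx]
              + c1 * r1 * (pbest[idx] - x[idx])
              + c2 * r2 * (gbest[j]   - x[idx]);

    // x_ij(t+1) = x_ij(t) + v_ij(t+1), kept inside the assumed search range
    float pos = fminf(fmaxf(x[idx] + vel, xMin), xMax);

    v[idx] = vel;
    x[idx] = pos;
}
```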
Preferably, the method runs on the CUDA programming model, which comprises threads, thread blocks and thread grids; each thread has a unique thread number and a private register; each thread block has a shared memory that all threads in the block can access, and at least two thread blocks form a thread grid.

Preferably, the historical optimal particle and the global optimal particle are obtained through the CUDA programming model; when obtaining the historical best on the CUDA programming model, one thread corresponds to one particle, and all threads call a preset function simultaneously.

Preferably, updating the velocity and position of the particles is done through the CUDA programming model; when updating position and velocity, one thread corresponds to each dimension of a particle, and data is read and processed according to the thread index.
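As a further illustration of the one-thread-per-particle mapping used when the historical best is obtained, the sketch below evaluates a preset fitness function in every thread and refreshes the personal best. The Sphere function is only a stand-in for whatever objective is preset, minimization is assumed, and all names are illustrative rather than taken from the patent; the global best would then be selected from pbestVal, for example by a parallel reduction, which is omitted here.

```cuda
// Example objective (Sphere); the method itself leaves the fitness function as a preset.
__device__ float fitness(const float *xi, int dim)
{
    float s = 0.0f;
    for (int j = 0; j < dim; ++j) s += xi[j] * xi[j];
    return s;
}

// One thread per particle: all threads call the preset function simultaneously,
// then update the particle's historical (personal) best.
__global__ void updatePersonalBest(const float *x, float *pbest, float *pbestVal,
                                   int numParticles, int dim)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;     // particle index
    if (i >= numParticles) return;

    float f = fitness(&x[i * dim], dim);
    if (f < pbestVal[i]) {                             // minimization assumed
        pbestVal[i] = f;
        for (int j = 0; j < dim; ++j)
            pbest[i * dim + j] = x[i * dim + j];
    }
}
```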
Preferably, the operating efficiency and convergence of the particle swarm optimization method are determined using one or more of the following seven functions:

(1) Sphere function
(2) Ackley function
(3) Schwefel function
(4) Levy function
(5) Griewank function
(6) Rastrigin function
(7) Rosenbrock function

(The formula of each function is rendered as an image, PCTCN2016103892-appb-000001 through appb-000007, in the original publication.) Here D represents the dimension of the particle, and the range of x_i represents the search space.
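Since the formulas themselves appear only as images in the original publication, the device functions below are a hedged sketch using the standard textbook forms of two of the named benchmarks (Rastrigin and Ackley); the exact variants and search ranges used in the patent may differ.

```cuda
#define PI_F 3.14159265358979f

// Rastrigin (standard form): f(x) = 10*D + sum_j (x_j^2 - 10*cos(2*pi*x_j)),
// usually searched over x_j in [-5.12, 5.12].
__device__ float rastrigin(const float *xi, int D)
{
    float s = 10.0f * D;
    for (int j = 0; j < D; ++j)
        s += xi[j] * xi[j] - 10.0f * cosf(2.0f * PI_F * xi[j]);
    return s;
}

// Ackley (standard form):
// f(x) = -20*exp(-0.2*sqrt(mean(x_j^2))) - exp(mean(cos(2*pi*x_j))) + 20 + e
__device__ float ackley(const float *xi, int D)
{
    float sumSq = 0.0f, sumCos = 0.0f;
    for (int j = 0; j < D; ++j) {
        sumSq  += xi[j] * xi[j];
        sumCos += cosf(2.0f * PI_F * xi[j]);
    }
    return -20.0f * expf(-0.2f * sqrtf(sumSq / D))
           - expf(sumCos / D) + 20.0f + expf(1.0f);
}
```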
Preferably, if the current global optimal particle does not satisfy the iteration condition, the method returns to the step of updating the velocity and position of the particles according to the acquired historical optimal particle and global optimal particle.

The present invention also discloses a particle swarm optimization system based on GPU operation of a mobile terminal, the system comprising:

an initialization module for initializing the position and velocity of the particles in the particle swarm on the CPU side;

a sending module for sending the position and velocity of the particles to the GPU side;

a first update module for updating the velocity and position of the particles according to the acquired historical optimal particle and global optimal particle;

a second update module for updating the global optimal particle according to the updated velocity and position of the particles;

a determination module for sending the optimal particle from the GPU side to the CPU side if the current global optimal particle satisfies the iteration condition.
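Purely as an illustration of how the five claimed modules could be organized on the host side, here is a skeletal C++ interface; every name is hypothetical and nothing in it is prescribed by the patent.

```cuda
// Illustrative host-side decomposition into the five claimed modules.
struct Swarm {
    int numParticles, dim;
    float *d_x, *d_v, *d_pbest, *d_pbestVal, *d_gbest;   // device-side buffers
};

class PsoSystem {
public:
    void initialize(Swarm &s);               // initialization module: CPU-side positions/velocities
    void sendToGpu(Swarm &s);                // sending module: copy positions/velocities to the GPU
    void updateParticles(Swarm &s);          // first update module: velocity/position update
    void updateGlobalBest(Swarm &s);         // second update module: refresh the global optimal particle
    bool iterationSatisfied(const Swarm &s); // determination module: has the iteration condition been met?
};
```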
In the prior art, the GPU acts as a coprocessor that enhances the computing power of the CPU, but because the GPU's performance is limited by the algorithm structure it is not exploited to the maximum. In this embodiment, the particle swarm optimization algorithm is used to improve the operation speed and efficiency of the GPU, greatly improving GPU performance and thereby greatly improving the ability of the GPU, as a coprocessor, to assist the CPU with computation, which raises the overall computing power and efficiency. In particular, because of constraints such as volume and heat dissipation, the CPU used in a mobile terminal usually has low computational efficiency; applying this method to mobile terminals such as mobile phones and tablets can significantly improve the computing efficiency of the mobile terminal, greatly increase its operation speed and efficiency, and also solve the problem of low operating efficiency and slow convergence on a conventional mobile-terminal CPU platform.
DRAWINGS

FIG. 1 is a flowchart of a particle swarm optimization method based on GPU operation of a mobile terminal according to Embodiment 1 of the present invention;

FIG. 2 is a flowchart of another form of the particle swarm optimization method based on GPU operation of a mobile terminal according to Embodiment 1 of the present invention;

FIG. 3 is a schematic diagram of the star topology and the ring topology according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of the CUDA programming model according to an embodiment of the present invention;

FIG. 5 is a flowchart of the optimization process of the PSO method according to an embodiment of the present invention;

FIG. 6 shows the convergence curves of function f1 for the star-topology and ring-topology PSO methods on the CPU and on GPU+CPU according to an embodiment of the present invention;

FIG. 7 shows the convergence curves of function f2 for the star-topology and ring-topology PSO methods on the CPU and on GPU+CPU according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of a particle swarm optimization system based on GPU operation of a mobile terminal according to Embodiment 2 of the present invention;

FIG. 9 is a schematic diagram of another form of the particle swarm optimization system based on GPU operation of a mobile terminal according to Embodiment 2 of the present invention.

In the figures: 201, initialization module; 202, sending module; 203, first update module; 204, second update module; 205, determination module; 206, output module.
Detailed description

Before discussing the exemplary embodiments in more detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes the operations as a sequential process, many of the operations can be performed in parallel or concurrently, and the order of the operations can be rearranged. A process may be terminated when its operations are completed, but may also have additional steps not included in the figures. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, and so on.

A "computer device", also referred to herein as a "computer", is an intelligent electronic device that can perform predetermined processing such as numerical and/or logical calculation by running predetermined programs or instructions. It may include a processor and a memory, with the processor executing instructions pre-stored in the memory to carry out the predetermined processing, or the predetermined processing may be carried out by hardware such as an ASIC, an FPGA or a DSP, or by a combination of the two. Computer devices include, but are not limited to, servers, personal computers, notebook computers, tablet computers, smart phones and the like.

Computer devices include user equipment and network devices. User equipment or clients include, but are not limited to, computers, smart phones, PDAs and the like; network devices include, but are not limited to, a single network server, a server group composed of multiple network servers, or a cloud-computing-based cloud consisting of a large number of computers or network servers, where cloud computing is a type of distributed computing: a super virtual computer composed of a group of loosely coupled computers. The computer device may operate alone to implement the present invention, or it may access a network and implement the present invention by interacting with other computer devices in the network. The network in which the computer device is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN network and the like.

It should be noted that the user equipment, clients, network devices, networks and the like are merely examples; other existing or future computer devices or networks, if applicable to the present invention, are also included in the scope of protection of the present invention and are incorporated herein by reference.

The methods discussed below, some of which are illustrated by flowcharts, may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments that perform the necessary tasks may be stored in a machine- or computer-readable medium such as a storage medium, and one or more processors may perform the necessary tasks.

The specific structural and functional details disclosed here are merely representative and serve the purpose of describing exemplary embodiments of the present invention. The present invention may, however, be embodied in many alternative forms and should not be construed as being limited to the embodiments set forth herein.

It should be understood that although the terms "first", "second" and so on may be used here to describe various units, these units should not be limited by these terms, which are used only to distinguish one unit from another. For example, without departing from the scope of the exemplary embodiments, a first unit could be called a second unit, and similarly a second unit could be called a first unit. The term "and/or" as used here includes any and all combinations of one or more of the associated listed items.

It should be understood that when a unit is said to be "connected" or "coupled" to another unit, it may be directly connected or coupled to the other unit, or intermediate units may be present. In contrast, when a unit is said to be "directly connected" or "directly coupled" to another unit, no intermediate unit is present. Other words used to describe relationships between units should be interpreted in a similar manner (for example, "between" versus "directly between", and "adjacent to" versus "directly adjacent to").

The terminology used here is for describing particular embodiments only and is not intended to limit the exemplary embodiments. Unless the context clearly indicates otherwise, the singular forms "a" and "an" used here are intended to include the plural as well. It should also be understood that the terms "comprising" and/or "including" as used here specify the presence of the stated features, integers, steps, operations, units and/or components, without excluding the presence or addition of one or more other features, integers, steps, operations, units, components and/or combinations thereof.

It should also be mentioned that in some alternative implementations the functions/acts mentioned may occur in an order different from that indicated in the figures. For example, two figures shown in succession may in fact be executed substantially concurrently, or sometimes in the reverse order, depending on the functions/acts involved.
The present invention is further described below with reference to the drawings and preferred embodiments.

The method of this embodiment is mainly implemented by a mobile terminal or an electronic device; the mobile terminal or electronic device, such as a mobile phone or a tablet, may also have a network function and may communicate with other terminals, but is not limited to the above types.

It should be noted that the mobile terminal or electronic device is only an example; other existing or future network devices and clients, if applicable to the present invention, are also included in the scope of protection of the present invention and are incorporated herein by reference. In addition, the embodiments of the present invention are merely illustrative; embodiments may be freely combined with one another, and several embodiments may be combined, as long as the effects of the present invention can be achieved.

The present invention is further described below with reference to the drawings and preferred embodiments.
Embodiment 1

As shown in FIG. 1, this embodiment discloses a particle swarm optimization method based on GPU operation of a mobile terminal, the method comprising:

S101. Initialize the position and velocity of the particles in the particle swarm on the CPU side;

S102. Send the position and velocity of the particles to the GPU side;

S103. Update the velocity and position of the particles according to the acquired historical optimal particle and global optimal particle;

S104. Update the global optimal particle according to the updated velocity and position of the particles;

S105. If the current global optimal particle satisfies the iteration condition, the GPU side sends the optimal particle to the CPU side.

In the prior art, the GPU acts as a coprocessor that enhances the computing power of the CPU, but because the GPU's performance is limited by the algorithm structure it is not exploited to the maximum. In this embodiment, the particle swarm optimization algorithm is used to improve the operation speed and efficiency of the GPU, greatly improving GPU performance and thereby greatly improving the ability of the GPU, as a coprocessor, to assist the CPU with computation, which raises the overall computing power and efficiency. In particular, because of constraints such as volume and heat dissipation, the CPU used in a mobile terminal usually has low computational efficiency; applying this method to mobile terminals such as mobile phones and tablets can significantly improve the computing efficiency of the mobile terminal, greatly increase its operation speed and efficiency, and also solve the problem of low operating efficiency and slow convergence on a conventional mobile-terminal CPU platform.
In this embodiment, according to one example, as shown in FIG. 2, if the current global optimal particle does not satisfy the iteration condition, the method returns to step S103 and updates the velocity and position of the particles according to the acquired historical optimal particle and global optimal particle, then continues with steps S104 and S105 until the current global optimal particle satisfies the iteration condition. The iteration condition may be that the velocity of the particles reaches a preset threshold, or that the position reaches a preset condition, or that both velocity and position reach preset conditions, and so on. In addition, in this embodiment, after the GPU side sends the optimal particle to the CPU side, the CPU side can output the result. Specifically, a particle swarm optimization method based on GPU operation of a mobile terminal comprises:

S101. Initialize the position and velocity of the particles in the particle swarm on the CPU side;

S102. Send the position and velocity of the particles to the GPU side;

S103. Update the velocity and position of the particles according to the acquired historical optimal particle and global optimal particle;

S104. Update the global optimal particle according to the updated velocity and position of the particles;

S105. Determine whether the current global optimal particle satisfies the iteration condition; if so, the GPU side sends the optimal particle to the CPU side; otherwise, return to step S103 for the next iteration;

S106. Output the optimal particle sent to the CPU side.
According to another example, as shown in FIG. 3, to improve the performance of the PSO (Particle Swarm Optimization) method, the present invention selects one or both of a star topology and a ring topology. In the figure, the star topology is on the left and the ring topology is on the right. In the star PSO method all the particles in the swarm are fully connected; that is, in the star topology every particle is connected to every other particle and can communicate with all particles except itself, so that the best solution of the entire swarm is shared. In the ring topology, each particle in the swarm communicates with its n (n ≥ 1) neighbor particles; that is, each particle is connected to its adjacent particles, moves toward the best position found in its neighborhood to update its own position, and shares only the best solution within that neighborhood.

The existing PSO method tends to fall into local extrema; approaches to improving it include parameter adjustment, changing the search space, and hybridizing with other algorithms. The performance of the PSO method depends strongly on the network structure, so a change of topology plays an important role in the convergence of the PSO algorithm. In this embodiment, the PSO methods of the ring topology and the star topology are improved and combined with the parallelism of the PSO algorithm to solve the technical problem of low operating efficiency and slow convergence on a conventional mobile-terminal CPU platform; for the star-topology PSO algorithm, the GPU significantly accelerates the convergence of the swarm.
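To make the difference between the two topologies concrete, a hedged device-side sketch of how a particle's guide could be chosen is given below; a ring neighborhood of one particle on each side is assumed, and the function name is illustrative. In the star topology this step is unnecessary, since every particle simply follows the single global best.

```cuda
// Ring topology with one neighbour on each side: particle i shares only the best
// personal best among {i-1, i, i+1} (indices wrap around the ring).
__device__ int ringLocalBest(const float *pbestVal, int i, int numParticles)
{
    int left  = (i - 1 + numParticles) % numParticles;
    int right = (i + 1) % numParticles;
    int best  = i;
    if (pbestVal[left]  < pbestVal[best]) best = left;
    if (pbestVal[right] < pbestVal[best]) best = right;
    return best;   // the particle whose personal best guides particle i
}
```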
According to another example, as shown in FIG. 4, the particle swarm optimization algorithm in this embodiment is based on the CUDA (Compute Unified Device Architecture) programming model, which comprises threads, thread blocks, and thread grids. Each thread has a unique thread number and a private register; each thread block has a shared memory that all threads in the block can access; and at least two thread blocks form a thread grid. The historical optimal particles and the global optimal particle are obtained through the CUDA programming model: when obtaining the historical optima on the CUDA programming model, one thread corresponds to one particle, and all threads call the preset function simultaneously. The updating of particle velocities and positions is also done through the CUDA programming model: when updating positions and velocities, one thread corresponds to each dimension of a particle, and data are read and processed according to the thread index.
In the CUDA programming model, the CPU serves as the host (Host side) and the GPU serves as the coprocessor (Device side), which processes tasks by executing a large number of threads; each has its own independent memory and its own compiler. In a complete CUDA programming model, the kernel's parallel processing is based on the Single Instruction Multiple Thread (SIMT) computing model, and CUDA organizes threads into three levels: threads, thread blocks, and thread grids. The numbers of threads and blocks are set by the user; in this embodiment, 16*16 thread blocks are launched, with 16*16 threads in each block. Each thread has a unique thread number and a small but fast private register; each thread block has a shared memory that all threads in the block can access quickly; and several blocks form a thread grid.
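As an illustration of the thread organization described above, the following hedged sketch shows one way the 16*16 grid of 16*16-thread blocks mentioned in this embodiment could be launched, and how each thread derives its unique index; the kernel name and what it prints are assumptions for illustration only.

#include <cuda_runtime.h>
#include <cstdio>

// Each thread computes its unique global index from the block and thread
// indices, mirroring the 16*16 blocks of 16*16 threads used in this embodiment.
__global__ void show_thread_index() {
    int block_id        = blockIdx.y * gridDim.x + blockIdx.x;
    int thread_in_block = threadIdx.y * blockDim.x + threadIdx.x;
    int global_id       = block_id * (blockDim.x * blockDim.y) + thread_in_block;
    if (global_id == 0) {
        printf("total threads launched: %d\n",
               gridDim.x * gridDim.y * blockDim.x * blockDim.y);
    }
}

int main() {
    dim3 block(16, 16);   // 16*16 threads per block
    dim3 grid(16, 16);    // 16*16 thread blocks in the grid
    show_thread_index<<<grid, block>>>();
    cudaDeviceSynchronize();
    return 0;
}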
During the optimization process of the PSO algorithm, the particles exchange information with one another only when updating the global optimum; all other steps are independent of each other. When obtaining the historical optima on CUDA, one thread corresponds to one particle and all threads call the preset function simultaneously; when updating positions and velocities, one thread corresponds to each dimension of a particle; in both cases, data are read and processed according to the thread index.
In the CUDA programming model, the host side initializes the positions and velocities of the particles, copies the data from the CPU to the GPU, performs iterative optimization on the device, and finally copies the optimal solution back to the CPU for output. Although GPU performance is excellent, how much of it can be exploited depends largely on the algorithm structure. When using CUDA, the data structures and the memory access pattern have a great influence on GPU performance. Performance optimization is considered mainly from four aspects: 1) maximizing parallel execution; 2) optimizing memory usage to obtain maximum memory bandwidth; 3) optimizing instruction usage to obtain maximum instruction throughput; 4) setting the numbers of thread blocks and threads.
According to another example, the velocity and position update equations of the standard PSO method are as follows:
v_ij(t+1) = ω*v_ij(t) + c_1*r_1*(p_ij(t) - x_ij(t)) + c_2*r_2*(p_gj(t) - x_ij(t))         (1)
x_ij(t+1) = x_ij(t) + v_ij(t+1)          (2)
where v_i(t) = (v_i1, v_i2, ..., v_id) is the velocity of the i-th particle at generation t, and v_ij(t) denotes the j-th dimension of that velocity; ω is the inertia weight, taken in the range 0.2 to 0.9 (in this embodiment ω = 0.5); c_1 is the cognitive coefficient and c_2 is the social coefficient, commonly c_1 = c_2 = 2; r_1 and r_2 are random numbers uniformly distributed in [0, 1]; p_i(t) = (p_i1, p_i2, ..., p_id) is the historical best position of the i-th particle; x_i(t) = (x_i1, x_i2, ..., x_id) is the position of the i-th particle at generation t; and p_g(t) = (p_g1, p_g2, ..., p_gd) is the historical best position of the swarm. The optimization process of the PSO method is shown in FIG. 5.
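For illustration only, a hedged CUDA C++ sketch of update equations (1) and (2) follows, with one thread per particle dimension as described above; the kernel name, array layout (N particles of D dimensions, indexed as i*D + j) and the use of cuRAND for the random numbers r_1 and r_2 are assumptions of this sketch and not part of the original disclosure.

#include <cuda_runtime.h>
#include <curand_kernel.h>

// Illustrative kernel for equations (1) and (2): one thread per particle
// dimension, global index idx = i * D + j.
__global__ void update_velocity_position(float* x, float* v,
                                         const float* pbest,   // N*D personal best positions
                                         const float* gbest,   // D   global best position
                                         int N, int D,
                                         float w, float c1, float c2,
                                         unsigned long long seed) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= N * D) return;
    int j = idx % D;                       // dimension index handled by this thread

    curandState state;
    curand_init(seed, idx, 0, &state);
    float r1 = curand_uniform(&state);     // r1, r2 uniformly distributed in (0, 1]
    float r2 = curand_uniform(&state);

    // Equation (1): inertia term + cognitive term + social term.
    float vnew = w * v[idx]
               + c1 * r1 * (pbest[idx] - x[idx])
               + c2 * r2 * (gbest[j]   - x[idx]);
    v[idx] = vnew;

    // Equation (2): position update.
    x[idx] = x[idx] + vnew;
}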
In this embodiment, the designed parallel PSO algorithm was tested with the following commonly used benchmark functions, for analysis and calculation and to make it easy to compare the running speed and efficiency of the various methods; the running efficiency and convergence of the particle swarm optimization method can be determined using one or more of the following seven functions, where D denotes the dimension of a particle and the range of x_i denotes the search space. This embodiment mainly compares computational efficiency and convergence with the serial PSO algorithm.
(1) Sphere function
Figure PCTCN2016103892-appb-000008
(2) Ackley function
Figure PCTCN2016103892-appb-000009
(3) Schwefel function
Figure PCTCN2016103892-appb-000010
(4) Levy function
Figure PCTCN2016103892-appb-000011
(5) Griewank function
Figure PCTCN2016103892-appb-000012
(6) Rastrigin function
Figure PCTCN2016103892-appb-000013
(7) Rosenbrock function
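The formulas for the seven functions above are rendered as images in the original filing. For reference, their standard textbook forms are reproduced below in LaTeX; it is an assumption that the filing uses these standard forms (several of the functions, for example Schwefel, have more than one common variant).

% Standard forms of the seven benchmark functions (assumed; the filing's
% images may use slightly different variants or constant terms).
\begin{align*}
f_1(x) &= \sum_{i=1}^{D} x_i^2 && \text{(Sphere)}\\
f_2(x) &= -20\exp\!\Big(-0.2\sqrt{\tfrac{1}{D}\sum_{i=1}^{D} x_i^2}\Big)
          -\exp\!\Big(\tfrac{1}{D}\sum_{i=1}^{D}\cos(2\pi x_i)\Big) + 20 + e && \text{(Ackley)}\\
f_3(x) &= 418.9829\,D - \sum_{i=1}^{D} x_i \sin\!\big(\sqrt{|x_i|}\big) && \text{(Schwefel)}\\
f_4(x) &= \sin^2(\pi w_1) + \sum_{i=1}^{D-1} (w_i-1)^2\big[1+10\sin^2(\pi w_i+1)\big]
          + (w_D-1)^2\big[1+\sin^2(2\pi w_D)\big],\ \ w_i = 1+\tfrac{x_i-1}{4} && \text{(Levy)}\\
f_5(x) &= 1 + \tfrac{1}{4000}\sum_{i=1}^{D} x_i^2 - \prod_{i=1}^{D}\cos\!\Big(\tfrac{x_i}{\sqrt{i}}\Big) && \text{(Griewank)}\\
f_6(x) &= \sum_{i=1}^{D}\big[x_i^2 - 10\cos(2\pi x_i) + 10\big] && \text{(Rastrigin)}\\
f_7(x) &= \sum_{i=1}^{D-1}\big[100(x_{i+1}-x_i^2)^2 + (x_i-1)^2\big] && \text{(Rosenbrock)}
\end{align*}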
The running times of the star-topology PSO algorithm on the CPU and on the CPU+GPU are shown in Table 1; the test parameters are N=1000, D=50, M=5000. The running times of the ring-topology PSO algorithm on the CPU and on the CPU+GPU are shown in Table 2, with the same test parameters N=1000, D=50, M=5000.
Table 1. Time for the star-topology PSO algorithm to solve functions f1~f7 when N=1000, D=50, M=5000
Figure PCTCN2016103892-appb-000015
Table 2. Time for the ring-topology PSO algorithm to solve functions f1~f7 when N=1000, D=50, M=5000
Figure PCTCN2016103892-appb-000016
Comparison of computational efficiency and convergence
Tables 1 and 2 above show the comparison of computational efficiency. With identical parameter settings, comparing the performance of the two algorithms through the speedup columns of Tables 1 and 2 shows that the star-topology algorithm is better than the ring-topology algorithm. For running time on the CPU alone, the gap between the two algorithms is not obvious, whereas on the CPU+GPU the star-topology algorithm is clearly more efficient than the ring-topology algorithm.
FIG. 6 and FIG. 7 depict the convergence curves of the two PSO topologies when solving f1 and f2 on the CPU and on the GPU, respectively, and the comparison of convergence can be seen from these figures. The parameter values are: N=500, D=50, and M increases gradually from 0.
The optimization process of the PSO algorithm with ring and star topologies is implemented on the Compute Unified Device Architecture (CUDA), and by solving several benchmark functions, the computation time and results are compared with those obtained on the mobile phone CPU. The results of the present invention show that the PSO methods of both topologies are more computationally efficient than the CPU, and that for the star-topology PSO method the GPU significantly accelerates the convergence of the particle swarm.
With the improvement of material living standards, computer technology is increasingly widely applied. In the process of solving optimization problems, optimization theory and algorithms have developed rapidly, forming a series of classical traditional optimization theories that are widely used in scientific research, engineering, molecular biology, control systems, finance, transportation, network communication, energy, and many other fields.
Embodiment 2
As shown in FIG. 8, this embodiment discloses a particle swarm optimization system based on GPU operation of a mobile terminal; the system is used to perform the method of Embodiment 1 and comprises:
An initialization module 201, configured to initialize the positions and velocities of the particles in the particle swarm on the CPU side;
A sending module 202, configured to send the positions and velocities of the particles to the GPU side;
A first update module 203, configured to update the velocities and positions of the particles according to the obtained historical optimal particles and the global optimal particle;
A second update module 204, configured to update the global optimal particle according to the updated velocities and positions of the particles;
A determination module 205, configured to send the optimal particle from the GPU to the CPU if the current global optimal particle satisfies the iteration condition.
In the prior art, the GPU acts as a coprocessor to enhance the computing power of the CPU, but the GPU's performance is limited by the algorithm structure and is not fully exploited. In this embodiment, the particle swarm optimization algorithm is used to increase the computing speed and efficiency of the GPU, greatly improving GPU performance and thus greatly improving the GPU's ability, as a coprocessor, to assist the CPU with computation, which improves the overall computing power and efficiency. In particular, mobile terminals are constrained by size and heat dissipation, so the CPUs used in them typically have low computational efficiency; applying this method to mobile terminals such as mobile phones and tablets can significantly improve their computational efficiency, greatly increasing the computing speed and efficiency of the mobile terminal, and can also solve the problems of low running efficiency and slow convergence on a conventional mobile terminal CPU platform.
In this embodiment, according to one example, as shown in FIG. 9, the determination module 205 is further configured to return to the first update module 203 if the current global optimal particle does not satisfy the iteration condition, so that the velocities and positions of the particles are updated according to the obtained historical optimal particles and the global optimal particle; the second update module 204 and the determination module 205 then operate again, until the current global optimal particle satisfies the iteration condition. The iteration condition may be that the velocity of the particles reaches a preset threshold, that the position reaches a preset condition, or that both the velocity and the position reach preset conditions. In this embodiment, the system further includes an output module 206, which is configured to output the result on the CPU side after the GPU sends the optimal particle to the CPU. Specifically, a particle swarm optimization system based on GPU operation of a mobile terminal, used to perform the method of Embodiment 1, comprises:
An initialization module 201, configured to initialize the positions and velocities of the particles in the particle swarm on the CPU side;
A sending module 202, configured to send the positions and velocities of the particles to the GPU side;
A first update module 203, configured to update the velocities and positions of the particles according to the obtained historical optimal particles and the global optimal particle;
A second update module 204, configured to update the global optimal particle according to the updated velocities and positions of the particles;
A determination module 205, configured to determine whether the current global optimal particle satisfies the iteration condition; if so, the GPU sends the optimal particle to the CPU; otherwise, processing returns to the first update module 203 for the next iteration;
An output module 206, configured to output the optimal particle sent to the CPU.
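A minimal, illustrative sketch of how modules 201 to 206 could be organized in code is given below; the class name, method names and empty bodies are assumptions made for this sketch only and are not part of the original disclosure.

// Illustrative organization of modules 201-206 as a C++ class.
class MobileGpuPso {
public:
    void initialize()         { /* module 201: init positions/velocities on the CPU */ }
    void send_to_gpu()        { /* module 202: copy particle data to the GPU */ }
    void update_particles()   { /* module 203: velocity/position update (first update module) */ }
    void update_global_best() { /* module 204: global best update (second update module) */ }
    bool is_converged()       { /* module 205: check the iteration condition */ return true; }
    void output_result()      { /* module 206: output the optimal particle on the CPU */ }

    // Overall flow corresponding to FIG. 9: iterate modules 203-205 until the
    // iteration condition is met, then output via module 206.
    void run(int max_iter) {
        initialize();
        send_to_gpu();
        for (int it = 0; it < max_iter; ++it) {
            update_particles();
            update_global_best();
            if (is_converged()) break;
        }
        output_result();
    }
};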
According to another example, in order to improve the performance of the PSO (Particle Swarm Optimization) method, the present invention selects one or both of a star topology and a ring topology, as can be seen in FIG. 3, where the star topology is on the left and the ring topology is on the right. In the star PSO method, all particles in the swarm are fully connected, that is, in the star topology each particle is connected to every other particle and can communicate with all particles other than itself, so that the best solution of the entire swarm is shared. In the ring topology, each particle in the swarm communicates with its n (n ≥ 1) neighboring particles, that is, each particle is connected to its adjacent particles; each particle moves toward the best position within its neighborhood to update its own position, and each particle shares only the best solution within its neighborhood.
The existing PSO method tends to fall into local extrema. Approaches to improving the PSO method include parameter adjustment, changing the search space, and hybridizing it with other algorithms. The performance of the PSO method depends strongly on the network structure, so a change of topology plays an important role in the convergence of the PSO algorithm. In this embodiment, the PSO methods with ring and star topologies are improved, and the parallelism of the PSO algorithm is exploited to solve the technical problems of low running efficiency and slow convergence on a conventional mobile terminal CPU platform. For the star-topology PSO algorithm, the GPU significantly accelerates the convergence of the particle swarm.
According to another example, as can be seen in FIG. 4, the particle swarm optimization algorithm in this embodiment is based on the CUDA (Compute Unified Device Architecture) programming model, which comprises threads, thread blocks, and thread grids. Each thread has a unique thread number and a private register; each thread block has a shared memory that all threads in the block can access; and at least two thread blocks form a thread grid. The historical optimal particles and the global optimal particle are obtained through the CUDA programming model: when obtaining the historical optima on the CUDA programming model, one thread corresponds to one particle, and all threads call the preset function simultaneously. The updating of particle velocities and positions is also done through the CUDA programming model: when updating positions and velocities, one thread corresponds to each dimension of a particle, and data are read and processed according to the thread index.
In the CUDA programming model, the CPU serves as the host (Host side) and the GPU serves as the coprocessor (Device side), which processes tasks by executing a large number of threads; each has its own independent memory and its own compiler. In a complete CUDA programming model (cf. FIG. 4), the kernel's parallel processing is based on the Single Instruction Multiple Thread (SIMT) computing model, and CUDA organizes threads into three levels: threads, thread blocks, and thread grids. The numbers of threads and blocks are set by the user; in this embodiment, 16*16 thread blocks are launched, with 16*16 threads in each block. Each thread has a unique thread number and a small but fast private register; each thread block has a shared memory that all threads in the block can access quickly; and several blocks form a thread grid.
During the optimization process of the PSO algorithm, the particles exchange information with one another only when updating the global optimum; all other steps are independent of each other. When obtaining the historical optima on CUDA, one thread corresponds to one particle and all threads call the preset function simultaneously; when updating positions and velocities, one thread corresponds to each dimension of a particle; in both cases, data are read and processed according to the thread index.
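The statement that particles exchange information only when updating the global optimum suggests a parallel reduction over the personal best values; a hedged CUDA C++ sketch of one block-level step of such a reduction follows. The kernel name, array layout and the assumption of a power-of-two block size are illustrative only; the kernel would be launched with a dynamic shared memory size of blockDim.x * (sizeof(float) + sizeof(int)), and the per-block partial results would still need a final reduction.

#include <cuda_runtime.h>
#include <cfloat>

// Illustrative block-level reduction: find the particle with the smallest
// personal best value within each thread block (minimization assumed).
__global__ void block_best_reduce(const float* pbest_val, int N,
                                  float* block_val, int* block_idx) {
    extern __shared__ float sval[];          // shared best values (blockDim.x floats)
    int* sidx = (int*)&sval[blockDim.x];     // shared best indices (blockDim.x ints)

    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + tid;
    sval[tid] = (i < N) ? pbest_val[i] : FLT_MAX;
    sidx[tid] = (i < N) ? i : -1;
    __syncthreads();

    // Tree reduction within the block; blockDim.x assumed to be a power of two.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s && sval[tid + s] < sval[tid]) {
            sval[tid] = sval[tid + s];
            sidx[tid] = sidx[tid + s];
        }
        __syncthreads();
    }
    if (tid == 0) {                          // one partial result per block
        block_val[blockIdx.x] = sval[0];
        block_idx[blockIdx.x] = sidx[0];
    }
}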
In the CUDA programming model, the host side initializes the positions and velocities of the particles, copies the data from the CPU to the GPU, performs iterative optimization on the device, and finally copies the optimal solution back to the CPU for output. Although GPU performance is excellent, how much of it can be exploited depends largely on the algorithm structure. When using CUDA, the data structures and the memory access pattern have a great influence on GPU performance. Performance optimization is considered mainly from four aspects: 1) maximizing parallel execution; 2) optimizing memory usage to obtain maximum memory bandwidth; 3) optimizing instruction usage to obtain maximum instruction throughput; 4) setting the numbers of thread blocks and threads.
According to another example, the velocity and position update equations of the standard PSO method are as follows:
v_ij(t+1) = ω*v_ij(t) + c_1*r_1*(p_ij(t) - x_ij(t)) + c_2*r_2*(p_gj(t) - x_ij(t))         (1)
x_ij(t+1) = x_ij(t) + v_ij(t+1)         (2)
where v_i(t) = (v_i1, v_i2, ..., v_id) is the velocity of the i-th particle at generation t, and v_ij(t) denotes the j-th dimension of that velocity; ω is the inertia weight, taken in the range 0.2 to 0.9 (in this embodiment ω = 0.5); c_1 is the cognitive coefficient and c_2 is the social coefficient, commonly c_1 = c_2 = 2; r_1 and r_2 are random numbers uniformly distributed in [0, 1]; p_i(t) = (p_i1, p_i2, ..., p_id) is the historical best position of the i-th particle; x_i(t) = (x_i1, x_i2, ..., x_id) is the position of the i-th particle at generation t; and p_g(t) = (p_g1, p_g2, ..., p_gd) is the historical best position of the swarm. The optimization process of the PSO method can be seen in FIG. 5.
In this embodiment, the designed parallel PSO algorithm was tested with the following commonly used benchmark functions, for analysis and calculation and to make it easy to compare the running speed and efficiency of the various methods; the running efficiency and convergence of the particle swarm optimization method can be determined using one or more of the following seven functions, where D denotes the dimension of a particle and the range of x_i denotes the search space. This embodiment mainly compares computational efficiency and convergence with the serial PSO algorithm. The specific functions are as follows:
(1) Sphere function
Figure PCTCN2016103892-appb-000017
(2) Ackley function
Figure PCTCN2016103892-appb-000018
(3) Schwefel function
Figure PCTCN2016103892-appb-000019
(4) Levy function
Figure PCTCN2016103892-appb-000021
(5) Griewank function
Figure PCTCN2016103892-appb-000022
(6) Rastrigin function
Figure PCTCN2016103892-appb-000023
(7) Rosenbrock function
Figure PCTCN2016103892-appb-000024
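As an illustration of how one of these benchmark functions could be evaluated on the GPU with one thread per particle (as described above for obtaining the historical optima), a hedged CUDA C++ sketch for the Sphere function follows; the kernel name and array layout are assumptions made for this sketch only.

// Illustrative kernel: evaluate the Sphere function f1 for each particle,
// one thread per particle; x is laid out as N particles of D dimensions.
__global__ void evaluate_sphere(const float* x, float* fitness, int N, int D) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= N) return;
    float sum = 0.0f;
    for (int j = 0; j < D; ++j) {
        float xj = x[i * D + j];
        sum += xj * xj;          // f1(x) = sum of squared coordinates
    }
    fitness[i] = sum;
}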
The running times of the star-topology PSO algorithm on the CPU and on the CPU+GPU are shown in Table 1; the test parameters are N=1000, D=50, M=5000. The running times of the ring-topology PSO algorithm on the CPU and on the CPU+GPU are shown in Table 2, with the same test parameters N=1000, D=50, M=5000.
Table 1. Time for the star-topology PSO algorithm to solve functions f1~f7 when N=1000, D=50, M=5000
Figure PCTCN2016103892-appb-000025
Table 2. Time for the ring-topology PSO algorithm to solve functions f1~f7 when N=1000, D=50, M=5000
Figure PCTCN2016103892-appb-000026
Figure PCTCN2016103892-appb-000027
Comparison of computational efficiency and convergence
Tables 1 and 2 above show the comparison of computational efficiency. With identical parameter settings, comparing the performance of the two algorithms through the speedup columns of Tables 1 and 2 shows that the star-topology algorithm is better than the ring-topology algorithm. For running time on the CPU alone, the gap between the two algorithms is not obvious, whereas on the CPU+GPU the star-topology algorithm is clearly more efficient than the ring-topology algorithm.
FIG. 6 and FIG. 7 depict the convergence curves of the two PSO topologies when solving f1 and f2 on the CPU and on the GPU, respectively, and the comparison of convergence can be seen from these figures. The parameter values are: N=500, D=50, and M increases gradually from 0.
The optimization process of the PSO algorithm with ring and star topologies is implemented on the Compute Unified Device Architecture (CUDA), and by solving several benchmark functions, the computation time and results are compared with those obtained on the mobile phone CPU. The results show that the PSO methods of both topologies are more computationally efficient than the CPU, and that for the star-topology PSO method the GPU significantly accelerates the convergence of the particle swarm.
The above is a further detailed description of the present invention in connection with specific preferred embodiments, and the specific implementation of the present invention shall not be considered to be limited to these descriptions. For those of ordinary skill in the art to which the present invention belongs, several simple deductions or substitutions may be made without departing from the concept of the present invention, and all of them shall be regarded as falling within the protection scope of the present invention.

Claims (10)

  1. A particle swarm optimization method based on GPU operation of a mobile terminal, characterized in that the method comprises:
    initializing the positions and velocities of the particles in the particle swarm on the CPU side;
    sending the positions and velocities of the particles to the GPU side;
    updating the velocities and positions of the particles according to the obtained historical optimal particles and the global optimal particle;
    updating the global optimal particle according to the updated velocities and positions of the particles;
    if the current global optimal particle satisfies the iteration condition, sending the optimal particle from the GPU to the CPU.
  2. The method according to claim 1, characterized in that the calculation model of the particle swarm is a star topology, in which the particles in the particle swarm are all connected to one another.
  3. The method according to claim 1, characterized in that the calculation model of the particle swarm is a ring topology, in which each particle in the particle swarm is connected to its adjacent particles.
  4. The method according to claim 1, characterized in that the velocity and position update equations of the particles in the particle swarm are as follows:
    v_ij(t+1) = ω*v_ij(t) + c_1*r_1*(p_ij(t) - x_ij(t)) + c_2*r_2*(p_gj(t) - x_ij(t))
    x_ij(t+1) = x_ij(t) + v_ij(t+1)
    where v_i(t) = (v_i1, v_i2, ..., v_id) is the velocity of the i-th particle at generation t, and v_ij(t) denotes the j-th dimension of that velocity; ω is the inertia weight, in the range 0.2 to 0.9; c_1 is the cognitive coefficient and c_2 is the social coefficient; r_1 and r_2 are random numbers uniformly distributed in [0, 1]; p_i(t) = (p_i1, p_i2, ..., p_id) is the historical best position of the i-th particle; x_i(t) = (x_i1, x_i2, ..., x_id) is the position of the i-th particle at generation t; and p_g(t) = (p_g1, p_g2, ..., p_gd) is the historical best position of the swarm.
  5. The method according to claim 1, characterized in that the method runs based on a CUDA programming model, the CUDA programming model comprising threads, thread blocks, and thread grids; each thread has a unique thread number and a private register; each thread block has a shared memory that all threads in the thread block can access; and at least two thread blocks form a thread grid.
  6. The method according to claim 5, characterized in that the historical optimal particles and the global optimal particle are obtained through the CUDA programming model; when obtaining the historical optima on the CUDA programming model, one thread corresponds to one particle, and all threads call the preset function simultaneously.
  7. The method according to claim 5, characterized in that the updating of the velocities and positions of the particles is done through the CUDA programming model; when updating the positions and velocities, one thread corresponds to each dimension of a particle, and data are read and processed according to the thread index.
  8. The method according to claim 1, characterized in that the determination of the running efficiency and convergence of the particle swarm optimization method comprises using one or more of the following seven functions:
    (1) Sphere function
    Figure PCTCN2016103892-appb-100001
    (2) Ackley function
    Figure PCTCN2016103892-appb-100002
    (3) Schwefel function
    Figure PCTCN2016103892-appb-100003
    (4) Levy function
    Figure PCTCN2016103892-appb-100004
    (5) Griewank function
    Figure PCTCN2016103892-appb-100005
    (6) Rastrigin function
    Figure PCTCN2016103892-appb-100006
    (7) Rosenbrock function
    Figure PCTCN2016103892-appb-100007
    Figure PCTCN2016103892-appb-100008
    where D denotes the dimension of a particle, and the range of x_i denotes the search space.
  9. The method according to claim 1, characterized in that, if the current global optimal particle does not satisfy the iteration condition, the method proceeds to the step of updating the velocities and positions of the particles according to the obtained historical optimal particles and the global optimal particle.
  10. A particle swarm optimization system based on GPU operation of a mobile terminal, characterized in that the system comprises:
    an initialization module, configured to initialize the positions and velocities of the particles in the particle swarm on the CPU side;
    a sending module, configured to send the positions and velocities of the particles to the GPU side;
    a first update module, configured to update the velocities and positions of the particles according to the obtained historical optimal particles and the global optimal particle;
    a second update module, configured to update the global optimal particle according to the updated velocities and positions of the particles;
    a determination module, configured to send the optimal particle from the GPU to the CPU if the current global optimal particle satisfies the iteration condition.
PCT/CN2016/103892 2016-01-21 2016-10-29 Particle swarm optimization method and system based on gpu operation of mobile terminal WO2017124809A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610040738.XA CN105718998A (en) 2016-01-21 2016-01-21 Particle swarm optimization method based on mobile terminal GPU operation and system thereof
CN201610040738.X 2016-01-21

Publications (1)

Publication Number Publication Date
WO2017124809A1 true WO2017124809A1 (en) 2017-07-27

Family

ID=56153722

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/103892 WO2017124809A1 (en) 2016-01-21 2016-10-29 Particle swarm optimization method and system based on gpu operation of mobile terminal

Country Status (2)

Country Link
CN (1) CN105718998A (en)
WO (1) WO2017124809A1 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111177874A (en) * 2019-12-17 2020-05-19 北京航空航天大学杭州创新研究院 Method for rapidly designing airway network topological structure by using GPU
CN111337050A (en) * 2020-03-20 2020-06-26 深圳市汇泰科电子有限公司 Zero-speed judgment condition and step counting method based on multi-condition fusion
CN111858029A (en) * 2020-06-16 2020-10-30 国网福建省电力有限公司信息通信分公司 Storm cluster load balancing method and system based on discrete particle swarm
CN112183008A (en) * 2019-06-18 2021-01-05 北京理工大学 Terminal resistance matching method of CAN bus network
CN112395090A (en) * 2020-11-19 2021-02-23 华侨大学 Intelligent hybrid optimization method for service placement in mobile edge computing
CN113011097A (en) * 2021-03-26 2021-06-22 云南电网有限责任公司电力科学研究院 Multi-population self-adaptive cooperation optimization method
WO2022033446A1 (en) * 2020-08-10 2022-02-17 北京字节跳动网络技术有限公司 Dynamic fluid effect processing method and apparatus, and electronic device and readable medium
CN114172541A (en) * 2021-12-02 2022-03-11 广东电网有限责任公司 Energy consumption balance control method, device and system for power line communication network
CN114944999A (en) * 2022-04-08 2022-08-26 北京遥测技术研究所 High-speed routing communication system and method based on GPU and particle swarm optimization algorithm

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105718998A (en) * 2016-01-21 2016-06-29 上海斐讯数据通信技术有限公司 Particle swarm optimization method based on mobile terminal GPU operation and system thereof
CN106502632B (en) * 2016-10-28 2019-01-18 武汉大学 A kind of GPU parallel particle swarm optimization method based on self-adaptive thread beam
CN108564213B (en) * 2018-04-10 2022-05-13 中国水利水电科学研究院 GPU acceleration-based parallel reservoir group flood control optimal scheduling method
CN108845828B (en) * 2018-05-29 2021-01-08 深圳市国微电子有限公司 Coprocessor, matrix operation acceleration method and system
CN111815749A (en) * 2019-09-03 2020-10-23 厦门雅基软件有限公司 Particle calculation method, particle calculation device, electronic equipment and computer-readable storage medium
CN113362425B (en) * 2021-06-18 2022-07-19 中科三清科技有限公司 Image fusion method and device, electronic equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1753031A (en) * 2005-11-10 2006-03-29 北京航空航天大学 Particle system based on GPU
CN101685530A (en) * 2008-09-23 2010-03-31 中国科学院过程工程研究所 Method for calculating particles on GPU by utilizing multi-body interaction model
CN101727653A (en) * 2008-10-31 2010-06-09 中国科学院过程工程研究所 Graphics processing unit based discrete simulation computation method of multicomponent system
CN102243770A (en) * 2011-07-19 2011-11-16 南昌航空大学 Method for quickly drawing realistic rendering of naval battlefield based on OSG
CN102982506A (en) * 2012-11-13 2013-03-20 沈阳信达信息科技有限公司 Particle system optimization based on graphic processing unit (GPU)
CN102999756A (en) * 2012-11-09 2013-03-27 重庆邮电大学 Method for recognizing road signs by PSO-SVM (particle swarm optimization-support vector machine) based on GPU (graphics processing unit)
CN105718998A (en) * 2016-01-21 2016-06-29 上海斐讯数据通信技术有限公司 Particle swarm optimization method based on mobile terminal GPU operation and system thereof

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8090665B2 (en) * 2008-09-24 2012-01-03 Nec Laboratories America, Inc. Finding communities and their evolutions in dynamic social network
CN103646281B (en) * 2013-12-13 2017-07-21 浪潮电子信息产业股份有限公司 It is a kind of based on population computational methods on multiple populations

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1753031A (en) * 2005-11-10 2006-03-29 北京航空航天大学 Particle system based on GPU
CN101685530A (en) * 2008-09-23 2010-03-31 中国科学院过程工程研究所 Method for calculating particles on GPU by utilizing multi-body interaction model
CN101727653A (en) * 2008-10-31 2010-06-09 中国科学院过程工程研究所 Graphics processing unit based discrete simulation computation method of multicomponent system
CN102243770A (en) * 2011-07-19 2011-11-16 南昌航空大学 Method for quickly drawing realistic rendering of naval battlefield based on OSG
CN102999756A (en) * 2012-11-09 2013-03-27 重庆邮电大学 Method for recognizing road signs by PSO-SVM (particle swarm optimization-support vector machine) based on GPU (graphics processing unit)
CN102982506A (en) * 2012-11-13 2013-03-20 沈阳信达信息科技有限公司 Particle system optimization based on graphic processing unit (GPU)
CN105718998A (en) * 2016-01-21 2016-06-29 上海斐讯数据通信技术有限公司 Particle swarm optimization method based on mobile terminal GPU operation and system thereof

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183008A (en) * 2019-06-18 2021-01-05 北京理工大学 Terminal resistance matching method of CAN bus network
CN112183008B (en) * 2019-06-18 2022-07-22 北京理工大学 Terminal resistance matching method of CAN bus network
CN111177874B (en) * 2019-12-17 2023-06-09 北京航空航天大学杭州创新研究院 Method for rapidly designing road network topology structure by using GPU
CN111177874A (en) * 2019-12-17 2020-05-19 北京航空航天大学杭州创新研究院 Method for rapidly designing airway network topological structure by using GPU
CN111337050A (en) * 2020-03-20 2020-06-26 深圳市汇泰科电子有限公司 Zero-speed judgment condition and step counting method based on multi-condition fusion
CN111337050B (en) * 2020-03-20 2023-03-28 深圳市汇泰科电子有限公司 Zero-speed judgment condition and step counting method based on multi-condition fusion
CN111858029B (en) * 2020-06-16 2023-06-27 国网福建省电力有限公司信息通信分公司 Storm cluster load balancing method and system based on discrete particle swarm
CN111858029A (en) * 2020-06-16 2020-10-30 国网福建省电力有限公司信息通信分公司 Storm cluster load balancing method and system based on discrete particle swarm
WO2022033446A1 (en) * 2020-08-10 2022-02-17 北京字节跳动网络技术有限公司 Dynamic fluid effect processing method and apparatus, and electronic device and readable medium
CN112395090B (en) * 2020-11-19 2023-05-30 华侨大学 Intelligent hybrid optimization method for service placement in mobile edge calculation
CN112395090A (en) * 2020-11-19 2021-02-23 华侨大学 Intelligent hybrid optimization method for service placement in mobile edge computing
CN113011097A (en) * 2021-03-26 2021-06-22 云南电网有限责任公司电力科学研究院 Multi-population self-adaptive cooperation optimization method
CN113011097B (en) * 2021-03-26 2024-03-22 云南电网有限责任公司电力科学研究院 Multi-group self-adaptive collaborative optimization method
CN114172541A (en) * 2021-12-02 2022-03-11 广东电网有限责任公司 Energy consumption balance control method, device and system for power line communication network
CN114944999A (en) * 2022-04-08 2022-08-26 北京遥测技术研究所 High-speed routing communication system and method based on GPU and particle swarm optimization algorithm
CN114944999B (en) * 2022-04-08 2023-10-10 北京遥测技术研究所 High-speed route communication system and method based on GPU and particle swarm optimization algorithm

Also Published As

Publication number Publication date
CN105718998A (en) 2016-06-29

Similar Documents

Publication Publication Date Title
WO2017124809A1 (en) Particle swarm optimization method and system based on gpu operation of mobile terminal
Lv et al. Intelligent edge computing based on machine learning for smart city
Pan et al. GPU-based parallel collision detection for fast motion planning
Blej et al. Comparison of Mamdani-type and Sugeno-type fuzzy inference systems for fuzzy real time scheduling
Ghosh et al. Inter-particle communication and search-dynamics of lbest particle swarm optimizers: An analysis
Ryu et al. Improved informed RRT* using gridmap skeletonization for mobile robot path planning
CN109840154A (en) A kind of computation migration method that task based access control relies under mobile cloud environment
US11281232B2 (en) Systems and methods for multi-agent system control using consensus and saturation constraints
Li et al. Resource scheduling based on improved spectral clustering algorithm in edge computing
Du et al. Model parallelism optimization for distributed inference via decoupled CNN structure
Kashani et al. Using simulated annealing for task scheduling in distributed systems
CN112817730A (en) Deep neural network service batch processing scheduling method and system and GPU
CN112732436A (en) Deep reinforcement learning acceleration method of multi-core processor-single graphics processor
Zhang et al. Parallel planning of Marine Observation Tasks Based on threading building blocks
Fu et al. Modified atom search optimization based on immunologic mechanism and reinforcement learning
Liu et al. A set-based discrete differential evolution algorithm
Darmawan et al. Grid computing process improvement through computing resource scheduling using genetic algorithm and Tabu Search integration
Ding et al. A task scheduling algorithm for heterogeneous systems using aco
CN110415162B (en) Adaptive graph partitioning method facing heterogeneous fusion processor in big data
Leifu et al. Improved YSGA algorithm combining declining strategy and fuch chaotic mechanism
dos Santos et al. Parallel multi-speed pursuit-evasion game algorithms
Ying et al. Optimal scheduling simulation of software for multi-tenant in cloud computing environment
Zhang et al. Deploying GIS services into the edge: A study from performance evaluation and optimization viewpoint
Teng et al. A New Frog Leaping Algorithm Based on Simulated Annealing and Immunization Algorithm for Low-power Mapping in Network-on-chip.
Cao et al. Optimization Study of KNN Classification Algorithm on Large-Scale Datasets: Real-Time Optimization Strategy Based on Balanced KD Tree and Multi-threaded Parallel Computing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16886069

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16886069

Country of ref document: EP

Kind code of ref document: A1