WO2017124809A1 - Particle swarm optimization method and system based on gpu operation of mobile terminal - Google Patents

Particle swarm optimization method and system based on gpu operation of mobile terminal Download PDF

Info

Publication number
WO2017124809A1
WO2017124809A1 · PCT/CN2016/103892 · CN2016103892W
Authority
WO
WIPO (PCT)
Prior art keywords
particle
gpu
thread
speed
particles
Prior art date
Application number
PCT/CN2016/103892
Other languages
French (fr)
Chinese (zh)
Inventor
刘小东 (Liu Xiaodong)
Original Assignee
上海斐讯数据通信技术有限公司 (Shanghai Feixun Data Communication Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Feixun Data Communication Technology Co., Ltd. (上海斐讯数据通信技术有限公司)
Publication of WO2017124809A1 publication Critical patent/WO2017124809A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/004: Artificial life, i.e. computing arrangements simulating life
    • G06N3/006: Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00: Digital computers in general; Data processing equipment in general
    • G06F15/16: Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163: Interprocessor communication
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models

Definitions

  • the present invention relates to the field of computers, and more particularly to a particle swarm optimization method based on GPU operation of a mobile terminal.
  • GPU graphics processor
  • the technical problem to be solved by the present invention is to provide a particle swarm optimization method based on the operation of a GPU of a mobile terminal, which can improve the processing efficiency of the GPU, thereby improving the processing efficiency of the GPU cooperating CPU.
  • a particle swarm optimization method based on GPU operation of a mobile terminal comprising:
  • the GPU sends the optimal particle to the CPU.
  • the calculation model of the particle group is a star topology, in which each particle in the particle group is connected to each other.
  • the calculation model of the particle group is a ring topology in which each particle in the particle group is connected to its adjacent particles.
  • the velocity and position update equations of the particles in the particle group are as follows:
  • v_ij(t+1) = w*v_ij(t) + c1*r1*(p_ij(t) - x_ij(t)) + c2*r2*(p_gj(t) - x_ij(t))
  • x_ij(t+1) = x_ij(t) + v_ij(t+1)
  • where v_i(t) = (v_i1, v_i2, ..., v_id) is the velocity of the i-th particle at generation t, and v_ij(t) is its j-th dimension velocity;
  • w is the inertia weight, 0.2 to 0.9;
  • c1 is the cognitive coefficient;
  • c2 is the social coefficient;
  • r1 and r2 are random numbers uniformly distributed in [0, 1];
  • x_i(t) = (x_i1, x_i2, ..., x_id) is the position of the i-th particle at generation t;
  • p_i(t) = (p_i1, p_i2, ..., p_id) is the historical best position of the i-th particle;
  • p_g(t) = (p_g1, p_g2, ..., p_gd) is the historical best position of the group, and p_gj(t) is its j-th dimension.
  • the method operates based on a CUDA programming model comprising threads, thread blocks and thread grids; each thread has a unique thread number and a private register; each thread block has a shared memory, all threads in a thread block can access the shared memory, and at least two thread blocks form a thread grid.
  • the historical optimal particle and the global optimal particle are obtained by the CUDA programming model.
  • when the historical optimum is obtained on the CUDA programming model, one thread corresponds to one particle, and all threads call a preset function simultaneously.
  • updating the speed and position of the particles is completed by the CUDA programming model.
  • when updating position and speed, one thread corresponds to each dimension of a particle; data is read and processed according to the thread index.
  • the determination of the operating efficiency and convergence of the particle swarm optimization method comprises evaluation with one or more of the following seven benchmark functions (Sphere, Ackley, Schwefel, Levy, Griewank, Rastrigin and Rosenbrock, as listed in the description), where
  • D represents the dimension of the particle and the range of x_i represents the search space.
  • the step of updating the velocity and position of the particle according to the acquired historical optimal particle and the global optimal particle is entered.
  • the invention also discloses a particle swarm optimization system based on operation of a mobile terminal GPU, the system comprising:
  • An initialization module for initializing the position and velocity of particles in the particle group on the CPU side
  • a sending module for sending the position and speed of the particle to the GPU
  • the first update module updates the speed and position of the particle according to the acquired historical optimal particle and the global optimal particle;
  • a second update module that updates global optimal particles according to the speed and position of the updated particles
  • the determining module is configured to send the optimal particle to the CPU end if the current global optimal particle satisfies the iterative condition.
  • in the prior art, the GPU acts as a coprocessor that enhances the computing power of the CPU, but because the GPU's performance is limited by the algorithm structure, it is not exploited to the maximum.
  • in the present invention, the particle swarm optimization algorithm is used to improve the operation speed and efficiency of the GPU, greatly improving GPU performance and thereby greatly improving the ability of the GPU, as a coprocessor, to assist the CPU with computation, which raises the overall computing power and efficiency.
  • in particular, because of constraints such as volume and heat dissipation, the CPU used in a mobile terminal generally has low computational efficiency; applying the method to mobile terminals such as mobile phones and tablets can significantly improve the computing efficiency of the mobile terminal, greatly increase its operation speed and efficiency, and solve the problem of low operating efficiency and slow convergence on a conventional mobile-terminal CPU platform.
  • FIG. 1 is a flowchart of a particle swarm optimization method based on a GPU operation of a mobile terminal according to Embodiment 1 of the present invention
  • FIG. 2 is a flowchart of a particle swarm optimization method based on a GPU operation of a mobile terminal according to another embodiment of the present invention
  • FIG. 3 is a schematic diagram of a star topology and a ring topology according to an embodiment of the present invention
  • FIG. 4 is a schematic diagram of a CUDA programming model according to an embodiment of the present invention.
  • FIG. 5 is a flowchart of optimization of a PSO method according to an embodiment of the present invention.
  • FIG. 6 is a graph of the convergence of function f1 for the star-topology and ring-topology PSO methods on the CPU and on GPU+CPU according to an embodiment of the present invention;
  • FIG. 7 is a graph of the convergence of function f2 for the star-topology and ring-topology PSO methods on the CPU and on GPU+CPU according to an embodiment of the present invention;
  • FIG. 8 is a schematic diagram of a particle swarm optimization system based on a GPU operation of a mobile terminal according to Embodiment 2 of the present invention.
  • FIG. 9 is a schematic diagram of a particle swarm optimization system based on a mobile terminal GPU according to another embodiment of the present invention.
  • 201, initialization module; 202, sending module; 203, first update module; 204, second update module; 205, determination module; 206, output module.
  • A computer device, also referred to as a "computer" in this context, is an intelligent electronic device that can perform predetermined processing such as numerical and/or logical calculation by running predetermined programs or instructions; it may include a processor and a memory, with the processor executing instructions pre-stored in the memory to carry out the predetermined processing, or the predetermined processing may be carried out by hardware such as an ASIC, an FPGA or a DSP, or by a combination of the two.
  • Computer devices include, but are not limited to, servers, personal computers, notebook computers, tablets, smart phones, and the like.
  • the computer device includes a user device and a network device.
  • the user equipment or client includes, but is not limited to, a computer, a smart phone, a PDA, etc.;
  • the network device includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud-computing-based cloud consisting of a large number of computers or network servers, where cloud computing is a type of distributed computing: a super virtual computer composed of a group of loosely coupled computers.
  • the computer device can be operated separately to implement the present invention, and can also access the network and implement the present invention by interacting with other computer devices in the network.
  • the network in which the computer device is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN network, and the like.
  • the user equipment, the client, the network device, the network, and the like are merely examples, and other existing or future computer devices or networks may be applicable to the present invention, and should also be included in the scope of the present invention.
  • the method of the embodiment is mainly implemented by a mobile terminal or an electronic device; the mobile terminal or electronic device, such as a mobile phone or a tablet, may also have a network function and may communicate with other terminals, but is not limited to the above types.
  • the mobile terminal or the electronic device is only an example; other existing or future network devices and clients, if applicable to the present invention, are also included in the scope of protection of the present invention and are incorporated herein by reference. Further, the embodiments of the present invention are merely illustrative; embodiments may be freely combined with one another, and several embodiments may be combined, as long as the effects of the present invention can be achieved.
  • a particle swarm optimization method based on operation of a GPU of a mobile terminal includes:
  • in the prior art, the GPU acts as a coprocessor that enhances the computing power of the CPU, but because the GPU's performance is limited by the algorithm structure, it is not exploited to the maximum.
  • in this embodiment, the particle swarm optimization algorithm is used to improve the operation speed and efficiency of the GPU, greatly improving GPU performance and thereby greatly improving the ability of the GPU, as a coprocessor, to assist the CPU with computation, which raises the overall computing power and efficiency.
  • in particular, because of constraints such as volume and heat dissipation, the CPU used in a mobile terminal usually has low computational efficiency; applying the method to mobile terminals such as mobile phones and tablets can significantly improve the computing efficiency of the mobile terminal, greatly increase its operation speed and efficiency, and also solve the problem of low operating efficiency and slow convergence on a conventional mobile-terminal CPU platform.
  • a particle swarm optimization method based on GPU operation of a mobile terminal, the method comprising:
  • the present invention selects one or both of a star topology and a ring topology.
  • the left side of the figure is the star topology and the right side is the ring topology.
  • in the star PSO method, all the particles in the particle group are fully connected; that is, in the star topology each particle is connected to every other particle and can communicate with all particles except itself, so that the best solution of the entire particle group is shared.
  • in the ring topology, each particle in the particle group communicates with its n (n ≥ 1) neighbor particles; that is, each particle is connected to its adjacent particles, moves toward the best position found in its neighborhood to update its own position, and shares only the best solution within that neighborhood.
  • the existing PSO method tends to fall into local extrema; approaches to improving the PSO method include parameter adjustment, changing the search space, and hybridizing with other algorithms.
  • the performance of the PSO method depends strongly on the network structure, so a change of topology plays an important role in the convergence of the PSO algorithm.
  • in this embodiment, the PSO methods of the ring topology and the star topology are improved and combined with the parallelism of the PSO algorithm to solve the technical problem of low operating efficiency and slow convergence on a conventional mobile-terminal CPU platform.
  • the GPU significantly accelerates the convergence speed of the particle group.
  • the particle swarm optimization algorithm in this embodiment is based on a CUDA (Compute Unified Device Architecture) programming model
  • the CUDA programming model includes: a thread, a thread block, and Thread grid; each thread has a unique thread number and a private register; each thread block has a shared memory, all threads in each thread block can access shared memory, and at least two thread blocks form a thread grid.
  • the historical optimal particle and the global optimal particle are obtained by the CUDA programming model.
  • the historical optimal is obtained on the CUDA programming model, one thread corresponds to one particle, and each thread simultaneously calls a preset function.
  • the speed and position of the updated particle are completed by the CUDA programming model. When updating the position and speed, one thread corresponds to each dimension of the particle; the data is read and processed according to the thread index.
  • In the CUDA programming model, the CPU acts as the host, called the Host end, and the GPU acts as the coprocessor, called the Device end, which processes tasks by executing a large number of threads; the CPU and the GPU each have independent memory and their own compilers.
  • Together these form a complete CUDA programming model; kernel parallel processing is based on the single-instruction multiple-thread (SIMT) computing model.
  • CUDA uses three index vectors to organize threads into three different levels: threads, thread blocks (blocks) and thread grids (grids). The numbers of threads and blocks are set by the user; in this embodiment 16*16 thread blocks are started, and 16*16 threads are started in each block. Each thread has a unique thread number and a small but fast private register; each thread block has a shared memory, all threads in the block can quickly access the shared memory, and several blocks form a thread grid.
  • each particle exchanges information only when updating the global optimum, and the other steps are independent of each other.
  • when obtaining the historical best on CUDA, one thread corresponds to one particle, and all threads call the preset function simultaneously; when updating position and speed, one thread corresponds to each dimension of a particle; in both cases data is read and processed according to the thread index.
  • the host side initializes the position and velocity of the particles, copies the data from the CPU to the GPU, iterates over the device, and finally copies the optimal solution to the CPU output.
  • Although GPU performance is excellent, how much of it can be realized is largely limited by the algorithm structure.
  • When using CUDA, the data structures and the way memory is accessed have a great influence on GPU performance. Performance optimization is considered mainly from four aspects: 1) maximizing parallel execution; 2) optimizing memory usage to obtain the maximum memory bandwidth; 3) optimizing instruction usage to obtain the maximum instruction throughput; 4) setting the numbers of thread blocks and threads.
  • the velocity and position update equations of the standard PSO method are as follows:
  • v_ij(t+1) = w*v_ij(t) + c1*r1*(p_ij(t) - x_ij(t)) + c2*r2*(p_gj(t) - x_ij(t))    (1)
  • x_ij(t+1) = x_ij(t) + v_ij(t+1)    (2)
  • where v_ij(t) is the j-th dimension velocity of the i-th particle at generation t; w is the inertia weight, 0.2 to 0.9; c1 is the cognitive coefficient; c2 is the social coefficient; r1 and r2 are random numbers uniformly distributed in [0, 1]; x_i(t) = (x_i1, x_i2, ..., x_id) is the position of the i-th particle at generation t; p_i(t) = (p_i1, p_i2, ..., p_id) is the historical best position of the i-th particle; p_g(t) = (p_g1, p_g2, ..., p_gd) is the historical best position of the group.
  • the designed parallel PSO algorithm is tested and analyzed with the following commonly used benchmark functions, which makes it convenient to compare the running speed and operating efficiency of the various methods.
  • the operating efficiency and convergence of the particle swarm optimization method can be determined using one or more of the following seven functions, where D represents the dimension of the particle and the range of x_i represents the search space; this embodiment mainly compares computational efficiency and convergence with the serial PSO algorithm.
  • the running time of the star structure PSO algorithm on the CPU and CPU + GPU is shown in Table 1.
  • the running time of the ring structure PSO algorithm on the CPU and CPU + GPU is shown in Table 2.
  • Tables 1 and 2 above show the comparison of computational efficiency when the parameters of the two algorithms are set identically. Comparing the speed-up ratios in Tables 1 and 2, the star algorithm performs better than the ring algorithm: in CPU running time the difference between the two algorithms is not obvious, while on CPU+GPU the efficiency of the star algorithm is significantly higher than that of the ring algorithm.
  • Figures 6 and 7 depict the convergence curves of the PSO algorithms of the two topologies when solving f1 and f2 on the CPU and on the GPU+CPU, respectively, and show the comparison of their convergence.
  • the present embodiment discloses a particle swarm optimization system based on a GPU running on a mobile terminal, where the system is used to run the method in the first embodiment, and the system includes:
  • An initialization module 201 configured to initialize a position and a velocity of a particle in the particle group on the CPU side;
  • the sending module 202 is configured to send the position and speed of the particle to the GPU end;
  • the first update module 203 updates the speed and position of the particle according to the acquired historical optimal particle and the global optimal particle;
  • the second update module 204 updates the global optimal particle according to the updated particle speed and position
  • the determining module 205 is configured to send the optimal particle to the CPU end if the current global optimal particle satisfies the iterative condition.
  • in the prior art, the GPU acts as a coprocessor that enhances the computing power of the CPU, but because the GPU's performance is limited by the algorithm structure, it is not exploited to the maximum.
  • in this embodiment, the particle swarm optimization algorithm is used to improve the operation speed and efficiency of the GPU, greatly improving GPU performance and thereby greatly improving the ability of the GPU, as a coprocessor, to assist the CPU with computation, which raises the overall computing power and efficiency.
  • in particular, because of constraints such as volume and heat dissipation, the CPU used in a mobile terminal usually has low computational efficiency; applying the method to mobile terminals such as mobile phones and tablets can significantly improve the computing efficiency of the mobile terminal, greatly increase its operation speed and efficiency, and also solve the problem of low operating efficiency and slow convergence on a conventional mobile-terminal CPU platform.
  • the determining module 205 is further configured to: if the current global optimal particle does not satisfy the iterative condition, proceed to the first update module 203, according to the acquired historical optimal particle. And the global optimal particle updates the speed and position of the particle, and then proceeds to the second update module 204, the decision module 205 until the current global optimal particle satisfies the iteration condition.
  • the iteration condition may be that the speed of the particle reaches a preset threshold, or the position reaches a preset condition, or the speed and position reach a preset condition, and the like.
  • the system shown also includes an output module 206, configured to output the optimal particle that the GPU end has sent to the CPU end, so that the CPU end can output the result.
  • the method is as follows: a particle swarm optimization system based on the operation of a GPU of a mobile terminal, the system is used to run the method in the first embodiment, the system includes:
  • An initialization module 201 configured to initialize a position and a velocity of a particle in the particle group on the CPU side;
  • the sending module 202 is configured to send the position and speed of the particle to the GPU end;
  • the first update module 203 updates the speed and position of the particle according to the acquired historical optimal particle and the global optimal particle;
  • the second update module 204 updates the global optimal particle according to the updated particle speed and position
  • the determining module 205 is configured to determine whether the current global optimal particle satisfies an iteration condition, and if yes, the GPU sends the optimal particle to the CPU; otherwise, enters the first update module 203 to perform the next iteration;
  • the output module 206 outputs the optimal particle sent to the CPU end.
  • to improve the performance of the PSO method, the present invention selects one or both of a star topology and a ring topology, which may be understood with reference to FIG. 3.
  • the left side of the figure is the star topology and the right side is the ring topology.
  • all the particles in the particle group are all connected, that is, in the star topology, each particle in the particle group is connected to each other, and each particle can communicate with all particles except itself. To share the best solution for the entire particle swarm.
  • in the ring topology, each particle in the particle group communicates with its n (n ≥ 1) neighbor particles; that is, each particle is connected to its adjacent particles, moves toward the best position found in its neighborhood to update its own position, and shares only the best solution within that neighborhood.
  • the existing PSO method tends to fall into local extrema; approaches to improving the PSO method include parameter adjustment, changing the search space, and hybridizing with other algorithms.
  • the performance of the PSO method depends strongly on the network structure, so a change of topology plays an important role in the convergence of the PSO algorithm.
  • in this embodiment, the PSO methods of the ring topology and the star topology are improved and combined with the parallelism of the PSO algorithm to solve the technical problem of low operating efficiency and slow convergence on a conventional mobile-terminal CPU platform.
  • the GPU significantly accelerates the convergence speed of the particle group.
  • the particle swarm optimization algorithm in this embodiment is based on CUDA (Compute Unified Device Architecture) programming.
  • the CUDA programming model includes threads, thread blocks and thread grids; each thread has a unique thread number and a private register; each thread block has a shared memory, all threads in a thread block can access the shared memory, and at least two thread blocks form a thread grid.
  • the historical optimal particle and the global optimal particle are obtained by the CUDA programming model.
  • one thread corresponds to one particle, and each thread simultaneously calls a preset function.
  • the speed and position of the updated particle are completed by the CUDA programming model. When updating the position and speed, one thread corresponds to each dimension of the particle; the data is read and processed according to the thread index.
  • the CPU acts as the host, called the Host
  • the GPU acts as the coprocessor, called the Device end, and processes tasks by executing a large number of threads; the CPU and the GPU each have independent memory and their own compilers.
  • Together these form a complete CUDA programming model; kernel parallel processing is based on the single-instruction multiple-thread (SIMT) computing model.
  • CUDA uses three index vectors to organize threads into three different levels: threads, thread blocks (blocks) and thread grids (grids). The numbers of threads and blocks are set by the user; in this embodiment 16*16 thread blocks are started, and 16*16 threads are started in each block. Each thread has a unique thread number and a small but fast private register; each thread block has a shared memory, all threads in the block can quickly access the shared memory, and several blocks form a thread grid.
  • each particle exchanges information only when updating the global optimum, and the other steps are independent of each other.
  • when obtaining the historical best on CUDA, one thread corresponds to one particle, and all threads call the preset function simultaneously; when updating position and speed, one thread corresponds to each dimension of a particle; in both cases data is read and processed according to the thread index.
  • the host side initializes the position and velocity of the particles, copies the data from the CPU to the GPU, iterates over the device, and finally copies the optimal solution to the CPU output.
  • Although GPU performance is excellent, how much of it can be realized is largely limited by the algorithm structure.
  • When using CUDA, the data structures and the way memory is accessed have a great influence on GPU performance. Performance optimization is considered mainly from four aspects: 1) maximizing parallel execution; 2) optimizing memory usage to obtain the maximum memory bandwidth; 3) optimizing instruction usage to obtain the maximum instruction throughput; 4) setting the numbers of thread blocks and threads.
  • the velocity and position update equations of the standard PSO method are as follows:
  • v_ij(t+1) = w*v_ij(t) + c1*r1*(p_ij(t) - x_ij(t)) + c2*r2*(p_gj(t) - x_ij(t))    (1)
  • x_ij(t+1) = x_ij(t) + v_ij(t+1)    (2)
  • where v_ij(t) is the j-th dimension velocity of the i-th particle at generation t; w is the inertia weight, 0.2 to 0.9; c1 is the cognitive coefficient; c2 is the social coefficient; r1 and r2 are random numbers uniformly distributed in [0, 1]; x_i(t) = (x_i1, x_i2, ..., x_id) is the position of the i-th particle at generation t; p_i(t) = (p_i1, p_i2, ..., p_id) is the historical best position of the i-th particle; p_g(t) = (p_g1, p_g2, ..., p_gd) is the historical best position of the group.
  • the designed parallel PSO algorithm is tested and analyzed with the following commonly used benchmark functions, which makes it convenient to compare the running speed and operating efficiency of the various methods.
  • the operating efficiency and convergence of the particle swarm optimization method can be determined using one or more of the following seven functions, where D represents the dimension of the particle and the range of x_i represents the search space.
  • This embodiment mainly compares the computational efficiency and convergence with the serial PSO algorithm.
  • the specific functions are as follows:
  • the running time of the star structure PSO algorithm on the CPU and CPU + GPU is shown in Table 1.
  • the running time of the ring structure PSO algorithm on the CPU and CPU + GPU is shown in Table 2.
  • Tables 1 and 2 above show the comparison of computational efficiency when the parameters of the two algorithms are set identically. Comparing the speed-up ratios in Tables 1 and 2, the star algorithm performs better than the ring algorithm: in CPU running time the difference between the two algorithms is not obvious, while on CPU+GPU the efficiency of the star algorithm is significantly higher than that of the ring algorithm.
  • in summary, the optimization process of the ring- and star-topology PSO algorithms on the Compute Unified Device Architecture was compared with the computation time on the mobile phone CPU by solving several benchmark functions; the results show that the PSO methods of both topologies achieve higher computational efficiency than the CPU alone.
  • the GPU significantly accelerates the convergence speed of the particle swarm.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Telephone Function (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A particle swarm optimization method based on a GPU operation of a mobile terminal. The method comprises: initializing, at a CPU end, the position and speed of a particle in a particle swarm (S101); sending the position and speed of the particle to a GPU end (S102); updating the speed and position of the particle according to an acquired historical optimal particle and an overall optimal particle (S103); updating the overall optimal particle according to the updated speed and position of the particle (S104); and if the current overall optimal particle satisfies an iteration condition, the GPU end sending the optimal particle to the CPU end (S105). By means of a particle swarm optimization algorithm, the operation speed and operation efficiency of a GPU are improved, and the performance of the GPU is greatly improved such that the capability of the GPU, as a coprocessor, in assisting a CPU to perform an operation is greatly improved, and the whole operation capability and operation efficiency are improved.

Description

Particle swarm optimization method and system based on mobile terminal GPU operation

Technical field

The present invention relates to the field of computers, and more particularly to a particle swarm optimization method based on GPU operation of a mobile terminal.

Background

At present the core frequency of the CPU (central processing unit) has reached its limit, and the real-time processing of massive data far exceeds the load the CPU can bear; this development undoubtedly calls for new ways of solving the big-data problem. The GPU (graphics processing unit) was at first responsible only for graphics rendering; it was not until NVIDIA released the GeForce series of GPUs that GPUs began to be applied to general-purpose computing, and the introduction of CUDA (Compute Unified Device Architecture) allowed development to move quickly from the CPU to the GPU. The cooperative work of GPU and CPU has already been widely applied in fields such as oil exploration, astronomical computation, fluid dynamics simulation and biological computing, achieving speed-ups of several times or even dozens of times.

However, in existing mobile terminals the efficiency gain from the GPU as a coprocessor is limited, and its efficiency in assisting the CPU is not ideal; a more efficient operating method is therefore urgently needed.
Summary of the invention

The technical problem to be solved by the present invention is to provide a particle swarm optimization method based on the operation of the GPU of a mobile terminal, which can improve the processing efficiency of the GPU and thereby the processing efficiency of the GPU cooperating with the CPU.

The object of the present invention is achieved by the following technical solutions:

A particle swarm optimization method based on GPU operation of a mobile terminal, the method comprising:

initializing the position and velocity of the particles in the particle swarm on the CPU side;

sending the position and velocity of the particles to the GPU side;

updating the velocity and position of the particles according to the acquired historical optimal particle and global optimal particle;

updating the global optimal particle according to the updated velocity and position of the particles;

if the current global optimal particle satisfies the iteration condition, the GPU side sending the optimal particle to the CPU side.
Preferably, the calculation model of the particle swarm is a star topology, in which every particle in the swarm is connected to every other particle.

Preferably, the calculation model of the particle swarm is a ring topology, in which each particle in the swarm is connected to its adjacent particles.

Preferably, the velocity and position update equations of the particles in the particle swarm are as follows:

v_ij(t+1) = w*v_ij(t) + c1*r1*(p_ij(t) - x_ij(t)) + c2*r2*(p_gj(t) - x_ij(t))

x_ij(t+1) = x_ij(t) + v_ij(t+1)

where v_i(t) = (v_i1, v_i2, ..., v_id) is the velocity of the i-th particle at generation t, and v_ij(t) is its j-th dimension velocity; w is the inertia weight, 0.2 to 0.9; c1 is the cognitive coefficient and c2 is the social coefficient; r1 and r2 are random numbers uniformly distributed in [0, 1]; p_i(t) = (p_i1, p_i2, ..., p_id) is the historical best position of the i-th particle; x_i(t) = (x_i1, x_i2, ..., x_id) is the position of the i-th particle at generation t; p_g(t) = (p_g1, p_g2, ..., p_gd) is the historical best position of the swarm.
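For illustration only, the following is a minimal CUDA C sketch of how these update equations can be mapped onto the GPU with one thread per particle dimension, as the embodiments describe. It is not code from the patent; the array layout, parameter names, the clamping to a search range and the use of the cuRAND device API for r1 and r2 are assumptions.

```cuda
#include <curand_kernel.h>

// One thread per (particle, dimension) pair. x, v, pbest are laid out as
// [numParticles * dim] arrays; gbest holds the dim coordinates of the swarm best.
__global__ void updateVelocityPosition(float *x, float *v,
                                        const float *pbest, const float *gbest,
                                        curandState *rng,
                                        int numParticles, int dim,
                                        float w, float c1, float c2,
                                        float xMin, float xMax)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;   // global index over particle*dimension
    if (idx >= numParticles * dim) return;

    int j = idx % dim;                                 // dimension handled by this thread
    float r1 = curand_uniform(&rng[idx]);              // r1, r2 uniform in [0, 1]
    float r2 = curand_uniform(&rng[idx]);

    // v_ij(t+1) = w*v_ij(t) + c1*r1*(p_ij - x_ij) + c2*r2*(p_gj - x_ij)
    float vel = w * v[idx]
              + c1 * r1 * (pbest[idx] - x[idx])
              + c2 * r2 * (gbest[j]   - x[idx]);

    // x_ij(t+1) = x_ij(t) + v_ij(t+1), kept inside the assumed search range
    float pos = fminf(fmaxf(x[idx] + vel, xMin), xMax);

    v[idx] = vel;
    x[idx] = pos;
}
```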
Preferably, the method runs on the CUDA programming model, which comprises threads, thread blocks and thread grids; each thread has a unique thread number and a private register; each thread block has a shared memory that all threads in the block can access, and at least two thread blocks form a thread grid.

Preferably, the historical optimal particle and the global optimal particle are obtained through the CUDA programming model; when obtaining the historical best on the CUDA programming model, one thread corresponds to one particle, and all threads call a preset function simultaneously.

Preferably, updating the velocity and position of the particles is done through the CUDA programming model; when updating position and velocity, one thread corresponds to each dimension of a particle, and data is read and processed according to the thread index.
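As a further illustration of the one-thread-per-particle mapping used when the historical best is obtained, the sketch below evaluates a preset fitness function in every thread and refreshes the personal best. The Sphere function is only a stand-in for whatever objective is preset, minimization is assumed, and all names are illustrative rather than taken from the patent; the global best would then be selected from pbestVal, for example by a parallel reduction, which is omitted here.

```cuda
// Example objective (Sphere); the method itself leaves the fitness function as a preset.
__device__ float fitness(const float *xi, int dim)
{
    float s = 0.0f;
    for (int j = 0; j < dim; ++j) s += xi[j] * xi[j];
    return s;
}

// One thread per particle: all threads call the preset function simultaneously,
// then update the particle's historical (personal) best.
__global__ void updatePersonalBest(const float *x, float *pbest, float *pbestVal,
                                   int numParticles, int dim)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;     // particle index
    if (i >= numParticles) return;

    float f = fitness(&x[i * dim], dim);
    if (f < pbestVal[i]) {                             // minimization assumed
        pbestVal[i] = f;
        for (int j = 0; j < dim; ++j)
            pbest[i * dim + j] = x[i * dim + j];
    }
}
```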
Preferably, the operating efficiency and convergence of the particle swarm optimization method are determined using one or more of the following seven functions:

(1) Sphere function
(2) Ackley function
(3) Schwefel function
(4) Levy function
(5) Griewank function
(6) Rastrigin function
(7) Rosenbrock function

(The formula of each function is rendered as an image, PCTCN2016103892-appb-000001 through appb-000007, in the original publication.) Here D represents the dimension of the particle, and the range of x_i represents the search space.
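Since the formulas themselves appear only as images in the original publication, the device functions below are a hedged sketch using the standard textbook forms of two of the named benchmarks (Rastrigin and Ackley); the exact variants and search ranges used in the patent may differ.

```cuda
#define PI_F 3.14159265358979f

// Rastrigin (standard form): f(x) = 10*D + sum_j (x_j^2 - 10*cos(2*pi*x_j)),
// usually searched over x_j in [-5.12, 5.12].
__device__ float rastrigin(const float *xi, int D)
{
    float s = 10.0f * D;
    for (int j = 0; j < D; ++j)
        s += xi[j] * xi[j] - 10.0f * cosf(2.0f * PI_F * xi[j]);
    return s;
}

// Ackley (standard form):
// f(x) = -20*exp(-0.2*sqrt(mean(x_j^2))) - exp(mean(cos(2*pi*x_j))) + 20 + e
__device__ float ackley(const float *xi, int D)
{
    float sumSq = 0.0f, sumCos = 0.0f;
    for (int j = 0; j < D; ++j) {
        sumSq  += xi[j] * xi[j];
        sumCos += cosf(2.0f * PI_F * xi[j]);
    }
    return -20.0f * expf(-0.2f * sqrtf(sumSq / D))
           - expf(sumCos / D) + 20.0f + expf(1.0f);
}
```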
Preferably, if the current global optimal particle does not satisfy the iteration condition, the method returns to the step of updating the velocity and position of the particles according to the acquired historical optimal particle and global optimal particle.

The present invention also discloses a particle swarm optimization system based on GPU operation of a mobile terminal, the system comprising:

an initialization module for initializing the position and velocity of the particles in the particle swarm on the CPU side;

a sending module for sending the position and velocity of the particles to the GPU side;

a first update module for updating the velocity and position of the particles according to the acquired historical optimal particle and global optimal particle;

a second update module for updating the global optimal particle according to the updated velocity and position of the particles;

a determination module for sending the optimal particle from the GPU side to the CPU side if the current global optimal particle satisfies the iteration condition.
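Purely as an illustration of how the five claimed modules could be organized on the host side, here is a skeletal C++ interface; every name is hypothetical and nothing in it is prescribed by the patent.

```cuda
// Illustrative host-side decomposition into the five claimed modules.
struct Swarm {
    int numParticles, dim;
    float *d_x, *d_v, *d_pbest, *d_pbestVal, *d_gbest;   // device-side buffers
};

class PsoSystem {
public:
    void initialize(Swarm &s);               // initialization module: CPU-side positions/velocities
    void sendToGpu(Swarm &s);                // sending module: copy positions/velocities to the GPU
    void updateParticles(Swarm &s);          // first update module: velocity/position update
    void updateGlobalBest(Swarm &s);         // second update module: refresh the global optimal particle
    bool iterationSatisfied(const Swarm &s); // determination module: has the iteration condition been met?
};
```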
In the prior art, the GPU acts as a coprocessor that enhances the computing power of the CPU, but because the GPU's performance is limited by the algorithm structure it is not exploited to the maximum. In this embodiment, the particle swarm optimization algorithm is used to improve the operation speed and efficiency of the GPU, greatly improving GPU performance and thereby greatly improving the ability of the GPU, as a coprocessor, to assist the CPU with computation, which raises the overall computing power and efficiency. In particular, because of constraints such as volume and heat dissipation, the CPU used in a mobile terminal usually has low computational efficiency; applying this method to mobile terminals such as mobile phones and tablets can significantly improve the computing efficiency of the mobile terminal, greatly increase its operation speed and efficiency, and also solve the problem of low operating efficiency and slow convergence on a conventional mobile-terminal CPU platform.
DRAWINGS

FIG. 1 is a flowchart of a particle swarm optimization method based on GPU operation of a mobile terminal according to Embodiment 1 of the present invention;

FIG. 2 is a flowchart of another form of the particle swarm optimization method based on GPU operation of a mobile terminal according to Embodiment 1 of the present invention;

FIG. 3 is a schematic diagram of the star topology and the ring topology according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of the CUDA programming model according to an embodiment of the present invention;

FIG. 5 is a flowchart of the optimization process of the PSO method according to an embodiment of the present invention;

FIG. 6 shows the convergence curves of function f1 for the star-topology and ring-topology PSO methods on the CPU and on GPU+CPU according to an embodiment of the present invention;

FIG. 7 shows the convergence curves of function f2 for the star-topology and ring-topology PSO methods on the CPU and on GPU+CPU according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of a particle swarm optimization system based on GPU operation of a mobile terminal according to Embodiment 2 of the present invention;

FIG. 9 is a schematic diagram of another form of the particle swarm optimization system based on GPU operation of a mobile terminal according to Embodiment 2 of the present invention.

In the figures: 201, initialization module; 202, sending module; 203, first update module; 204, second update module; 205, determination module; 206, output module.
Detailed description

Before discussing the exemplary embodiments in more detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes the operations as a sequential process, many of the operations can be performed in parallel or concurrently, and the order of the operations can be rearranged. A process may be terminated when its operations are completed, but may also have additional steps not included in the figures. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, and so on.

A "computer device", also referred to herein as a "computer", is an intelligent electronic device that can perform predetermined processing such as numerical and/or logical calculation by running predetermined programs or instructions. It may include a processor and a memory, with the processor executing instructions pre-stored in the memory to carry out the predetermined processing, or the predetermined processing may be carried out by hardware such as an ASIC, an FPGA or a DSP, or by a combination of the two. Computer devices include, but are not limited to, servers, personal computers, notebook computers, tablet computers, smart phones and the like.

Computer devices include user equipment and network devices. User equipment or clients include, but are not limited to, computers, smart phones, PDAs and the like; network devices include, but are not limited to, a single network server, a server group composed of multiple network servers, or a cloud-computing-based cloud consisting of a large number of computers or network servers, where cloud computing is a type of distributed computing: a super virtual computer composed of a group of loosely coupled computers. The computer device may operate alone to implement the present invention, or it may access a network and implement the present invention by interacting with other computer devices in the network. The network in which the computer device is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN network and the like.

It should be noted that the user equipment, clients, network devices, networks and the like are merely examples; other existing or future computer devices or networks, if applicable to the present invention, are also included in the scope of protection of the present invention and are incorporated herein by reference.

The methods discussed below, some of which are illustrated by flowcharts, may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments that perform the necessary tasks may be stored in a machine- or computer-readable medium such as a storage medium, and one or more processors may perform the necessary tasks.

The specific structural and functional details disclosed here are merely representative and serve the purpose of describing exemplary embodiments of the present invention. The present invention may, however, be embodied in many alternative forms and should not be construed as being limited to the embodiments set forth herein.

It should be understood that although the terms "first", "second" and so on may be used here to describe various units, these units should not be limited by these terms, which are used only to distinguish one unit from another. For example, without departing from the scope of the exemplary embodiments, a first unit could be called a second unit, and similarly a second unit could be called a first unit. The term "and/or" as used here includes any and all combinations of one or more of the associated listed items.

It should be understood that when a unit is said to be "connected" or "coupled" to another unit, it may be directly connected or coupled to the other unit, or intermediate units may be present. In contrast, when a unit is said to be "directly connected" or "directly coupled" to another unit, no intermediate unit is present. Other words used to describe relationships between units should be interpreted in a similar manner (for example, "between" versus "directly between", and "adjacent to" versus "directly adjacent to").

The terminology used here is for describing particular embodiments only and is not intended to limit the exemplary embodiments. Unless the context clearly indicates otherwise, the singular forms "a" and "an" used here are intended to include the plural as well. It should also be understood that the terms "comprising" and/or "including" as used here specify the presence of the stated features, integers, steps, operations, units and/or components, without excluding the presence or addition of one or more other features, integers, steps, operations, units, components and/or combinations thereof.

It should also be mentioned that in some alternative implementations the functions/acts mentioned may occur in an order different from that indicated in the figures. For example, two figures shown in succession may in fact be executed substantially concurrently, or sometimes in the reverse order, depending on the functions/acts involved.
The present invention is further described below with reference to the drawings and preferred embodiments.

The method of this embodiment is mainly implemented by a mobile terminal or an electronic device; the mobile terminal or electronic device, such as a mobile phone or a tablet, may also have a network function and may communicate with other terminals, but is not limited to the above types.

It should be noted that the mobile terminal or electronic device is only an example; other existing or future network devices and clients, if applicable to the present invention, are also included in the scope of protection of the present invention and are incorporated herein by reference. In addition, the embodiments of the present invention are merely illustrative; embodiments may be freely combined with one another, and several embodiments may be combined, as long as the effects of the present invention can be achieved.

The present invention is further described below with reference to the drawings and preferred embodiments.
Embodiment 1

As shown in FIG. 1, this embodiment discloses a particle swarm optimization method based on GPU operation of a mobile terminal, the method comprising:

S101. Initialize the position and velocity of the particles in the particle swarm on the CPU side;

S102. Send the position and velocity of the particles to the GPU side;

S103. Update the velocity and position of the particles according to the acquired historical optimal particle and global optimal particle;

S104. Update the global optimal particle according to the updated velocity and position of the particles;

S105. If the current global optimal particle satisfies the iteration condition, the GPU side sends the optimal particle to the CPU side.

In the prior art, the GPU acts as a coprocessor that enhances the computing power of the CPU, but because the GPU's performance is limited by the algorithm structure it is not exploited to the maximum. In this embodiment, the particle swarm optimization algorithm is used to improve the operation speed and efficiency of the GPU, greatly improving GPU performance and thereby greatly improving the ability of the GPU, as a coprocessor, to assist the CPU with computation, which raises the overall computing power and efficiency. In particular, because of constraints such as volume and heat dissipation, the CPU used in a mobile terminal usually has low computational efficiency; applying this method to mobile terminals such as mobile phones and tablets can significantly improve the computing efficiency of the mobile terminal, greatly increase its operation speed and efficiency, and also solve the problem of low operating efficiency and slow convergence on a conventional mobile-terminal CPU platform.
In this embodiment, according to one example, as shown in FIG. 2, if the current global optimal particle does not satisfy the iteration condition, the method returns to step S103 and updates the velocity and position of the particles according to the acquired historical optimal particle and global optimal particle, then continues with steps S104 and S105 until the current global optimal particle satisfies the iteration condition. The iteration condition may be that the velocity of the particles reaches a preset threshold, or that the position reaches a preset condition, or that both velocity and position reach preset conditions, and so on. In addition, in this embodiment, after the GPU side sends the optimal particle to the CPU side, the CPU side can output the result. Specifically, a particle swarm optimization method based on GPU operation of a mobile terminal comprises:

S101. Initialize the position and velocity of the particles in the particle swarm on the CPU side;

S102. Send the position and velocity of the particles to the GPU side;

S103. Update the velocity and position of the particles according to the acquired historical optimal particle and global optimal particle;

S104. Update the global optimal particle according to the updated velocity and position of the particles;

S105. Determine whether the current global optimal particle satisfies the iteration condition; if so, the GPU side sends the optimal particle to the CPU side; otherwise, return to step S103 for the next iteration;

S106. Output the optimal particle sent to the CPU side.
According to another example, as shown in FIG. 3, to improve the performance of the PSO (Particle Swarm Optimization) method, the present invention selects one or both of a star topology and a ring topology. In the figure, the star topology is on the left and the ring topology is on the right. In the star PSO method all the particles in the swarm are fully connected; that is, in the star topology every particle is connected to every other particle and can communicate with all particles except itself, so that the best solution of the entire swarm is shared. In the ring topology, each particle in the swarm communicates with its n (n ≥ 1) neighbor particles; that is, each particle is connected to its adjacent particles, moves toward the best position found in its neighborhood to update its own position, and shares only the best solution within that neighborhood.

The existing PSO method tends to fall into local extrema; approaches to improving it include parameter adjustment, changing the search space, and hybridizing with other algorithms. The performance of the PSO method depends strongly on the network structure, so a change of topology plays an important role in the convergence of the PSO algorithm. In this embodiment, the PSO methods of the ring topology and the star topology are improved and combined with the parallelism of the PSO algorithm to solve the technical problem of low operating efficiency and slow convergence on a conventional mobile-terminal CPU platform; for the star-topology PSO algorithm, the GPU significantly accelerates the convergence of the swarm.
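To make the difference between the two topologies concrete, a hedged device-side sketch of how a particle's guide could be chosen is given below; a ring neighborhood of one particle on each side is assumed, and the function name is illustrative. In the star topology this step is unnecessary, since every particle simply follows the single global best.

```cuda
// Ring topology with one neighbour on each side: particle i shares only the best
// personal best among {i-1, i, i+1} (indices wrap around the ring).
__device__ int ringLocalBest(const float *pbestVal, int i, int numParticles)
{
    int left  = (i - 1 + numParticles) % numParticles;
    int right = (i + 1) % numParticles;
    int best  = i;
    if (pbestVal[left]  < pbestVal[best]) best = left;
    if (pbestVal[right] < pbestVal[best]) best = right;
    return best;   // the particle whose personal best guides particle i
}
```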
According to another example, as shown in FIG. 4, the particle swarm optimization algorithm in this embodiment is based on the CUDA (Compute Unified Device Architecture) programming model, which comprises threads, thread blocks, and thread grids. Each thread has a unique thread number and a private register; each thread block has a shared memory that all threads in the block can access; and at least two thread blocks form a thread grid. The historical optimal particles and the global optimal particle are obtained through the CUDA programming model: when obtaining the historical optima on the CUDA programming model, one thread corresponds to one particle, and all threads call the preset function simultaneously. The updating of particle velocities and positions is also done through the CUDA programming model: when updating positions and velocities, one thread corresponds to each dimension of a particle, and data are read and processed according to the thread index.
In the CUDA programming model, the CPU serves as the host (Host side) and the GPU serves as the coprocessor (Device side), which processes tasks by executing a large number of threads; each has its own independent memory and its own compiler. In a complete CUDA programming model, the kernel's parallel processing is based on the Single Instruction Multiple Thread (SIMT) computing model, and CUDA organizes threads into three levels: threads, thread blocks, and thread grids. The numbers of threads and blocks are set by the user; in this embodiment, 16*16 thread blocks are launched, with 16*16 threads in each block. Each thread has a unique thread number and a small but fast private register; each thread block has a shared memory that all threads in the block can access quickly; and several blocks form a thread grid.
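As an illustration of the thread organization described above, the following hedged sketch shows one way the 16*16 grid of 16*16-thread blocks mentioned in this embodiment could be launched, and how each thread derives its unique index; the kernel name and what it prints are assumptions for illustration only.

#include <cuda_runtime.h>
#include <cstdio>

// Each thread computes its unique global index from the block and thread
// indices, mirroring the 16*16 blocks of 16*16 threads used in this embodiment.
__global__ void show_thread_index() {
    int block_id        = blockIdx.y * gridDim.x + blockIdx.x;
    int thread_in_block = threadIdx.y * blockDim.x + threadIdx.x;
    int global_id       = block_id * (blockDim.x * blockDim.y) + thread_in_block;
    if (global_id == 0) {
        printf("total threads launched: %d\n",
               gridDim.x * gridDim.y * blockDim.x * blockDim.y);
    }
}

int main() {
    dim3 block(16, 16);   // 16*16 threads per block
    dim3 grid(16, 16);    // 16*16 thread blocks in the grid
    show_thread_index<<<grid, block>>>();
    cudaDeviceSynchronize();
    return 0;
}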
During the optimization process of the PSO algorithm, the particles exchange information with one another only when updating the global optimum; all other steps are independent of each other. When obtaining the historical optima on CUDA, one thread corresponds to one particle and all threads call the preset function simultaneously; when updating positions and velocities, one thread corresponds to each dimension of a particle; in both cases, data are read and processed according to the thread index.
In the CUDA programming model, the host side initializes the positions and velocities of the particles, copies the data from the CPU to the GPU, performs iterative optimization on the device, and finally copies the optimal solution back to the CPU for output. Although GPU performance is excellent, how much of it can be exploited depends largely on the algorithm structure. When using CUDA, the data structures and the memory access pattern have a great influence on GPU performance. Performance optimization is considered mainly from four aspects: 1) maximizing parallel execution; 2) optimizing memory usage to obtain maximum memory bandwidth; 3) optimizing instruction usage to obtain maximum instruction throughput; 4) setting the numbers of thread blocks and threads.
According to another example, the velocity and position update equations of the standard PSO method are as follows:
v_ij(t+1) = ω*v_ij(t) + c_1*r_1*(p_ij(t) - x_ij(t)) + c_2*r_2*(p_gj(t) - x_ij(t))         (1)
x_ij(t+1) = x_ij(t) + v_ij(t+1)          (2)
where v_i(t) = (v_i1, v_i2, ..., v_id) is the velocity of the i-th particle at generation t, and v_ij(t) denotes the j-th dimension of that velocity; ω is the inertia weight, taken in the range 0.2 to 0.9 (in this embodiment ω = 0.5); c_1 is the cognitive coefficient and c_2 is the social coefficient, commonly c_1 = c_2 = 2; r_1 and r_2 are random numbers uniformly distributed in [0, 1]; p_i(t) = (p_i1, p_i2, ..., p_id) is the historical best position of the i-th particle; x_i(t) = (x_i1, x_i2, ..., x_id) is the position of the i-th particle at generation t; and p_g(t) = (p_g1, p_g2, ..., p_gd) is the historical best position of the swarm. The optimization process of the PSO method is shown in FIG. 5.
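For illustration only, a hedged CUDA C++ sketch of update equations (1) and (2) follows, with one thread per particle dimension as described above; the kernel name, array layout (N particles of D dimensions, indexed as i*D + j) and the use of cuRAND for the random numbers r_1 and r_2 are assumptions of this sketch and not part of the original disclosure.

#include <cuda_runtime.h>
#include <curand_kernel.h>

// Illustrative kernel for equations (1) and (2): one thread per particle
// dimension, global index idx = i * D + j.
__global__ void update_velocity_position(float* x, float* v,
                                         const float* pbest,   // N*D personal best positions
                                         const float* gbest,   // D   global best position
                                         int N, int D,
                                         float w, float c1, float c2,
                                         unsigned long long seed) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= N * D) return;
    int j = idx % D;                       // dimension index handled by this thread

    curandState state;
    curand_init(seed, idx, 0, &state);
    float r1 = curand_uniform(&state);     // r1, r2 uniformly distributed in (0, 1]
    float r2 = curand_uniform(&state);

    // Equation (1): inertia term + cognitive term + social term.
    float vnew = w * v[idx]
               + c1 * r1 * (pbest[idx] - x[idx])
               + c2 * r2 * (gbest[j]   - x[idx]);
    v[idx] = vnew;

    // Equation (2): position update.
    x[idx] = x[idx] + vnew;
}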
In this embodiment, the designed parallel PSO algorithm was tested with the following commonly used benchmark functions, for analysis and calculation and to make it easy to compare the running speed and efficiency of the various methods; the running efficiency and convergence of the particle swarm optimization method can be determined using one or more of the following seven functions, where D denotes the dimension of a particle and the range of x_i denotes the search space. This embodiment mainly compares computational efficiency and convergence with the serial PSO algorithm.
(1) Sphere function
Figure PCTCN2016103892-appb-000008
(2) Ackley function
Figure PCTCN2016103892-appb-000009
(3) Schwefel function
Figure PCTCN2016103892-appb-000010
(4) Levy function
Figure PCTCN2016103892-appb-000011
(5) Griewank function
Figure PCTCN2016103892-appb-000012
(6) Rastrigin function
Figure PCTCN2016103892-appb-000013
(7) Rosenbrock function
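The formulas for the seven functions above are rendered as images in the original filing. For reference, their standard textbook forms are reproduced below in LaTeX; it is an assumption that the filing uses these standard forms (several of the functions, for example Schwefel, have more than one common variant).

% Standard forms of the seven benchmark functions (assumed; the filing's
% images may use slightly different variants or constant terms).
\begin{align*}
f_1(x) &= \sum_{i=1}^{D} x_i^2 && \text{(Sphere)}\\
f_2(x) &= -20\exp\!\Big(-0.2\sqrt{\tfrac{1}{D}\sum_{i=1}^{D} x_i^2}\Big)
          -\exp\!\Big(\tfrac{1}{D}\sum_{i=1}^{D}\cos(2\pi x_i)\Big) + 20 + e && \text{(Ackley)}\\
f_3(x) &= 418.9829\,D - \sum_{i=1}^{D} x_i \sin\!\big(\sqrt{|x_i|}\big) && \text{(Schwefel)}\\
f_4(x) &= \sin^2(\pi w_1) + \sum_{i=1}^{D-1} (w_i-1)^2\big[1+10\sin^2(\pi w_i+1)\big]
          + (w_D-1)^2\big[1+\sin^2(2\pi w_D)\big],\ \ w_i = 1+\tfrac{x_i-1}{4} && \text{(Levy)}\\
f_5(x) &= 1 + \tfrac{1}{4000}\sum_{i=1}^{D} x_i^2 - \prod_{i=1}^{D}\cos\!\Big(\tfrac{x_i}{\sqrt{i}}\Big) && \text{(Griewank)}\\
f_6(x) &= \sum_{i=1}^{D}\big[x_i^2 - 10\cos(2\pi x_i) + 10\big] && \text{(Rastrigin)}\\
f_7(x) &= \sum_{i=1}^{D-1}\big[100(x_{i+1}-x_i^2)^2 + (x_i-1)^2\big] && \text{(Rosenbrock)}
\end{align*}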
The running times of the star-topology PSO algorithm on the CPU and on the CPU+GPU are shown in Table 1; the test parameters are N=1000, D=50, M=5000. The running times of the ring-topology PSO algorithm on the CPU and on the CPU+GPU are shown in Table 2, with the same test parameters N=1000, D=50, M=5000.
Table 1. Time for the star-topology PSO algorithm to solve functions f1~f7 when N=1000, D=50, M=5000
Figure PCTCN2016103892-appb-000015
Table 2. Time for the ring-topology PSO algorithm to solve functions f1~f7 when N=1000, D=50, M=5000
Figure PCTCN2016103892-appb-000016
Comparison of computational efficiency and convergence
Tables 1 and 2 above show the comparison of computational efficiency. With identical parameter settings, comparing the performance of the two algorithms through the speedup columns of Tables 1 and 2 shows that the star-topology algorithm is better than the ring-topology algorithm. For running time on the CPU alone, the gap between the two algorithms is not obvious, whereas on the CPU+GPU the star-topology algorithm is clearly more efficient than the ring-topology algorithm.
FIG. 6 and FIG. 7 depict the convergence curves of the two PSO topologies when solving f1 and f2 on the CPU and on the GPU, respectively, and the comparison of convergence can be seen from these figures. The parameter values are: N=500, D=50, and M increases gradually from 0.
The optimization process of the PSO algorithm with ring and star topologies is implemented on the Compute Unified Device Architecture (CUDA), and by solving several benchmark functions, the computation time and results are compared with those obtained on the mobile phone CPU. The results of the present invention show that the PSO methods of both topologies are more computationally efficient than the CPU, and that for the star-topology PSO method the GPU significantly accelerates the convergence of the particle swarm.
With the improvement of material living standards, computer technology is increasingly widely applied. In the process of solving optimization problems, optimization theory and algorithms have developed rapidly, forming a series of classical traditional optimization theories that are widely used in scientific research, engineering, molecular biology, control systems, finance, transportation, network communication, energy, and many other fields.
Embodiment 2
As shown in FIG. 8, this embodiment discloses a particle swarm optimization system based on GPU operation of a mobile terminal; the system is used to perform the method of Embodiment 1 and comprises:
An initialization module 201, configured to initialize the positions and velocities of the particles in the particle swarm on the CPU side;
A sending module 202, configured to send the positions and velocities of the particles to the GPU side;
A first update module 203, configured to update the velocities and positions of the particles according to the obtained historical optimal particles and the global optimal particle;
A second update module 204, configured to update the global optimal particle according to the updated velocities and positions of the particles;
A determination module 205, configured to send the optimal particle from the GPU to the CPU if the current global optimal particle satisfies the iteration condition.
In the prior art, the GPU acts as a coprocessor to enhance the computing power of the CPU, but the GPU's performance is limited by the algorithm structure and is not fully exploited. In this embodiment, the particle swarm optimization algorithm is used to increase the computing speed and efficiency of the GPU, greatly improving GPU performance and thus greatly improving the GPU's ability, as a coprocessor, to assist the CPU with computation, which improves the overall computing power and efficiency. In particular, mobile terminals are constrained by size and heat dissipation, so the CPUs used in them typically have low computational efficiency; applying this method to mobile terminals such as mobile phones and tablets can significantly improve their computational efficiency, greatly increasing the computing speed and efficiency of the mobile terminal, and can also solve the problems of low running efficiency and slow convergence on a conventional mobile terminal CPU platform.
In this embodiment, according to one example, as shown in FIG. 9, the determination module 205 is further configured to return to the first update module 203 if the current global optimal particle does not satisfy the iteration condition, so that the velocities and positions of the particles are updated according to the obtained historical optimal particles and the global optimal particle; the second update module 204 and the determination module 205 then operate again, until the current global optimal particle satisfies the iteration condition. The iteration condition may be that the velocity of the particles reaches a preset threshold, that the position reaches a preset condition, or that both the velocity and the position reach preset conditions. In this embodiment, the system further includes an output module 206, which is configured to output the result on the CPU side after the GPU sends the optimal particle to the CPU. Specifically, a particle swarm optimization system based on GPU operation of a mobile terminal, used to perform the method of Embodiment 1, comprises:
An initialization module 201, configured to initialize the positions and velocities of the particles in the particle swarm on the CPU side;
A sending module 202, configured to send the positions and velocities of the particles to the GPU side;
A first update module 203, configured to update the velocities and positions of the particles according to the obtained historical optimal particles and the global optimal particle;
A second update module 204, configured to update the global optimal particle according to the updated velocities and positions of the particles;
A determination module 205, configured to determine whether the current global optimal particle satisfies the iteration condition; if so, the GPU sends the optimal particle to the CPU; otherwise, processing returns to the first update module 203 for the next iteration;
An output module 206, configured to output the optimal particle sent to the CPU.
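A minimal, illustrative sketch of how modules 201 to 206 could be organized in code is given below; the class name, method names and empty bodies are assumptions made for this sketch only and are not part of the original disclosure.

// Illustrative organization of modules 201-206 as a C++ class.
class MobileGpuPso {
public:
    void initialize()         { /* module 201: init positions/velocities on the CPU */ }
    void send_to_gpu()        { /* module 202: copy particle data to the GPU */ }
    void update_particles()   { /* module 203: velocity/position update (first update module) */ }
    void update_global_best() { /* module 204: global best update (second update module) */ }
    bool is_converged()       { /* module 205: check the iteration condition */ return true; }
    void output_result()      { /* module 206: output the optimal particle on the CPU */ }

    // Overall flow corresponding to FIG. 9: iterate modules 203-205 until the
    // iteration condition is met, then output via module 206.
    void run(int max_iter) {
        initialize();
        send_to_gpu();
        for (int it = 0; it < max_iter; ++it) {
            update_particles();
            update_global_best();
            if (is_converged()) break;
        }
        output_result();
    }
};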
According to another example, in order to improve the performance of the PSO (Particle Swarm Optimization) method, the present invention selects one or both of a star topology and a ring topology, as can be seen in FIG. 3, where the star topology is on the left and the ring topology is on the right. In the star PSO method, all particles in the swarm are fully connected, that is, in the star topology each particle is connected to every other particle and can communicate with all particles other than itself, so that the best solution of the entire swarm is shared. In the ring topology, each particle in the swarm communicates with its n (n ≥ 1) neighboring particles, that is, each particle is connected to its adjacent particles; each particle moves toward the best position within its neighborhood to update its own position, and each particle shares only the best solution within its neighborhood.
The existing PSO method tends to fall into local extrema. Approaches to improving the PSO method include parameter adjustment, changing the search space, and hybridizing it with other algorithms. The performance of the PSO method depends strongly on the network structure, so a change of topology plays an important role in the convergence of the PSO algorithm. In this embodiment, the PSO methods with ring and star topologies are improved, and the parallelism of the PSO algorithm is exploited to solve the technical problems of low running efficiency and slow convergence on a conventional mobile terminal CPU platform. For the star-topology PSO algorithm, the GPU significantly accelerates the convergence of the particle swarm.
According to another example, as can be seen in FIG. 4, the particle swarm optimization algorithm in this embodiment is based on the CUDA (Compute Unified Device Architecture) programming model, which comprises threads, thread blocks, and thread grids. Each thread has a unique thread number and a private register; each thread block has a shared memory that all threads in the block can access; and at least two thread blocks form a thread grid. The historical optimal particles and the global optimal particle are obtained through the CUDA programming model: when obtaining the historical optima on the CUDA programming model, one thread corresponds to one particle, and all threads call the preset function simultaneously. The updating of particle velocities and positions is also done through the CUDA programming model: when updating positions and velocities, one thread corresponds to each dimension of a particle, and data are read and processed according to the thread index.
In the CUDA programming model, the CPU serves as the host (Host side) and the GPU serves as the coprocessor (Device side), which processes tasks by executing a large number of threads; each has its own independent memory and its own compiler. In a complete CUDA programming model (cf. FIG. 4), the kernel's parallel processing is based on the Single Instruction Multiple Thread (SIMT) computing model, and CUDA organizes threads into three levels: threads, thread blocks, and thread grids. The numbers of threads and blocks are set by the user; in this embodiment, 16*16 thread blocks are launched, with 16*16 threads in each block. Each thread has a unique thread number and a small but fast private register; each thread block has a shared memory that all threads in the block can access quickly; and several blocks form a thread grid.
During the optimization process of the PSO algorithm, the particles exchange information with one another only when updating the global optimum; all other steps are independent of each other. When obtaining the historical optima on CUDA, one thread corresponds to one particle and all threads call the preset function simultaneously; when updating positions and velocities, one thread corresponds to each dimension of a particle; in both cases, data are read and processed according to the thread index.
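The statement that particles exchange information only when updating the global optimum suggests a parallel reduction over the personal best values; a hedged CUDA C++ sketch of one block-level step of such a reduction follows. The kernel name, array layout and the assumption of a power-of-two block size are illustrative only; the kernel would be launched with a dynamic shared memory size of blockDim.x * (sizeof(float) + sizeof(int)), and the per-block partial results would still need a final reduction.

#include <cuda_runtime.h>
#include <cfloat>

// Illustrative block-level reduction: find the particle with the smallest
// personal best value within each thread block (minimization assumed).
__global__ void block_best_reduce(const float* pbest_val, int N,
                                  float* block_val, int* block_idx) {
    extern __shared__ float sval[];          // shared best values (blockDim.x floats)
    int* sidx = (int*)&sval[blockDim.x];     // shared best indices (blockDim.x ints)

    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + tid;
    sval[tid] = (i < N) ? pbest_val[i] : FLT_MAX;
    sidx[tid] = (i < N) ? i : -1;
    __syncthreads();

    // Tree reduction within the block; blockDim.x assumed to be a power of two.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s && sval[tid + s] < sval[tid]) {
            sval[tid] = sval[tid + s];
            sidx[tid] = sidx[tid + s];
        }
        __syncthreads();
    }
    if (tid == 0) {                          // one partial result per block
        block_val[blockIdx.x] = sval[0];
        block_idx[blockIdx.x] = sidx[0];
    }
}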
In the CUDA programming model, the host side initializes the positions and velocities of the particles, copies the data from the CPU to the GPU, performs iterative optimization on the device, and finally copies the optimal solution back to the CPU for output. Although GPU performance is excellent, how much of it can be exploited depends largely on the algorithm structure. When using CUDA, the data structures and the memory access pattern have a great influence on GPU performance. Performance optimization is considered mainly from four aspects: 1) maximizing parallel execution; 2) optimizing memory usage to obtain maximum memory bandwidth; 3) optimizing instruction usage to obtain maximum instruction throughput; 4) setting the numbers of thread blocks and threads.
According to another example, the velocity and position update equations of the standard PSO method are as follows:
v_ij(t+1) = ω*v_ij(t) + c_1*r_1*(p_ij(t) - x_ij(t)) + c_2*r_2*(p_gj(t) - x_ij(t))         (1)
x_ij(t+1) = x_ij(t) + v_ij(t+1)         (2)
where v_i(t) = (v_i1, v_i2, ..., v_id) is the velocity of the i-th particle at generation t, and v_ij(t) denotes the j-th dimension of that velocity; ω is the inertia weight, taken in the range 0.2 to 0.9 (in this embodiment ω = 0.5); c_1 is the cognitive coefficient and c_2 is the social coefficient, commonly c_1 = c_2 = 2; r_1 and r_2 are random numbers uniformly distributed in [0, 1]; p_i(t) = (p_i1, p_i2, ..., p_id) is the historical best position of the i-th particle; x_i(t) = (x_i1, x_i2, ..., x_id) is the position of the i-th particle at generation t; and p_g(t) = (p_g1, p_g2, ..., p_gd) is the historical best position of the swarm. The optimization process of the PSO method can be seen in FIG. 5.
In this embodiment, the designed parallel PSO algorithm was tested with the following commonly used benchmark functions, for analysis and calculation and to make it easy to compare the running speed and efficiency of the various methods; the running efficiency and convergence of the particle swarm optimization method can be determined using one or more of the following seven functions, where D denotes the dimension of a particle and the range of x_i denotes the search space. This embodiment mainly compares computational efficiency and convergence with the serial PSO algorithm. The specific functions are as follows:
(1) Sphere function
Figure PCTCN2016103892-appb-000017
(2) Ackley function
Figure PCTCN2016103892-appb-000018
(3) Schwefel function
Figure PCTCN2016103892-appb-000019
(4) Levy function
Figure PCTCN2016103892-appb-000021
(5) Griewank function
Figure PCTCN2016103892-appb-000022
(6) Rastrigin function
Figure PCTCN2016103892-appb-000023
(7) Rosenbrock function
Figure PCTCN2016103892-appb-000024
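As an illustration of how one of these benchmark functions could be evaluated on the GPU with one thread per particle (as described above for obtaining the historical optima), a hedged CUDA C++ sketch for the Sphere function follows; the kernel name and array layout are assumptions made for this sketch only.

// Illustrative kernel: evaluate the Sphere function f1 for each particle,
// one thread per particle; x is laid out as N particles of D dimensions.
__global__ void evaluate_sphere(const float* x, float* fitness, int N, int D) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= N) return;
    float sum = 0.0f;
    for (int j = 0; j < D; ++j) {
        float xj = x[i * D + j];
        sum += xj * xj;          // f1(x) = sum of squared coordinates
    }
    fitness[i] = sum;
}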
The running times of the star-topology PSO algorithm on the CPU and on the CPU+GPU are shown in Table 1; the test parameters are N=1000, D=50, M=5000. The running times of the ring-topology PSO algorithm on the CPU and on the CPU+GPU are shown in Table 2, with the same test parameters N=1000, D=50, M=5000.
Table 1. Time for the star-topology PSO algorithm to solve functions f1~f7 when N=1000, D=50, M=5000
Figure PCTCN2016103892-appb-000025
Table 2. Time for the ring-topology PSO algorithm to solve functions f1~f7 when N=1000, D=50, M=5000
Figure PCTCN2016103892-appb-000026
Figure PCTCN2016103892-appb-000027
Comparison of computational efficiency and convergence
Tables 1 and 2 above show the comparison of computational efficiency. With identical parameter settings, comparing the performance of the two algorithms through the speedup columns of Tables 1 and 2 shows that the star-topology algorithm is better than the ring-topology algorithm. For running time on the CPU alone, the gap between the two algorithms is not obvious, whereas on the CPU+GPU the star-topology algorithm is clearly more efficient than the ring-topology algorithm.
FIG. 6 and FIG. 7 depict the convergence curves of the two PSO topologies when solving f1 and f2 on the CPU and on the GPU, respectively, and the comparison of convergence can be seen from these figures. The parameter values are: N=500, D=50, and M increases gradually from 0.
The optimization process of the PSO algorithm with ring and star topologies is implemented on the Compute Unified Device Architecture (CUDA), and by solving several benchmark functions, the computation time and results are compared with those obtained on the mobile phone CPU. The results show that the PSO methods of both topologies are more computationally efficient than the CPU, and that for the star-topology PSO method the GPU significantly accelerates the convergence of the particle swarm.
The above is a further detailed description of the present invention in connection with specific preferred embodiments, and the specific implementation of the present invention shall not be considered to be limited to these descriptions. For those of ordinary skill in the art to which the present invention belongs, several simple deductions or substitutions may be made without departing from the concept of the present invention, and all of them shall be regarded as falling within the protection scope of the present invention.

Claims (10)

  1. A particle swarm optimization method based on GPU operation of a mobile terminal, characterized in that the method comprises:
    initializing the positions and velocities of the particles in the particle swarm on the CPU side;
    sending the positions and velocities of the particles to the GPU side;
    updating the velocities and positions of the particles according to the obtained historical optimal particles and the global optimal particle;
    updating the global optimal particle according to the updated velocities and positions of the particles;
    if the current global optimal particle satisfies the iteration condition, sending the optimal particle from the GPU to the CPU.
  2. The method according to claim 1, characterized in that the calculation model of the particle swarm is a star topology, in which the particles in the particle swarm are all connected to one another.
  3. The method according to claim 1, characterized in that the calculation model of the particle swarm is a ring topology, in which each particle in the particle swarm is connected to its adjacent particles.
  4. The method according to claim 1, characterized in that the velocity and position update equations of the particles in the particle swarm are as follows:
    v_ij(t+1) = ω*v_ij(t) + c_1*r_1*(p_ij(t) - x_ij(t)) + c_2*r_2*(p_gj(t) - x_ij(t))
    x_ij(t+1) = x_ij(t) + v_ij(t+1)
    where v_i(t) = (v_i1, v_i2, ..., v_id) is the velocity of the i-th particle at generation t, and v_ij(t) denotes the j-th dimension of that velocity; ω is the inertia weight, in the range 0.2 to 0.9; c_1 is the cognitive coefficient and c_2 is the social coefficient; r_1 and r_2 are random numbers uniformly distributed in [0, 1]; p_i(t) = (p_i1, p_i2, ..., p_id) is the historical best position of the i-th particle; x_i(t) = (x_i1, x_i2, ..., x_id) is the position of the i-th particle at generation t; and p_g(t) = (p_g1, p_g2, ..., p_gd) is the historical best position of the swarm.
  5. The method according to claim 1, characterized in that the method runs based on a CUDA programming model, the CUDA programming model comprising threads, thread blocks, and thread grids; each thread has a unique thread number and a private register; each thread block has a shared memory that all threads in the thread block can access; and at least two thread blocks form a thread grid.
  6. The method according to claim 5, characterized in that the historical optimal particles and the global optimal particle are obtained through the CUDA programming model; when obtaining the historical optima on the CUDA programming model, one thread corresponds to one particle, and all threads call the preset function simultaneously.
  7. The method according to claim 5, characterized in that the updating of the velocities and positions of the particles is done through the CUDA programming model; when updating the positions and velocities, one thread corresponds to each dimension of a particle, and data are read and processed according to the thread index.
  8. The method according to claim 1, characterized in that the determination of the running efficiency and convergence of the particle swarm optimization method comprises using one or more of the following seven functions:
    (1) Sphere function
    Figure PCTCN2016103892-appb-100001
    (2) Ackley function
    Figure PCTCN2016103892-appb-100002
    (3) Schwefel function
    Figure PCTCN2016103892-appb-100003
    (4) Levy function
    Figure PCTCN2016103892-appb-100004
    (5) Griewank function
    Figure PCTCN2016103892-appb-100005
    (6) Rastrigin function
    Figure PCTCN2016103892-appb-100006
    (7) Rosenbrock function
    Figure PCTCN2016103892-appb-100007
    Figure PCTCN2016103892-appb-100008
    where D denotes the dimension of a particle, and the range of x_i denotes the search space.
  9. The method according to claim 1, characterized in that, if the current global optimal particle does not satisfy the iteration condition, the method proceeds to the step of updating the velocities and positions of the particles according to the obtained historical optimal particles and the global optimal particle.
  10. A particle swarm optimization system based on GPU operation of a mobile terminal, characterized in that the system comprises:
    an initialization module, configured to initialize the positions and velocities of the particles in the particle swarm on the CPU side;
    a sending module, configured to send the positions and velocities of the particles to the GPU side;
    a first update module, configured to update the velocities and positions of the particles according to the obtained historical optimal particles and the global optimal particle;
    a second update module, configured to update the global optimal particle according to the updated velocities and positions of the particles;
    a determination module, configured to send the optimal particle from the GPU to the CPU if the current global optimal particle satisfies the iteration condition.
PCT/CN2016/103892 2016-01-21 2016-10-29 Particle swarm optimization method and system based on gpu operation of mobile terminal WO2017124809A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610040738.XA CN105718998A (en) 2016-01-21 2016-01-21 Particle swarm optimization method based on mobile terminal GPU operation and system thereof
CN201610040738.X 2016-01-21

Publications (1)

Publication Number Publication Date
WO2017124809A1 true WO2017124809A1 (en) 2017-07-27

Family

ID=56153722

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/103892 WO2017124809A1 (en) 2016-01-21 2016-10-29 Particle swarm optimization method and system based on gpu operation of mobile terminal

Country Status (2)

Country Link
CN (1) CN105718998A (en)
WO (1) WO2017124809A1 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111177874A (en) * 2019-12-17 2020-05-19 北京航空航天大学杭州创新研究院 Method for rapidly designing airway network topological structure by using GPU
CN111337050A (en) * 2020-03-20 2020-06-26 深圳市汇泰科电子有限公司 Zero-speed judgment condition and step counting method based on multi-condition fusion
CN111858029A (en) * 2020-06-16 2020-10-30 国网福建省电力有限公司信息通信分公司 Storm cluster load balancing method and system based on discrete particle swarm
CN112183008A (en) * 2019-06-18 2021-01-05 北京理工大学 Terminal resistance matching method of CAN bus network
CN112395090A (en) * 2020-11-19 2021-02-23 华侨大学 Intelligent hybrid optimization method for service placement in mobile edge computing
CN113011097A (en) * 2021-03-26 2021-06-22 云南电网有限责任公司电力科学研究院 Multi-population self-adaptive cooperation optimization method
WO2022033446A1 (en) * 2020-08-10 2022-02-17 北京字节跳动网络技术有限公司 Dynamic fluid effect processing method and apparatus, and electronic device and readable medium
CN114172541A (en) * 2021-12-02 2022-03-11 广东电网有限责任公司 Energy consumption balance control method, device and system for power line communication network
CN114944999A (en) * 2022-04-08 2022-08-26 北京遥测技术研究所 High-speed routing communication system and method based on GPU and particle swarm optimization algorithm

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105718998A (en) * 2016-01-21 2016-06-29 上海斐讯数据通信技术有限公司 Particle swarm optimization method based on mobile terminal GPU operation and system thereof
CN106502632B (en) * 2016-10-28 2019-01-18 武汉大学 A kind of GPU parallel particle swarm optimization method based on self-adaptive thread beam
CN108564213B (en) * 2018-04-10 2022-05-13 中国水利水电科学研究院 GPU acceleration-based parallel reservoir group flood control optimal scheduling method
CN108845828B (en) * 2018-05-29 2021-01-08 深圳市国微电子有限公司 Coprocessor, matrix operation acceleration method and system
CN111815749A (en) * 2019-09-03 2020-10-23 厦门雅基软件有限公司 Particle calculation method, particle calculation device, electronic equipment and computer-readable storage medium
CN113362425B (en) * 2021-06-18 2022-07-19 中科三清科技有限公司 Image fusion method and device, electronic equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1753031A (en) * 2005-11-10 2006-03-29 北京航空航天大学 Particle system based on GPU
CN101685530A (en) * 2008-09-23 2010-03-31 中国科学院过程工程研究所 Method for calculating particles on GPU by utilizing multi-body interaction model
CN101727653A (en) * 2008-10-31 2010-06-09 中国科学院过程工程研究所 Graphics processing unit based discrete simulation computation method of multicomponent system
CN102243770A (en) * 2011-07-19 2011-11-16 南昌航空大学 Method for quickly drawing realistic rendering of naval battlefield based on OSG
CN102982506A (en) * 2012-11-13 2013-03-20 沈阳信达信息科技有限公司 Particle system optimization based on graphic processing unit (GPU)
CN102999756A (en) * 2012-11-09 2013-03-27 重庆邮电大学 Method for recognizing road signs by PSO-SVM (particle swarm optimization-support vector machine) based on GPU (graphics processing unit)
CN105718998A (en) * 2016-01-21 2016-06-29 上海斐讯数据通信技术有限公司 Particle swarm optimization method based on mobile terminal GPU operation and system thereof

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8090665B2 (en) * 2008-09-24 2012-01-03 Nec Laboratories America, Inc. Finding communities and their evolutions in dynamic social network
CN103646281B (en) * 2013-12-13 2017-07-21 浪潮电子信息产业股份有限公司 It is a kind of based on population computational methods on multiple populations

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1753031A (en) * 2005-11-10 2006-03-29 北京航空航天大学 Particle system based on GPU
CN101685530A (en) * 2008-09-23 2010-03-31 中国科学院过程工程研究所 Method for calculating particles on GPU by utilizing multi-body interaction model
CN101727653A (en) * 2008-10-31 2010-06-09 中国科学院过程工程研究所 Graphics processing unit based discrete simulation computation method of multicomponent system
CN102243770A (en) * 2011-07-19 2011-11-16 南昌航空大学 Method for quickly drawing realistic rendering of naval battlefield based on OSG
CN102999756A (en) * 2012-11-09 2013-03-27 重庆邮电大学 Method for recognizing road signs by PSO-SVM (particle swarm optimization-support vector machine) based on GPU (graphics processing unit)
CN102982506A (en) * 2012-11-13 2013-03-20 沈阳信达信息科技有限公司 Particle system optimization based on graphic processing unit (GPU)
CN105718998A (en) * 2016-01-21 2016-06-29 上海斐讯数据通信技术有限公司 Particle swarm optimization method based on mobile terminal GPU operation and system thereof

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183008A (en) * 2019-06-18 2021-01-05 北京理工大学 Terminal resistance matching method of CAN bus network
CN112183008B (en) * 2019-06-18 2022-07-22 北京理工大学 Terminal resistance matching method of CAN bus network
CN111177874B (en) * 2019-12-17 2023-06-09 北京航空航天大学杭州创新研究院 Method for rapidly designing road network topology structure by using GPU
CN111177874A (en) * 2019-12-17 2020-05-19 北京航空航天大学杭州创新研究院 Method for rapidly designing airway network topological structure by using GPU
CN111337050A (en) * 2020-03-20 2020-06-26 深圳市汇泰科电子有限公司 Zero-speed judgment condition and step counting method based on multi-condition fusion
CN111337050B (en) * 2020-03-20 2023-03-28 深圳市汇泰科电子有限公司 Zero-speed judgment condition and step counting method based on multi-condition fusion
CN111858029B (en) * 2020-06-16 2023-06-27 国网福建省电力有限公司信息通信分公司 Storm cluster load balancing method and system based on discrete particle swarm
CN111858029A (en) * 2020-06-16 2020-10-30 国网福建省电力有限公司信息通信分公司 Storm cluster load balancing method and system based on discrete particle swarm
WO2022033446A1 (en) * 2020-08-10 2022-02-17 北京字节跳动网络技术有限公司 Dynamic fluid effect processing method and apparatus, and electronic device and readable medium
CN112395090B (en) * 2020-11-19 2023-05-30 华侨大学 Intelligent hybrid optimization method for service placement in mobile edge calculation
CN112395090A (en) * 2020-11-19 2021-02-23 华侨大学 Intelligent hybrid optimization method for service placement in mobile edge computing
CN113011097A (en) * 2021-03-26 2021-06-22 云南电网有限责任公司电力科学研究院 Multi-population self-adaptive cooperation optimization method
CN113011097B (en) * 2021-03-26 2024-03-22 云南电网有限责任公司电力科学研究院 Multi-group self-adaptive collaborative optimization method
CN114172541A (en) * 2021-12-02 2022-03-11 广东电网有限责任公司 Energy consumption balance control method, device and system for power line communication network
CN114944999A (en) * 2022-04-08 2022-08-26 北京遥测技术研究所 High-speed routing communication system and method based on GPU and particle swarm optimization algorithm
CN114944999B (en) * 2022-04-08 2023-10-10 北京遥测技术研究所 High-speed route communication system and method based on GPU and particle swarm optimization algorithm

Also Published As

Publication number Publication date
CN105718998A (en) 2016-06-29

Similar Documents

Publication Publication Date Title
WO2017124809A1 (en) Particle swarm optimization method and system based on gpu operation of mobile terminal
Lv et al. Intelligent edge computing based on machine learning for smart city
Pan et al. GPU-based parallel collision detection for fast motion planning
Blej et al. Comparison of Mamdani-type and Sugeno-type fuzzy inference systems for fuzzy real time scheduling
Ghosh et al. Inter-particle communication and search-dynamics of lbest particle swarm optimizers: An analysis
Ryu et al. Improved informed RRT* using gridmap skeletonization for mobile robot path planning
CN109840154A (en) A kind of computation migration method that task based access control relies under mobile cloud environment
US11281232B2 (en) Systems and methods for multi-agent system control using consensus and saturation constraints
Li et al. Resource scheduling based on improved spectral clustering algorithm in edge computing
Du et al. Model parallelism optimization for distributed inference via decoupled CNN structure
Kashani et al. Using simulated annealing for task scheduling in distributed systems
CN112817730A (en) Deep neural network service batch processing scheduling method and system and GPU
CN112732436A (en) Deep reinforcement learning acceleration method of multi-core processor-single graphics processor
Zhang et al. Parallel planning of Marine Observation Tasks Based on threading building blocks
Fu et al. Modified atom search optimization based on immunologic mechanism and reinforcement learning
Liu et al. A set-based discrete differential evolution algorithm
Darmawan et al. Grid computing process improvement through computing resource scheduling using genetic algorithm and Tabu Search integration
Ding et al. A task scheduling algorithm for heterogeneous systems using aco
CN110415162B (en) Adaptive graph partitioning method facing heterogeneous fusion processor in big data
Leifu et al. Improved YSGA algorithm combining declining strategy and fuch chaotic mechanism
dos Santos et al. Parallel multi-speed pursuit-evasion game algorithms
Ying et al. Optimal scheduling simulation of software for multi-tenant in cloud computing environment
Zhang et al. Deploying GIS services into the edge: A study from performance evaluation and optimization viewpoint
Teng et al. A New Frog Leaping Algorithm Based on Simulated Annealing and Immunization Algorithm for Low-power Mapping in Network-on-chip.
Cao et al. Optimization Study of KNN Classification Algorithm on Large-Scale Datasets: Real-Time Optimization Strategy Based on Balanced KD Tree and Multi-threaded Parallel Computing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16886069

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16886069

Country of ref document: EP

Kind code of ref document: A1