CN106681149B

CN106681149B - Abdomen-swing control method for a bumblebee robot based on virtual reality and reinforcement learning

Info

Publication number: CN106681149B
Application number: CN201710018201.8A
Authority: CN
Inventors: 郑能干; 金梦洁; 马倩; 王雪菲; 江毅
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2017-01-11
Filing date: 2017-01-11
Publication date: 2019-11-19
Anticipated expiration: 2037-01-11
Also published as: CN106681149A

Abstract

The invention discloses an abdomen-swing control method for a bumblebee robot based on virtual reality and reinforcement learning, belonging to the technical field of insect robots. The control method comprises: (1) according to a set target pattern of bumblebee abdomen swinging, establishing a Q table using a reinforcement learning algorithm, the Q table recording the cumulative discounted reward obtained by each action under each class of abdomen-swing pattern; (2) monitoring the bumblebee's abdomen-swing pattern at the current moment, looking it up in the Q table, and, following the maximum-reward principle, selecting an action in real time to control the bumblebee's abdomen swing. The invention uses the algorithm to establish the interaction mechanism between the bumblebee's abdomen-swing behavior and the visual stimulator, so that a large number of biological experiments are not needed to obtain the stimulus-behavior response relationship. The invention induces the bumblebee's motor behavior with visual stimuli and uses a reinforcement learning algorithm as the control algorithm, effectively forming a closed-loop control system that can accurately control the bumblebee's abdomen-swing behavior.

Description

Abdomen-swing control method for a bumblebee robot based on virtual reality and reinforcement learning

Technical field

The present invention relates to the technical field of insect robots, and in particular to an abdomen-swing control method for a bumblebee robot based on virtual reality and reinforcement learning.

Background technique

An insect robot is a new type of animal robot system formed by using an insect as the carrier and combining it with a control module. Because this kind of special robot relies on the carrier's own locomotion ability, there is no need to consider complicated mechanical and power systems, and it has broad application prospects in scientific research, national defense and security, industrial production, and other fields.

How to control insect flight is the primary problem to be solved in insect robot research. An insect robot generally comprises three parts: a behavior stimulation module, an insect behavior acquisition and analysis module, and a control instruction decision module.

At present, behavior stimulation in insect robots is mainly based on electrical stimulation, with the electrical stimulation parameters corresponding to the insect's different motion states adjusted manually. In electrical stimulation, silicon or filament electrodes are surgically implanted into the insect's muscles or nervous system, and a microcontroller generates square-wave pulses that are delivered to the stimulation site, so as to control behaviors such as the insect's turning. Although electrically stimulated insect robots have made significant progress, several problems remain:

First, implanting the stimulation electrodes can damage the insect's muscles or nervous system, degrading subsequent control performance. Second, prolonged electrical stimulation fatigues the insect and reduces its sensitivity to the stimulation. Third, because insects are small (the muscles and brain of most insects have a volume of about 1 cubic millimeter), a single electrical stimulus may involve multiple sensory-motor circuits, or even the entire muscle or brain, causing unpredictable behavioral responses. Fourth, manual adjustment of stimulation parameters requires a large number of biological experiments in advance to obtain the stimulus-behavior response relationship, which is time-consuming; moreover, different individual insects still respond quite differently to the same stimulation parameters, so manual adjustment is inefficient and imprecise.

Developing non-implantable control methods avoids the above problem of damaging the insect's body. Numerous studies have shown that optic flow plays a key role in many important insect behaviors, for example flight speed control, altitude control, depth measurement, and flight distance measurement, which provides a theoretical basis for realizing optic-flow-based insect robots.

In insect vision and electrophysiology research, visual stimuli are provided by means such as paper printed with stripes, projectors, displays, and LED arrays. The most common visual stimulator consists of an LED array; for example, Chinese patent application No. 201410422437.4 provides a bumblebee flight controller that not only matches the characteristics of the bumblebee's compound eye but can also generate customized optic flow stimuli. However, how to use optic flow stimulation to effectively control bumblebee flight within a reinforcement learning framework has not yet been reported.

Summary of the invention

The present invention provides an abdomen-swing control method for a bumblebee robot based on virtual reality and reinforcement learning. It uses a reinforcement learning algorithm to drive the bumblebee's abdomen swing from an initial state to a target state, overcoming the problems in the prior art that manually adjusted stimulation parameters give imprecise control and require a large number of biological experiments.

An abdomen-swing control method for a bumblebee robot based on virtual reality and reinforcement learning, comprising:

(1) according to a set target pattern of bumblebee abdomen swinging, establishing a Q table using a reinforcement learning algorithm, the Q table recording the cumulative discounted reward obtained by each action under each class of abdomen-swing pattern;

(2) monitoring the bumblebee's abdomen-swing pattern at the current moment, looking it up in the Q table, and, following the maximum-reward principle, selecting an action in real time to control the bumblebee's abdomen swing;

The action is a visual stimulus, with varying parameters, formed by a light beam flowing around the bumblebee.

The present invention models the insect robot control problem with a reinforcement learning algorithm. The optic flow information provided by the virtual reality system is input directly to the bumblebee's compound eyes and induces its abdomen-swing behavior; a high-definition camera captures the bumblebee's abdominal movement in real time, and a computer image algorithm extracts the abdomen-swing pattern, which serves as the environment state; the algorithm changes the parameters of the optic flow information, the LED screen changes its motion parameters accordingly, and the result is input to the bumblebee again, thereby forming a closed-loop control system.

The maximum-reward principle is the greedy policy.

The action is generated by a virtual reality system composed of several LED display screens, which are controlled by a CAN-bus-based LED control system.

Specifically, a CAN-bus-based LED display screen provides the optic flow information input to the bumblebee. The LED display screen has a 32 × 8 structure: 32 pins of the single-chip microcontroller control the columns of LEDs, and the remaining 8 pins control the rows. The display system consists of a PC control terminal, a CAN bus master node, and several slave nodes containing LED driver modules. The PC control terminal comprises an input unit and a first serial communication service unit, and sends data packets containing image information and control commands to the master node of the CAN bus. The master node comprises a second serial communication service unit, which receives the data sent by the client, and a CAN bus communication service unit, which encodes the received data and uploads it to the CAN bus. Each slave node comprises a second CAN bus communication service unit, which receives data from the CAN bus and passes it to the corresponding LED driver module. The LED driver module uses an MSP430F149 microcontroller and drives the LEDs, thereby displaying images and controlling image motion.

The present invention reflects the bumblebee's flight behavior by recording its abdomen-swing pattern. Numerous studies have found that the insect abdomen plays a key role in flight control, because the position of the abdomen determines the direction of the force generated by the flapping wings.

The acquisition and analysis of the bumblebee's abdomen-swing behavior is carried out by a high-definition camera and a particle filter algorithm, which together yield a time series of the abdomen deflection angle. The high-definition camera is an ANC HD1080P with an acquisition frame rate of 30 frames/second; it is connected to the computer by a USB cable and transfers images to the computer for subsequent behavior analysis. Particle filtering is a widely used classical algorithm for object tracking. Before the experiment, a fixed point is chosen as the tracking point, and a rectangular region of the bumblebee's abdomen in the image is chosen as the tracking target; the color histogram of this region is used as the feature for computing the target position in the next frame, from which the abdomen deflection angle is calculated. Once the angles are obtained, a sliding-window method is used to compute the bumblebee's abdomen-swing pattern.

Preferably, the abdomen-swing pattern is characterized by a 6-dimensional vector consisting of the maximum, the minimum, the mean absolute value, the number of peaks and troughs, the number of points greater than the mean, and the variance of the bumblebee's abdomen deflection angle within a single sliding window.

Preferably, the width of the sliding window is 200 to 400 ms; more preferably, it is 200 ms.
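As an illustrative sketch only, the six features could be computed from one window of deflection angles as shown below. At 30 frames/second, a 200 ms window holds about 6 samples. The function name `swing_features`, the use of the population variance, and the definition of a peak or trough as a sign change of the slope between neighboring samples are assumptions made for illustration; the text does not specify them.

```python
from statistics import mean, pvariance


def swing_features(angles):
    """Return the 6-dimensional abdomen-swing pattern for one window.

    `angles` is a list of abdomen deflection angles inside one sliding
    window (hypothetical input format).
    """
    if len(angles) < 3:
        raise ValueError("need at least 3 samples to count peaks and troughs")
    avg = mean(angles)
    # Assumption: a peak or trough is an interior sample at which the
    # slope changes sign.
    extrema = sum(
        1
        for prev, cur, nxt in zip(angles, angles[1:], angles[2:])
        if (cur - prev) * (nxt - cur) < 0
    )
    return [
        max(angles),                        # maximum
        min(angles),                        # minimum
        mean([abs(a) for a in angles]),     # mean absolute value
        extrema,                            # number of peaks and troughs
        sum(1 for a in angles if a > avg),  # points greater than the mean
        pvariance(angles),                  # variance (population, assumed)
    ]
```

Calling `swing_features` on each successive window of the angle time series would give the sequence of states x_t used by the learning algorithm.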

The present invention uses a reinforcement learning algorithm to establish the interaction mechanism between the bumblebee and the visual stimulator. Reinforcement learning is a process of continual interaction with the environment, through which an optimal policy mapping environment states to actions can be obtained. Reinforcement learning comprises three main parts: the environment, the actions, and the reward function.

The environment is the bumblebee's abdomen-swing pattern, represented by the 6-dimensional vector described above;

The action is the adjustment of external visual stimulus parameters that the reinforcement learning algorithm can make. Preferably, the Q table includes 5 to 20 actions. The visual stimulus parameter is the angular velocity of image rotation; specifically, there are 10 selectable angular velocities: [2.05, 2.89, 3.97, 4.88, 6.20, 7.21, 8.72, 11.02, 14.5, 22.16] rad/s. The visual stimulus parameters may also include the brightness, direction of motion, and width of the image.
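A minimal sketch of how the greedy policy could turn a Q table row into one of the ten angular velocities is given below. The Q table layout (a list of rows, one per discrete abdomen-swing class, each with one value per action) and the names `ANGULAR_VELOCITIES` and `greedy_action` are assumptions for illustration.

```python
# The ten selectable angular velocities listed in the text, in rad/s.
ANGULAR_VELOCITIES = [2.05, 2.89, 3.97, 4.88, 6.20,
                      7.21, 8.72, 11.02, 14.5, 22.16]


def greedy_action(q_table, state):
    """Pick the action with the largest Q value for a discrete state.

    Returns (action index, angular velocity in rad/s). Ties resolve to
    the lowest index, which is an arbitrary choice made here.
    """
    row = q_table[state]
    best = max(range(len(row)), key=row.__getitem__)
    return best, ANGULAR_VELOCITIES[best]
```

For example, with a Q table of three states whose row for state 1 has its largest value at index 4, `greedy_action(q_table, 1)` selects the stimulus rotating at 6.20 rad/s.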

The reward function is determined mainly by the Euclidean distance between the bumblebee's current abdomen-swing pattern and the target abdomen-swing pattern. Denote the target state by the vector x_g, with corresponding discrete state k_g; the bumblebee's abdomen-swing pattern at the current time t by x_t, with corresponding discrete state k_t; and the abdomen-swing pattern at the previous moment by x_{t-1}. Let dist₁ be the distance between x_t and x_g, and dist₂ the distance between x_{t-1} and x_g. The reward function can then be written as:

R = 0.2, if (dist₁ > dist₂ and k_t ≠ k_g);

R = +1, if (dist₁ > dist₂ and k_t = k_g);

R = -1, otherwise.
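The piecewise reward above can be sketched as a small function. This is an illustration under assumptions: the function name `reward` and its argument order are hypothetical, states are passed as plain lists of six floats, and the comparison dist₁ > dist₂ is reproduced exactly as it is printed in the text.

```python
import math


def reward(x_t, x_prev, x_g, k_t, k_g):
    """Reward for the transition x_prev -> x_t toward target x_g.

    x_t, x_prev, x_g: feature vectors (sequences of floats).
    k_t, k_g: discrete class indices of the current and target states.
    """
    dist1 = math.dist(x_t, x_g)      # current pattern to target
    dist2 = math.dist(x_prev, x_g)   # previous pattern to target
    # Condition reproduced as printed in the text.
    if dist1 > dist2:
        return 1.0 if k_t == k_g else 0.2
    return -1.0
```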

The reward function evaluates how good an action taken by the reinforcement learning algorithm is; the goal of reinforcement learning is to maximize the cumulative reward.

Preferably, the Q table is established as follows:

(1) initialize the Q table, the learning rate α, the discount factor γ, and the cluster centers of the abdomen-swing patterns;

(2) obtain the bumblebee's abdomen-swing pattern x_t at time t;

(3) discretize the state using sequential K-means and adjust the cluster centers of the abdomen-swing patterns;

(4) select action a_t according to the epsilon-greedy policy;

(5) execute action a_t, and obtain the reward r_{t+1} and the bumblebee's abdomen-swing pattern x_{t+1} at the next moment;

(6) repeat steps (3) and (4) to select action a_{t+1};

(7) update the Q table according to the following formula:

Q(x_t, a_t) ← Q(x_t, a_t) + α[r_{t+1} + γQ(x_{t+1}, a_{t+1}) - Q(x_t, a_t)];

(8) loop again from step (2) until the Q table converges.
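The building blocks of steps (3), (4), and (7) can be sketched as follows. This is a sketch under stated assumptions, not the implementation used in the invention: the function names are hypothetical, the sequential K-means step uses the common running-mean rule (move the nearest center toward the sample by 1/count), which the text does not spell out, and the Q table is a list of rows indexed by discrete state.

```python
import math
import random


def assign_and_update(centers, counts, x):
    """Sequential K-means step: return the index of the nearest center
    and move that center toward x (running-mean rule, assumed)."""
    k = min(range(len(centers)), key=lambda i: math.dist(centers[i], x))
    counts[k] += 1
    step = 1.0 / counts[k]
    centers[k] = [c + step * (xi - c) for c, xi in zip(centers[k], x)]
    return k


def epsilon_greedy(q_row, epsilon, rng=random):
    """With probability epsilon explore at random, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


def sarsa_update(q, k_t, a_t, r_next, k_next, a_next, alpha, gamma):
    """Apply Q(x_t,a_t) <- Q(x_t,a_t)
    + alpha * [r_{t+1} + gamma * Q(x_{t+1},a_{t+1}) - Q(x_t,a_t)]
    in place, using discrete state indices k_t and k_next."""
    td_target = r_next + gamma * q[k_next][a_next]
    q[k_t][a_t] += alpha * (td_target - q[k_t][a_t])
```

Because the update uses the action a_{t+1} actually selected in step (6) rather than the maximum over actions, the formula has the form of the on-policy SARSA update.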

Through the interaction between the reinforcement learning algorithm and the environment, an optimal policy mapping environment states to actions is obtained. The parameters of the next visual stimulus can then be determined from the bumblebee's current abdomen-swing pattern, forming a closed-loop control system. In actual control, the abdomen-swing pattern quickly and accurately approaches the target state under the control of the reinforcement learning algorithm.

The beneficial effects of the present invention are:

(1) The present invention targets the bumblebee's visual pathway and stimulates it with optic flow, eliminating the injury to the bumblebee caused by implantation.

(2) The interaction mechanism between the bumblebee's abdomen-swing behavior and the visual stimulator is established using a reinforcement learning algorithm, so a large number of biological experiments are not needed to obtain the stimulus-behavior response relationship.

(3) The present invention induces the bumblebee's motor behavior with visual stimuli and uses a reinforcement learning algorithm as the control algorithm, effectively forming a closed-loop control system that can accurately control the bumblebee's abdomen-swing behavior.

Brief description of the drawings

Fig. 1 is the control flow chart of the closed-loop control system of the bumblebee robot based on virtual reality and reinforcement learning according to the invention.

Fig. 2 is a schematic structural diagram of the closed-loop control components of the bumblebee robot based on virtual reality and reinforcement learning according to the invention.

Fig. 3 is a schematic diagram of the CAN-bus-based LED display module.

Fig. 4 is a schematic diagram of the bumblebee behavior acquisition and analysis process and its results.

Fig. 5 is a schematic diagram of the interaction process of the reinforcement learning algorithm.

Fig. 6 shows the process of extracting bumblebee abdomen-swing features based on a sliding window.

Fig. 7 shows the learning procedure of the reinforcement learning algorithm.

Fig. 8 shows the control curves of the SSC dimension for one bumblebee under reinforcement learning control and under random-policy control, respectively.

Fig. 9 shows the deviation curves between the bumblebee's SSC dimension and the target SSC dimension under reinforcement learning control and under random-policy control, respectively.

Fig. 10 is the distribution map of the Q table learned by the algorithm.

Specific embodiment

The invention is further described below with reference to an embodiment.

Embodiment 1

As shown in Fig. 1, Fig. 1(A) is the control loop of the closed-loop control system of the bumblebee robot based on virtual reality and reinforcement learning according to the invention. In this loop, the bumblebee's compound eyes first receive an external optic flow stimulus, which elicits an abdomen-swing response. To capture the abdomen-swing behavior, the invention uses a particle filter algorithm to track the tip of the bumblebee's abdomen and calculate its swing angle, and then extracts features over a sliding window. The extracted abdomen-swing features serve as the environment state of the reinforcement learning algorithm, which selects the stimulation parameter for the next moment, thus forming a closed learning loop. Fig. 1(B) shows the physical composition of the system, corresponding to Fig. 1(A): a CAN-bus-based LED display screen for providing the optic flow stimulus; a high-definition camera for capturing the bumblebee's abdomen-swing behavior; and a computer for analyzing and processing the images and running the reinforcement learning algorithm.

As shown in Fig. 2, an experimental platform for the bumblebee robot control method based on optic flow stimulation and reinforcement learning comprises: vibration isolators 1 and 2; an LED display screen 3 placed on the vibration isolators, used to generate specific visual stimuli; a pillar 4 for holding the micromanipulator fixing bracket; a micromanipulator fixing bracket 5 for adjusting the position at which the bumblebee is fixed; a high-definition camera 6 for recording the bumblebee's abdomen-swing behavior; and a universal bracket 7 for later extension of the system. In addition, the design includes a high-speed camera 8 and a bracket 9 for fixing the high-speed camera. Except for the PC, the whole system is placed on the vibration isolators to prevent desktop vibration from affecting the experiment.

As shown in Fig. 3, a CAN-bus-based LED display screen provides the optic flow information input to the bumblebee. The LED display screen has a 32 × 8 structure: 32 pins of the single-chip microcontroller control the columns of LEDs, and the remaining 8 pins control the rows. The display system consists of a PC control terminal, a CAN bus master node, and several slave nodes containing LED driver modules. The PC control terminal comprises an input unit and a first serial communication service unit, and sends data packets containing image information and control commands to the master node of the CAN bus. The master node comprises a second serial communication service unit, which receives the data sent by the client, and a CAN bus communication service unit, which encodes the received data and uploads it to the CAN bus. Each slave node comprises a second CAN bus communication service unit, which receives data from the CAN bus and passes it to the corresponding LED driver module. The LED driver module uses an MSP430F149 microcontroller and drives the LEDs, thereby displaying images and controlling image motion.

The present invention uses a single bright stripe as the visual information input to the bumblebee robot, thereby inducing the bumblebee's motor behavior.

Fig. 4 is a schematic diagram of the acquisition and data analysis of bumblebee flight behavior in the present invention. The bumblebee's abdomen-swing behavior is captured by a high-definition camera (model ANC HD1080P) positioned behind the bumblebee. The bumblebee's abdomen-swing frequency is at most about 7 Hz, so the camera frame rate is set to 30 frames/second. In the figure, (x₀, y₀) is the point at which the bumblebee is fixed: the bumblebee is attached at its back to a thin bent iron rod with a bending angle of 45 degrees and a diameter of 1 mm. The operator selects the abdominal region to be tracked; denoting the center of this region by (x, y), the abdomen-swing angle is obtained from the formula β = arctan[(x - x₀)/(y - y₀)].
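The angle formula can be written directly in code. The sketch below assumes image coordinates in pixels, returns β in radians, and rejects the degenerate case y = y₀, for which the formula is undefined; the function name `abdomen_angle` is hypothetical.

```python
import math


def abdomen_angle(x, y, x0, y0):
    """Abdomen-swing angle beta = arctan[(x - x0) / (y - y0)], in radians.

    (x0, y0) is the fixed point on the bumblebee's back and (x, y) is the
    center of the tracked abdominal region.
    """
    if y == y0:
        raise ValueError("formula is undefined when y equals y0")
    return math.atan((x - x0) / (y - y0))
```

For example, with the fixed point at (100, 100) and the tracked center at (110, 200), `math.degrees(abdomen_angle(110, 200, 100, 100))` is about 5.7 degrees, and mirroring the center to (90, 200) gives the same magnitude with the opposite sign.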

Fig. 5 shows the interaction process of the reinforcement learning algorithm in the present invention. In general, reinforcement learning maintains an action-value function Q(x_t, a_t) mapping environment states to actions, which represents the cumulative reward obtainable by taking action a_t in state x_t. At each step, once this value function is available, the action a_t that attains the maximum value in state x_t is generally selected. The value function is updated through the continual interaction between the reinforcement learning algorithm and the environment.

At time t, the bumblebee's abdomen-swing pattern is x_t. The state is discretized using the sequential K-means algorithm, and an action a_t is then selected, that is, the current optic flow stimulation parameter is adjusted. After the bumblebee's compound eyes receive the optic flow stimulus, a corresponding abdomen-swing behavior is produced. The abdomen-swing behavior is described by the deflection angle obtained with the particle filter algorithm. As shown in Fig. 6, a sliding window of 200 ms is used here: all deflection-angle samples within the window are taken, and six features (maximum, minimum, mean absolute value, number of peaks and troughs, number of points greater than the mean, and variance) are computed as the bumblebee's abdomen-swing pattern. This 6-dimensional feature is the state x_{t+1} shown in the figure. The immediate reward r_{t+1} is obtained by evaluating the reward function. The update formula of the reinforcement learning algorithm is:

Q(x_t, a_t) ← Q(x_t, a_t) + α[r_{t+1} + γQ(x_{t+1}, a_{t+1}) - Q(x_t, a_t)].

Here α is the learning rate of the algorithm and γ is the discount factor. The specific algorithm flow is shown in Fig. 7. The reward R is computed as:

R = 0.2, if (dist₁ > dist₂ and k_t ≠ k_g);

R = +1, if (dist₁ > dist₂ and k_t = k_g);

R = -1, otherwise.

With the reinforcement learning algorithm, the correspondence between optic flow stimulation parameters and the bumblebee's abdomen-swing patterns need not be known in advance. Through continual interactive learning, the algorithm learns the mapping from environment state (abdomen-swing pattern) to action (optic flow stimulation parameter), which is then used as the real-time closed-loop control policy, avoiding the process of manually running many experiments and analyzing the data.

To verify the effectiveness of the reinforcement-learning-based control mechanism, we also compared it with a common random algorithm. Figs. 8 to 10 show the data collected from the bumblebee and the results of the algorithm control experiments. Figs. 8 and 9 are control curves for the bumblebee, showing how the SSC dimension of the current state approaches the SSC dimension of the target state x_g. (SSC is the number of peaks and troughs within the sliding window; it reflects the frequency of abdominal movement and is the most important feature describing the bumblebee's abdomen swing.)

The line with triangles is the target SSC value, the line with circles is the curve under reinforcement learning control, and the line with × markers is the curve under random-algorithm control. It can be seen that:

1. Under reinforcement learning control, the SSC moves closer to the target state at every step; the deviation between the state at each step and the target state (corresponding to the Euclidean distance in Fig. 10) decreases monotonically, and the final deviation tends to 0. By contrast, the control curve of the random algorithm varies randomly during control, as does its deviation, with no clear control trend.

2. Under reinforcement learning control, the bumblebee reaches the target value within at most 8 steps and then remains stable, whereas the random algorithm does not drive the control process toward the target value, and its approaches to the target value show no regularity.

3. The curve under reinforcement learning control is smoother, whereas the curve under random-algorithm control fluctuates strongly.

These results and analysis show that, with the reinforcement learning algorithm, optic flow stimulation can control the bumblebee's abdomen swing well, laying a foundation for subsequent optic-flow-based bumblebee robots.

Claims

1. An abdomen-swing control method for a bumblebee robot based on virtual reality and reinforcement learning, comprising:

(1) according to a set target pattern of bumblebee abdomen swinging, establishing a Q table using a reinforcement learning algorithm, the Q table recording the cumulative discounted reward obtained by each action under each class of abdomen-swing pattern;

(2) monitoring the bumblebee's abdomen-swing pattern at the current moment, looking it up in the Q table, and, following the maximum-reward principle, selecting an action in real time to control the bumblebee's abdomen swing;

wherein the reinforcement learning comprises an environment, actions, and a reward function; the environment is the bumblebee's abdomen-swing pattern; the action is the adjustment of external visual stimulus parameters that the reinforcement learning algorithm can make, namely a visual stimulus with varying parameters formed by a light beam flowing around the bumblebee; and the reward function is determined by the Euclidean distance between the bumblebee's current abdomen-swing pattern and the target abdomen-swing pattern;

the acquisition and analysis of the bumblebee's abdomen-swing behavior is carried out by a high-definition camera and a particle filter algorithm, and the abdomen-swing pattern is characterized by a 6-dimensional vector consisting of the maximum, the minimum, the mean absolute value, the number of peaks and troughs, the number of points greater than the mean, and the variance of the bumblebee's abdomen deflection angle within a single sliding window.

2. The abdomen-swing control method for a bumblebee robot according to claim 1, characterized in that the width of the sliding window is 200 to 400 ms.

3. The abdomen-swing control method for a bumblebee robot according to claim 1, characterized in that the action is generated by a virtual reality system composed of several LED display screens, and the LED display screens are controlled by a CAN-bus-based LED control system.

4. The abdomen-swing control method for a bumblebee robot according to claim 1, characterized in that the Q table includes 5 to 20 actions.

5. The abdomen-swing control method for a bumblebee robot according to claim 1, characterized in that the Q table is established as follows:

(2) obtain the bumblebee's abdomen-swing pattern x_t at time t;

(3) cluster and discretize the state using sequential K-means, and adjust the cluster centers of the abdomen-swing patterns;

(4) select action a_t according to the epsilon-greedy policy;

(6) repeat steps (3) and (4) to select action a_{t+1};

(7) update the Q table according to the following formula:

Q(x_t, a_t) ← Q(x_t, a_t) + α[r_{t+1} + γQ(x_{t+1}, a_{t+1}) - Q(x_t, a_t)];