CN111515932A - Man-machine co-fusion assembly line implementation method based on artificial potential field and reinforcement learning - Google Patents

Man-machine co-fusion assembly line implementation method based on artificial potential field and reinforcement learning Download PDF

Info

Publication number
CN111515932A
CN111515932A (application CN202010328714.0A)
Authority
CN
China
Prior art keywords
mechanical arm
robot
vector
articles
potential field
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010328714.0A
Other languages
Chinese (zh)
Inventor
刘华山
应丰糠
江荣鑫
李祥健
程新
夏玮
蔡明军
陈荣川
梁健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Donghua University
National Dong Hwa University
Original Assignee
Donghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Donghua University
Priority to CN202010328714.0A
Publication of CN111515932A
Legal status: Pending

Links

Images

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/0093Programme-controlled manipulators co-operating with conveyor means
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1602Programme controls characterised by the control system, structure, architecture
    • B25J9/1605Simulation of manipulator lay-out, design, modelling of manipulator
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1628Programme controls characterised by the control loop
    • B25J9/163Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1679Programme controls characterised by the tasks executed

Abstract

The invention discloses a man-machine co-fusion assembly line implementation method based on an artificial potential field and reinforcement learning. A reinforcement-learning A-DPPO algorithm plans the robot's motion trajectory; in actual operation the robot executes actions along the preset optimal trajectory and has a collision-avoidance capability based on the artificial potential field. Meanwhile, a vision sensor acquires the position information of the person; after particle filtering, a metabolic GM(1,1) model infers the person's behavioral intention, and according to the different intentions it is determined whether the repulsive force brought by the person approaching the mechanical arm is taken into account, and hence whether the mechanical arm takes evasive action, thereby realizing human-machine-environment co-fusion.

Description

Man-machine co-fusion assembly line implementation method based on artificial potential field and reinforcement learning
Technical Field
The invention relates to a man-machine co-fusion assembly line implementation method based on an artificial potential field and reinforcement learning, and belongs to the technical field of man-machine co-fusion of industrial assembly lines.
Background
With the continuous development and maturing of robot technology, its applications in the industrial field have become increasingly extensive. Compared with human labor, an industrial robot neither tires nor loses interest during tedious assembly-line work, which greatly improves the accuracy and efficiency of that work and effectively promotes the productivity of enterprises and society. Mechanized, automated plants are therefore developing rapidly around the world and across industrial fields.
However, with the successive arrival of high and new technologies such as artificial intelligence, people are no longer content with robots that perform a single mechanical operation; they want to endow robots with a certain degree of "intelligence", so that a robot can operate without explicit human instructions, perceive its environment, and make decisions through autonomous learning, adapting to complex unstructured environments while continuously improving working efficiency. Reinforcement learning, a branch of machine learning, drives the interaction between robot and environment: through a reward-and-punishment mechanism it pushes the robot toward the decisions that maximize accumulated reward, thereby optimizing the system.
Under these circumstances, people hope to realize cooperation between humans and machines in the same environment, which raises the topic of man-machine co-fusion. Co-fusion further promotes a friendly relationship among humans, machines, and the environment, and pushes manufacturing, services, transportation, and other industries into a brand-new stage of development.
On the premise of guaranteeing the robot's working precision and efficiency, realizing man-machine co-fusion first requires guaranteeing human safety during cooperation with the robot: when a person is near the robot, the robot must be able to distinguish whether the person's intention is "friendly cooperation" or "a collision is about to occur", and take corresponding action. In view of this, an automated assembly-line human-robot cooperation implementation method based on an artificial potential field, intention inference, and reinforcement learning is provided.
Disclosure of Invention
The technical problem to be solved by the invention is how to enable the robot to have the capabilities of recognizing action intentions and avoiding collision.
In order to solve the above technical problem, the technical scheme of the invention provides a man-machine co-fusion assembly line implementation method based on an artificial potential field and reinforcement learning, characterized by comprising the following steps:
step one: to avoid the article loss and equipment damage that improper actions might cause during physical debugging, a 3D model of the equipment is drawn with SolidWorks, and a 1:1 restored digital virtual assembly-line system is built in the V-REP simulation environment;
step two: in a V-REP virtual environment, planning the motion tracks of a first robot and a second robot by adopting an A-DPPO algorithm based on reinforcement learning;
step three: in the V-REP, the optimized motion trajectory found by the A-DPPO algorithm is stored as a path, a dummy is added at the end effector of the mechanical arm and set as the tip, and a tip-target linkage is formed with the path, so that under normal conditions the mechanical arm executes actions along the determined optimal path;
step four: within the time Δt before time t, the vision acquisition system records the person's position-state sequence s_1, s_2, …, s_t, and the sequence is smoothed by a particle-filter algorithm to eliminate noise interference; the smoothed data are processed with a metabolic GM(1,1) model to obtain a predicted value of the position sequence at time t+1, the average rate of change of the position state at time t is obtained by analysis, and after comparison and classification the behavioral intention of the observed object is inferred;
step five: the mechanical arm moves along the preset optimal motion trajectory, and based on the behavioral intention of the observed object it is determined whether the robot avoids collision by the artificial potential field method.
Preferably, step one builds the 1:1 restored digital virtual assembly-line system, which comprises:
two robots:
first robot (1): a KUKA LBR iiwa 7 R800 located at the head end of the conveyor belt, responsible for grabbing articles and placing them on the conveyor belt;
second robot (2): a KUKA KR 6 R900 sixx located at the tail end of the conveyor belt, responsible for grabbing articles off the conveyor belt and placing them at a specified position;
horizontal moving guide rail (3): connected to the base of the second robot, it moves the KR 6 in the horizontal direction, greatly widening the KR 6's workspace; rail and robot can be combined and regarded as a single 7-degree-of-freedom robot;
turntable (4): used for storing parts; 6 articles can be hung on one side;
four-finger cylinder-type clamping jaw (5): a gripper universal enough to grab articles of various shapes, responsible for grabbing the articles;
cylindrical article (6): a cylindrical object is harder to grab than other, regularly shaped objects, and so places stricter demands on the grabbing pose;
conveyor belt (7): 3.5 meters long in total, responsible for conveying articles, and fitted with an infrared sensor that detects the positions of articles on the belt;
the vision acquisition system is located at the center directly above the whole workspace, and an upper computer processes the acquired images; the upper computer is connected to a PLC, which exercises overall control of the system.
Preferably, the assembly line implements the following workflow: the first robot takes an article off the turntable and places it at the head end of the conveyor belt, then returns to its initial pose to prepare to grab the next article; after the article passes the infrared sensor in the middle of the conveyor belt, the second robot, cooperating with the horizontal guide rail, grips the article and hangs it on the other face of the turntable, then returns to its initial position to prepare to grab the next article; this process repeats, and after all articles have been placed on the other face of the turntable, the turntable rotates back to its original position.
Preferably, the second step includes:
step 2-1: determine a position reward function and a direction reward function; the position reward function is composed of an obstacle-avoidance term f_ob and a target-guidance term f_tr and is defined as:
R_pos(D_EO, D_ET) = f_ob(D_EO) + f_tr(D_EO, D_ET)
where D_EO is the relative distance between the mechanical-arm end effector and the obstacle (the smaller its value, the greater the penalty), D_ET is the relative distance between the end effector and the target point, and α is a constant; f_ob and f_tr themselves are given by formulas rendered as images in the original, and the operator [·]+ is defined as:
[x]+ = x for x > 0, and 0 for x ≤ 0;
the directional reward function is defined by a formula rendered as an image in the original; it is a function of φ, the included angle between the current motion vector of the mechanical-arm end effector and the expected relative motion direction, and of τ, a positive compensation parameter;
the azimuth reward function of the A-DPPO is defined as:
R = λ_pos·R_pos + λ_orient·R_orient
where λ_pos and λ_orient are the weights of the position and direction reward functions, respectively;
step 2-2: the relative position and direction information of the mechanical-arm end effector, the obstacle, and the target point is collected and stored as the current environment state S_t, and the torques of the rotary joints of the mechanical arm are denoted a_t; after value evaluation by the evaluation network, the policy network combines S_t with the evaluated value to compute a_t and executes the action, updating the environment state to S_t+1; the azimuth reward function computes the reward value R_t of the current action; this information is stored and recorded as the quadruple (S_t, a_t, R_t, S_t+1); the quadruple information and the KL penalty term train the policy network and the evaluation network respectively, correcting action deviations to seek the optimized motion trajectory. The A-DPPO algorithm flow is shown in FIG. 2.
Preferably, the fourth step includes:
step 4-1: collect images and filter them. Suppose the intention-inference system issues an instruction at time t; within the time Δt before t, the vision acquisition system records the person's position-state sequence s_1, s_2, …, s_t, and the sequence is smoothed by a particle-filter algorithm to eliminate noise interference. The H component is extracted with an HSV model, a kernel-weighted color-distribution histogram is constructed, and each particle's candidate target model is computed. The similarity between the candidate model and the target model can be described by the Bhattacharyya distance:
d(p, q) = √(1 − ρ(p, q)),  ρ(p, q) = Σ_u √(p_u q_u)
where ρ(p, q) represents the similarity between the candidate model and the target model; the update equation of the system is defined by a formula rendered as an image in the original, whose terms are the probability density function of the system measurement model at frame k, the system noise at frame k, and the pixel point at frame k; the target position is then further determined by a further formula (also rendered as an image) that combines the weighted particles into a single position estimate.
step 4-2: processing the smoothed data with the metabolic GM(1,1) model to obtain a predicted value of the position sequence at time t+1; the average rate of change of the position state at time t is obtained by analysis, and after comparison and classification the behavioral intention of the observed object is inferred and classified into the following cases: fast/slow left movement, fast/slow right movement, fast/slow forward movement, stationary.
Preferably, the step five includes:
step 5-1: from the position vector P_end of the end effector and the position vector P_goal of the target point, determine the error vector e between the end effector and the target point; the attraction vector v_att is then defined as:
v_att = −K_p e − K_i ∫ e dt
step 5-2: solving with the damped least squares (DLS) method, which copes well with inverse-kinematics problems having redundant degrees of freedom, the mechanical-arm joint angular velocity q̇_att corresponding to the attraction vector is defined as:
q̇_att = J^T(J J^T + λ²I)^(−1) v_att
step 5-3: from the position vector P_r of the point on the mechanical arm closest to the obstacle and the position vector P_o of the point on the obstacle closest to the arm, define the unit vector s of the repulsion vector; the magnitude f_rep of the repulsion vector is defined by a formula rendered as an image in the original, in which d_0 is a preset safety distance, d_min is the minimum distance between the mechanical arm and the obstacle, and k_rep is the repulsion coefficient; the precise minimum distance between the mechanical arm and the person or obstacle can be obtained with the minimum-distance calculation module of V-REP;
the repulsion vector is defined as:
v_rep = f_rep s
solving again with the damped least squares (DLS) method, the mechanical-arm joint angular velocity q̇_rep determined by the repulsion vector is defined as:
q̇_rep = J_cp^T(J_cp J_cp^T + λ²I)^(−1) v_rep
where J_cp is the Jacobian matrix associated with the point on the mechanical arm closest to the obstacle;
step 5-4: the joint angular velocity of the mechanical arm is defined as:
q̇ = q̇_att + k·q̇_rep
where k is the repulsion-term weight coefficient; the weight coefficient k of the repulsion vector in the artificial potential field is determined by combining the observed object's position and velocity at the next moment with the current position of the mechanical arm.
Preferably, d_min is modified to d_min − d_cr, where d_cr denotes a critical distance, so that the mechanical arm always keeps no less than the critical distance from the obstacle.
Preferably, the value of k is graded by speed: according to the speed grade at which the predicted object approaches the mechanical arm, k is given a reasonable value between 0 and 1, i.e., the proportion of the repulsion vector is reasonably compressed, making collision avoidance more intelligent. When intention reasoning determines that the predicted object is approaching the mechanical arm at high speed, avoidance is adopted and the weight coefficient k = 1; when intention reasoning determines that the predicted object is approaching the mechanical arm slowly or moving away from it, the mechanical arm continues working and the weight coefficient k = 0. After the person's action intention is judged, it is determined whether the mechanical arm considers the repulsive influence brought by the person approaching the arm, and hence whether it takes evasive action.
The invention adopts a MATLAB platform to bear the system's heavy computation, and improves the overall running speed through joint V-REP and MATLAB simulation. The algorithm framework of the whole system is shown in FIG. 4.
Drawings
FIG. 1 is the 1:1 restored digital virtual assembly-line system built with V-REP;
FIG. 2 is a flow chart of a reinforcement learning A-DPPO algorithm;
FIG. 3 is a schematic diagram of an artificial potential field method based on a predetermined optimal trajectory;
FIG. 4 is a schematic diagram of an algorithm composition framework of the overall system of this example.
Detailed Description
In order to make the invention more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.
Examples
In this method, the robot's motion path is planned in advance through reinforcement learning; in actual operation the robot moves along the established path, judges the action intention of humans or other robots through a grey prediction model, and introduces an artificial potential field to avoid possible collisions. The method not only improves the efficiency of the assembly line but also gives the robot the abilities to recognize action intentions and avoid collisions, providing a solution for realizing man-machine cooperation.
The invention combines an intention-inference method based on grey prediction with a collision-avoidance method based on an artificial potential field. As shown in FIG. 1, the system comprises:
two robots:
preferably, the first robot 1 is selected as KUKA LBR iiiwa 7R800, located at the head end of the conveyor belt. The iwa is a 7-degree-of-freedom robot capable of realizing man-machine interaction, is one of representatives of cooperative robots, and is responsible for grabbing and placing articles on a conveying belt;
preferably, the second robot 2 is selected from the group consisting of KUKA KR 6R 900 sixx, which is located at the end of the conveyor belt. The robot has the advantages that the space is saved, the functions of efficient assembly, carrying, packaging, detection and the like can be realized, and the robot is responsible for grabbing articles on a conveyor belt and placing the articles to a specified position;
Horizontal movement guide rail 3: connected to the base of the second robot, it moves the KR 6 in the horizontal direction, greatly widening the KR 6's workspace; rail and robot can be combined and regarded as a single 7-degree-of-freedom robot.
Rotary table 4: used for storing parts; 6 articles can be hung on one side.
Four-finger cylinder-type clamping jaw 5: a gripper universal enough to grab articles of various shapes, responsible for grabbing the articles.
Cylindrical article 6: a cylindrical object is harder to grab than other, regularly shaped objects, and so places stricter demands on the grabbing pose.
Conveyor belt 7: 3.5 meters long in total, responsible for conveying articles, and fitted with an infrared sensor that detects the positions of articles on the belt.
The vision acquisition system is located at the center directly above the whole workspace, and an upper computer processes the acquired images.
Preferably, the upper computer is connected to a PLC, which exercises overall control of the system.
The technical scheme of the invention is as follows:
the method comprises the following steps: in order to avoid the possible article loss and equipment damage caused by improper actions in the real object debugging, a digital virtual system of a production line is built.
Preferably, a 3D model of the equipment is drawn with SolidWorks, and a 1:1 restored digital virtual assembly-line system is built in the V-REP simulation environment.
The following workflow is realized in this assembly line: the first robot takes an article off the turntable and places it at the head end of the conveyor belt, then returns to its initial pose to prepare to grab the next article; after the article passes the infrared sensor in the middle of the conveyor belt, the second robot, cooperating with the horizontal guide rail, grips the article and hangs it on the other face of the turntable, then returns to its initial position to prepare to grab the next article; this process repeats, and after all articles have been placed on the other face of the turntable, the turntable rotates back to its original position.
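To make this cycle concrete, the following is a minimal control-loop sketch in Python. It is illustrative only: every method on the robot1, robot2, conveyor, and turntable objects is a hypothetical placeholder standing in for the PLC-coordinated actions described above, not an interface defined by the invention.

def run_line(n_articles, robot1, robot2, conveyor, turntable):
    """One full pass of the assembly line as described above (hypothetical API)."""
    for _ in range(n_articles):
        robot1.pick_from_turntable()      # robot 1 takes an article off the turntable
        robot1.place_at_conveyor_head()   # ...and places it at the belt's head end
        robot1.go_home()                  # back to the initial pose for the next article
        conveyor.wait_for_ir_sensor()     # block until the mid-belt IR sensor fires
        robot2.pick_from_conveyor()       # robot 2 + horizontal rail grip the article
        robot2.hang_on_turntable_back()   # hang it on the other face of the turntable
        robot2.go_home()                  # back to the initial position
    turntable.rotate_home()               # all articles transferred; rotate back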
Step two: in the V-REP virtual environment, the motion trajectories of the first robot and the second robot are planned with the reinforcement-learning-based A-DPPO algorithm.
First, a position reward function and a direction reward function are determined.
The position reward function consists of an obstacle-avoidance term and a target-guidance term. The obstacle-avoidance term f_ob(D_EO) is defined by equation (1) (rendered as an image in the original), in which D_EO is the relative distance between the mechanical-arm end effector and the obstacle; the smaller its value, the greater the penalty.
The target-guidance term f_tr(D_EO, D_ET) is defined by equation (2) (also rendered as an image), in which D_ET is the relative distance between the end effector and the target point and α is a constant; the operator [·]+ is defined as:
[x]+ = x for x > 0, and 0 for x ≤ 0    (3)
Combining equations (1) and (2), the position reward function is defined as:
R_pos(D_EO, D_ET) = f_ob(D_EO) + f_tr(D_EO, D_ET)    (4)
The directional reward function is defined by equation (5) (rendered as an image in the original); it is a function of φ, the included angle between the current motion vector of the mechanical-arm end effector and the expected relative motion direction, and of τ, a positive compensation parameter.
Let λ_pos and λ_orient be the weights of the position and direction reward functions, respectively; the azimuth reward function of the A-DPPO is then defined as:
R = λ_pos·R_pos + λ_orient·R_orient    (6)
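A small sketch of how the combined reward could be computed is given below. The exact forms of equations (1), (2), and (5) appear only as images in the original, so the expressions used here are plausible stand-ins consistent with the text: f_ob penalizes a small end-effector/obstacle distance D_EO, f_tr guides the end effector toward the target via D_ET and the clipped operator [·]+, and the directional term rewards alignment between the current motion vector and the expected direction, offset by the positive compensation parameter τ.

import numpy as np

def plus(x):
    """Clipped operator of equation (3): [x]+ = x if x > 0, else 0."""
    return x if x > 0.0 else 0.0

def position_reward(d_eo, d_et, alpha=1.0):
    # Assumed stand-ins for equations (1) and (2):
    f_ob = -1.0 / max(d_eo, 1e-6)   # smaller D_EO -> larger penalty
    f_tr = -alpha * plus(d_et)      # guide the end effector toward the target
    return f_ob + f_tr              # equation (4)

def direction_reward(motion_vec, desired_vec, tau=0.1):
    # Assumed stand-in for equation (5): reward alignment plus compensation tau.
    cos_phi = np.dot(motion_vec, desired_vec) / (
        np.linalg.norm(motion_vec) * np.linalg.norm(desired_vec) + 1e-9)
    return cos_phi + tau

def azimuth_reward(d_eo, d_et, motion_vec, desired_vec,
                   lam_pos=1.0, lam_orient=0.5):
    # Equation (6): R = lambda_pos * R_pos + lambda_orient * R_orient.
    return (lam_pos * position_reward(d_eo, d_et)
            + lam_orient * direction_reward(motion_vec, desired_vec))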
The relative position and direction information of the mechanical-arm end effector, the obstacle, and the target point is collected and stored as the current environment state S_t, and the torques of the rotary joints of the mechanical arm are denoted a_t. After value evaluation by the evaluation network, the policy network combines S_t with the evaluated value to compute a_t and executes the action, updating the environment state to S_t+1; the azimuth reward function computes the reward value R_t of the current action. This information is stored and recorded as the quadruple (S_t, a_t, R_t, S_t+1). The quadruple information and the KL penalty term train the policy network and the evaluation network respectively, correcting action deviations to seek the optimized motion trajectory. The A-DPPO algorithm flow is shown in FIG. 2.
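The bookkeeping of this step can be sketched as follows. This is a generic illustration of storing quadruples and evaluating a KL-penalized policy-gradient surrogate (the family to which distributed PPO variants such as A-DPPO belong); it is not the patent's actual A-DPPO implementation, and the buffer size and penalty coefficient β are arbitrary.

import numpy as np
from collections import deque

buffer = deque(maxlen=10_000)   # stores quadruples (S_t, a_t, R_t, S_t+1)

def store(s, a, r, s_next):
    buffer.append((s, a, r, s_next))

def kl_penalized_surrogate(logp_new, logp_old, advantages, beta=1.0):
    """Surrogate L = E[ratio * A] - beta * KL_est, with the KL divergence
    between old and new policies estimated from sampled log-probabilities."""
    logp_new, logp_old = np.asarray(logp_new), np.asarray(logp_old)
    ratio = np.exp(logp_new - logp_old)            # pi_new(a|s) / pi_old(a|s)
    kl_est = float(np.mean(logp_old - logp_new))   # sample estimate of KL(old||new)
    return float(np.mean(ratio * np.asarray(advantages))) - beta * kl_est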
Step three: in the V-REP, the optimized motion trajectory found by the A-DPPO algorithm is stored as a path; a virtual object (dummy) is added at the end effector of the mechanical arm and set as the tip, and a tip-target linkage is formed with a target dummy on the path, creating a kinematic chain so that under normal conditions the mechanical arm executes actions along the determined optimal path.
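A hedged sketch of driving such a tip-target pair from outside the simulator, using V-REP's legacy remote-API Python bindings (the vrep.py module shipped with V-REP), might look as follows. The scene object name 'target' and the waypoint list are assumptions of this sketch; the tip-target IK link itself is configured inside the V-REP scene as described above.

import vrep  # legacy remote-API bindings copied from the V-REP installation

client = vrep.simxStart('127.0.0.1', 19997, True, True, 5000, 5)
assert client != -1, 'could not connect to a running V-REP instance'

# Handle of the target dummy linked to the arm's tip in the scene.
_, target = vrep.simxGetObjectHandle(client, 'target', vrep.simx_opmode_blocking)

# Hypothetical stored waypoints of the optimized A-DPPO trajectory (x, y, z in m).
optimal_path = [(0.40, 0.10, 0.30), (0.41, 0.07, 0.29), (0.42, 0.05, 0.28)]
for waypoint in optimal_path:
    vrep.simxSetObjectPosition(client, target, -1, waypoint,
                               vrep.simx_opmode_oneshot)  # -1: absolute frame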
Step four: intention-inference capability is incorporated into the assembly-line system. Suppose the intention-inference system issues an instruction at time t; within the time Δt before t, the vision acquisition system records the person's position-state sequence s_1, s_2, …, s_t and smooths it with a particle-filter algorithm to eliminate noise interference. Preferably an HSV model is adopted, which is computationally light and reflects the color characteristics of the image well; the H component is extracted, a kernel-weighted color-distribution histogram is constructed, and each particle's candidate target model is computed. The similarity between the candidate model and the target model can be described by the Bhattacharyya distance:
d(p, q) = √(1 − ρ(p, q)),  ρ(p, q) = Σ_u √(p_u q_u)    (7)
where ρ(p, q) represents the similarity between the candidate model and the target model. The update equation of the system is defined by equation (8) (rendered as an image in the original), whose terms are the probability density function of the system measurement model at frame k, the system noise at frame k, and the pixel point at frame k. The target position is then further determined by equation (9) (also rendered as an image), which combines the weighted particles into a single position estimate.
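A compact particle-filter sketch for this tracking step is shown below. Each particle carries a candidate position; its candidate histogram (supplied here by a user-provided hist_at callback, an assumption of this sketch) is compared with the target model through the Bhattacharyya coefficient, and the position estimate is the weighted mean of the particles. The random-walk prediction stands in for the system update equation (8), whose exact form is an image in the original.

import numpy as np

def bhattacharyya_coeff(p, q):
    """rho(p, q) = sum_u sqrt(p_u * q_u); 1 means identical histograms."""
    return float(np.sum(np.sqrt(p * q)))

def pf_step(particles, weights, target_hist, hist_at, sigma=0.2, motion_noise=2.0):
    # Predict: random-walk motion model standing in for the system update equation.
    particles = particles + np.random.normal(0.0, motion_noise, particles.shape)
    # Update: weight each particle by the Bhattacharyya distance between its
    # candidate histogram and the target model (equation (7)).
    d = np.array([np.sqrt(max(1.0 - bhattacharyya_coeff(hist_at(p), target_hist), 0.0))
                  for p in particles])
    weights = weights * np.exp(-d ** 2 / (2.0 * sigma ** 2))
    weights = weights / np.sum(weights)
    estimate = np.sum(particles * weights[:, None], axis=0)  # weighted-mean position
    return particles, weights, estimate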
Preferably, the smoothed data are processed with the metabolic GM(1,1) model to obtain a predicted value of the position sequence at time t+1; the average rate of change of the position state at time t is obtained by analysis, and after comparison and classification the behavioral intention of the observed object is inferred and classified into the following cases: fast/slow left movement, fast/slow right movement, fast/slow forward movement, stationary.
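The metabolic GM(1,1) predictor can be sketched as follows: fit the grey model on the current window of smoothed positions, predict one step ahead, then "metabolize" the window by dropping the oldest sample and appending the newest observation. The window length is an assumption of this sketch.

import numpy as np

def gm11_predict(x0):
    """One-step-ahead GM(1,1) prediction from a 1-D sequence x0 (length >= 4)."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)                            # accumulated (AGO) series
    z1 = 0.5 * (x1[1:] + x1[:-1])                 # background values
    B = np.column_stack((-z1, np.ones(len(z1))))
    Y = x0[1:]
    a, b = np.linalg.lstsq(B, Y, rcond=None)[0]   # development / grey-input coefficients
    n = len(x0)
    x1_next = (x0[0] - b / a) * np.exp(-a * n) + b / a
    x1_curr = (x0[0] - b / a) * np.exp(-a * (n - 1)) + b / a
    return x1_next - x1_curr                      # restored (inverse-AGO) prediction

def metabolic_update(window, new_obs):
    """Slide the window: drop the oldest sample, append the newest observation."""
    return np.append(window[1:], new_obs)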
Step five: deciding whether the robot takes avoidance based on the behavioral intention of the observed object. Preferably, the obstacle is avoided by means of an artificial potential field. As shown in FIG. 3, the mechanical arm follows the predetermined optimal motion trajectory while an attraction vector acting on the end effector directs it toward the target; this vector is defined through an error function:
e = P_end − P_goal    (10)
where e is the error vector between the end effector and the target point, P_end is the position vector of the end effector, and P_goal is the position vector of the target point. The attraction vector v_att is then defined as:
v_att = −K_p e − K_i ∫ e dt    (11)
where K_p and K_i denote the proportional-coefficient matrix and the integral-coefficient matrix, respectively.
The inverse kinematics of the mechanical arm is solved, preferably with the damped least squares (DLS) method, which copes well with inverse-kinematics problems having redundant degrees of freedom; the mechanical-arm joint angular velocity q̇_att defined by the attraction vector is:
q̇_att = J^T(J J^T + λ²I)^(−1) v_att    (12)
where J is the Jacobian matrix associated with the tool center point at the end of the mechanical arm, λ is the damping coefficient, and I is the identity matrix.
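Equations (10)-(12) can be sketched directly in code: a PI term on the Cartesian error produces v_att, and the damped-least-squares inverse maps it to joint velocities. The gains, damping coefficient, and time step below are illustrative values, not the patent's.

import numpy as np

def attraction_velocity(p_end, p_goal, integral_e, kp=1.0, ki=0.1, dt=0.01):
    """Equations (10)-(11): e = P_end - P_goal, v_att = -Kp*e - Ki*integral(e)dt."""
    e = p_end - p_goal
    integral_e = integral_e + e * dt
    return -kp * e - ki * integral_e, integral_e

def dls_joint_velocity(J, v, lam=0.1):
    """Equation (12): q_dot = J^T (J J^T + lam^2 I)^-1 v."""
    JJt = J @ J.T
    return J.T @ np.linalg.solve(JJt + lam ** 2 * np.eye(JJt.shape[0]), v)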
The unit vector s of the repulsion vector is defined as:
s = (P_r − P_o)/‖P_r − P_o‖    (13)
where P_r denotes the position vector of the point on the mechanical arm closest to the obstacle, and P_o denotes the position vector of the point on the obstacle closest to the mechanical arm.
The magnitude f_rep of the repulsion vector is defined by equation (14) (rendered as an image in the original), in which d_0 is a preset safety distance, d_min is the minimum distance between the mechanical arm and the obstacle, and k_rep is the repulsion coefficient. Preferably, the precise minimum distance between the mechanical arm and the person or obstacle is obtained with the minimum-distance calculation module of V-REP. More preferably, d_min can be modified to d_min − d_cr, where d_cr denotes a critical distance, so that the mechanical arm always keeps no less than the critical distance from the obstacle.
The repulsion vector is defined as:
v_rep = f_rep s    (15)
the angular velocity of the mechanical arm joint determined by the repulsive vector
Figure BDA0002464180780000103
Is defined as:
Figure BDA0002464180780000104
where J_cp is the Jacobian matrix associated with the point on the mechanical arm closest to the obstacle.
Finally, the joint angular velocity of the mechanical arm is defined as:
q̇ = q̇_att + k·q̇_rep    (17)
where k is the repulsion-term weight coefficient, determined by combining the observed object's predicted position and velocity at the next moment with the current position of the mechanical arm. When intention reasoning determines that the predicted object is approaching the mechanical arm at high speed, avoidance is adopted and the weight coefficient k = 1; when intention reasoning determines that the predicted object is approaching slowly or moving away, the mechanical arm continues working and the weight coefficient k = 0. Preferably, k can be graded by speed: according to the speed grade at which the predicted object approaches the arm, k is given a reasonable value between 0 and 1, i.e., the proportion of the repulsion vector is reasonably compressed, making collision avoidance more intelligent.
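The repulsion branch and the intention-weighted blend of equations (13)-(17) can be sketched as below. Because the magnitude law of equation (14) appears only as an image in the original, a common inverse-distance form that vanishes beyond d_0 is assumed; the speed thresholds used to grade k are likewise illustrative.

import numpy as np

def repulsion_vector(p_r, p_o, d_min, d0=0.3, k_rep=1.0):
    """Equations (13)-(15), with an assumed magnitude law standing in for (14)."""
    s = (p_r - p_o) / (np.linalg.norm(p_r - p_o) + 1e-9)  # unit vector, obstacle -> arm
    d_min = max(d_min, 1e-6)
    f_rep = k_rep * (1.0 / d_min - 1.0 / d0) if d_min < d0 else 0.0
    return f_rep * s                                       # v_rep = f_rep * s

def blended_joint_velocity(qd_att, qd_rep, approach_speed, fast=0.5, slow=0.1):
    """Equation (17) with a speed-graded repulsion weight k in [0, 1]."""
    if approach_speed >= fast:
        k = 1.0                                   # evade: full repulsion weight
    elif approach_speed <= slow:
        k = 0.0                                   # keep working: ignore repulsion
    else:
        k = (approach_speed - slow) / (fast - slow)   # compressed proportion
    return qd_att + k * qd_rep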
After the person's action intention is judged, it is determined whether the mechanical arm considers the repulsive influence brought by the person approaching the arm, and hence whether it takes evasive action. Preferably, a MATLAB platform bears the system's heavy computation, and the overall running speed is improved through joint V-REP and MATLAB simulation. The algorithm framework of the whole system is shown in FIG. 4.
In conclusion, the invention integrates reinforcement learning, intention inference, and the artificial potential field into assembly-line operation. On the premise of using reinforcement learning to find the optimal motion trajectory and guarantee the line's working efficiency, it judges whether a person's action intention amounts to friendly cooperation or an imminent collision, and thereby decides whether to use the potential-field method to avoid the collision. Combining these methods, the invention realizes human-machine-environment co-fusion for the automated assembly line. Meanwhile, a 1:1 restored digital virtual system is built with V-REP; the A-DPPO and artificial-potential-field components are trained in this virtual system, and the optimal path found is then applied to the actual production line for testing, avoiding the equipment damage and article loss that debugging on a physical platform might cause.

Claims (8)

1. A man-machine co-fusion assembly line implementation method based on artificial potential field and reinforcement learning is characterized by comprising the following steps:
step one: to avoid the article loss and equipment damage that improper actions might cause during physical debugging, a 3D model of the equipment is drawn with SolidWorks, and a 1:1 restored digital virtual assembly-line system is built in the V-REP simulation environment;
step two: in a V-REP virtual environment, planning the motion tracks of a first robot and a second robot by adopting an A-DPPO algorithm based on reinforcement learning;
step three: in the V-REP, the optimized motion trajectory found by the A-DPPO algorithm is stored as a path, a dummy is added at the end effector of the mechanical arm and set as the tip, and a tip-target linkage is formed with the path, so that under normal conditions the mechanical arm executes actions along the determined optimal path;
step four: within the time Δt before time t, the vision acquisition system records the person's position-state sequence s_1, s_2, …, s_t, and the sequence is smoothed by a particle-filter algorithm to eliminate noise interference; the smoothed data are processed with a metabolic GM(1,1) model to obtain a predicted value of the position sequence at time t+1, the average rate of change of the position state at time t is obtained by analysis, and after comparison and classification the behavioral intention of the observed object is inferred;
step five: the mechanical arm moves along the preset optimal motion trajectory, and based on the behavioral intention of the observed object it is determined whether the robot avoids collision by the artificial potential field method.
2. The man-machine co-fusion assembly line implementation method based on artificial potential field and reinforcement learning according to claim 1, characterized in that:
in step one, the 1:1 restored digital virtual assembly-line system that is built comprises:
two robots:
first robot (1): a KUKA LBR iiwa 7 R800 located at the head end of the conveyor belt, responsible for grabbing articles and placing them on the conveyor belt;
second robot (2): a KUKA KR 6 R900 sixx located at the tail end of the conveyor belt, responsible for grabbing articles off the conveyor belt and placing them at a specified position;
horizontal moving guide rail (3): connected to the base of the second robot, it moves the KR 6 in the horizontal direction, greatly widening the KR 6's workspace; rail and robot can be combined and regarded as a single 7-degree-of-freedom robot;
turntable (4): used for storing parts; 6 articles can be hung on one side;
four-finger cylinder-type clamping jaw (5): a gripper universal enough to grab articles of various shapes, responsible for grabbing the articles;
cylindrical article (6): a cylindrical object is harder to grab than other, regularly shaped objects, and so places stricter demands on the grabbing pose;
conveyor belt (7): 3.5 meters long in total, responsible for conveying articles, and fitted with an infrared sensor that detects the positions of articles on the belt;
the vision acquisition system is located at the center directly above the whole workspace, and an upper computer processes the acquired images; the upper computer is connected to a PLC, which exercises overall control of the system.
3. The man-machine co-fusion assembly line implementation method based on artificial potential field and reinforcement learning according to claim 2, characterized in that the assembly line implements the following workflow: the first robot takes an article off the turntable and places it at the head end of the conveyor belt, then returns to its initial pose to prepare to grab the next article; after the article passes the infrared sensor in the middle of the conveyor belt, the second robot, cooperating with the horizontal guide rail, grips the article and hangs it on the other face of the turntable, then returns to its initial position to prepare to grab the next article; this process repeats, and after all articles have been placed on the other face of the turntable, the turntable rotates back to its original position.
4. The man-machine co-fusion assembly line implementation method based on artificial potential field and reinforcement learning according to claim 1, characterized in that the second step comprises:
step 2-1: determine a position reward function and a direction reward function; the position reward function is composed of an obstacle-avoidance term f_ob and a target-guidance term f_tr and is defined as:
R_pos(D_EO, D_ET) = f_ob(D_EO) + f_tr(D_EO, D_ET)
where D_EO is the relative distance between the mechanical-arm end effector and the obstacle (the smaller its value, the greater the penalty), D_ET is the relative distance between the end effector and the target point, and α is a constant; f_ob and f_tr themselves are given by formulas rendered as images in the original, and the operator [·]+ is defined as:
[x]+ = x for x > 0, and 0 for x ≤ 0;
the directional reward function is defined by a formula rendered as an image in the original; it is a function of φ, the included angle between the current motion vector of the mechanical-arm end effector and the expected relative motion direction, and of τ, a positive compensation parameter;
the azimuth reward function of the A-DPPO is defined as:
R = λ_pos·R_pos + λ_orient·R_orient
where λ_pos and λ_orient are the weights of the position and direction reward functions, respectively;
step 2-2: the relative position and direction information of the mechanical-arm end effector, the obstacle, and the target point is collected and stored as the current environment state S_t, and the torques of the rotary joints of the mechanical arm are denoted a_t; after value evaluation by the evaluation network, the policy network combines S_t with the evaluated value to compute a_t and executes the action, updating the environment state to S_t+1; the azimuth reward function computes the reward value R_t of the current action; this information is stored and recorded as the quadruple (S_t, a_t, R_t, S_t+1); the quadruple information and the KL penalty term train the policy network and the evaluation network respectively, correcting action deviations to seek the optimized motion trajectory. The A-DPPO algorithm flow is shown in FIG. 2.
5. The man-machine co-fusion assembly line implementation method based on artificial potential field and reinforcement learning according to claim 1, characterized in that the fourth step comprises:
step 4-1: collect images and filter them. Suppose the intention-inference system issues an instruction at time t; within the time Δt before t, the vision acquisition system records the person's position-state sequence s_1, s_2, …, s_t, and the sequence is smoothed by a particle-filter algorithm to eliminate noise interference. The H component is extracted with an HSV model, a kernel-weighted color-distribution histogram is constructed, and each particle's candidate target model is computed. The similarity between the candidate model and the target model can be described by the Bhattacharyya distance:
d(p, q) = √(1 − ρ(p, q)),  ρ(p, q) = Σ_u √(p_u q_u)
where ρ(p, q) represents the similarity between the candidate model and the target model; the update equation of the system is defined by a formula rendered as an image in the original, whose terms are the probability density function of the system measurement model at frame k, the system noise at frame k, and the pixel point at frame k; the target position is then further determined by a further formula (also rendered as an image) that combines the weighted particles into a single position estimate.
step 4-2: processing the smoothed data with the metabolic GM(1,1) model to obtain a predicted value of the position sequence at time t+1; the average rate of change of the position state at time t is obtained by analysis, and after comparison and classification the behavioral intention of the observed object is inferred and classified into the following cases: fast/slow left movement, fast/slow right movement, fast/slow forward movement, stationary.
6. The man-machine co-fusion assembly line implementation method based on artificial potential field and reinforcement learning according to claim 1, characterized in that the fifth step comprises:
step 5-1: from the position vector P_end of the end effector and the position vector P_goal of the target point, determine the error vector e between the end effector and the target point; the attraction vector v_att is then defined as:
v_att = −K_p e − K_i ∫ e dt
step 5-2: solving with the damped least squares (DLS) method, which copes well with inverse-kinematics problems having redundant degrees of freedom, the mechanical-arm joint angular velocity q̇_att corresponding to the attraction vector is defined as:
q̇_att = J^T(J J^T + λ²I)^(−1) v_att
step 5-3: from the position vector P_r of the point on the mechanical arm closest to the obstacle and the position vector P_o of the point on the obstacle closest to the arm, define the unit vector s of the repulsion vector; the magnitude f_rep of the repulsion vector is defined by a formula rendered as an image in the original, in which d_0 is a preset safety distance, d_min is the minimum distance between the mechanical arm and the obstacle, and k_rep is the repulsion coefficient; the precise minimum distance between the mechanical arm and the person or obstacle can be obtained with the minimum-distance calculation module of V-REP;
the repulsion vector is defined as:
v_rep = f_rep s
solving again with the damped least squares (DLS) method, the mechanical-arm joint angular velocity q̇_rep determined by the repulsion vector is defined as:
q̇_rep = J_cp^T(J_cp J_cp^T + λ²I)^(−1) v_rep
where J_cp is the Jacobian matrix associated with the point on the mechanical arm closest to the obstacle;
step 5-4: the joint angular velocity of the mechanical arm is defined as:
q̇ = q̇_att + k·q̇_rep
where k is the repulsion-term weight coefficient; the weight coefficient k of the repulsion vector in the artificial potential field is determined by combining the observed object's position and velocity at the next moment with the current position of the mechanical arm.
7. The man-machine co-fusion assembly line implementation method based on artificial potential field and reinforcement learning according to claim 6, characterized in that: d_min is modified to d_min − d_cr, where d_cr denotes a critical distance, so that the mechanical arm always keeps no less than the critical distance from the obstacle.
8. The man-machine co-fusion assembly line implementation method based on artificial potential field and reinforcement learning according to claim 6, characterized in that: the value of k is graded by speed; according to the speed grade at which the predicted object approaches the mechanical arm, k is given a reasonable value between 0 and 1, i.e., the proportion of the repulsion vector is reasonably compressed, making collision avoidance more intelligent; when intention reasoning determines that the predicted object is approaching the mechanical arm at high speed, avoidance is adopted and the weight coefficient k = 1; when intention reasoning determines that the predicted object is approaching the mechanical arm slowly or moving away from it, the mechanical arm continues working and the weight coefficient k = 0; after the person's action intention is judged, it is determined whether the mechanical arm considers the repulsive influence brought by the person approaching the arm, and hence whether it takes evasive action.
CN202010328714.0A 2020-04-23 2020-04-23 Man-machine co-fusion assembly line implementation method based on artificial potential field and reinforcement learning Pending CN111515932A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010328714.0A CN111515932A (en) 2020-04-23 2020-04-23 Man-machine co-fusion assembly line implementation method based on artificial potential field and reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010328714.0A CN111515932A (en) 2020-04-23 2020-04-23 Man-machine co-fusion assembly line implementation method based on artificial potential field and reinforcement learning

Publications (1)

Publication Number Publication Date
CN111515932A true CN111515932A (en) 2020-08-11

Family

ID=71904534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010328714.0A Pending CN111515932A (en) 2020-04-23 2020-04-23 Man-machine co-fusion assembly line implementation method based on artificial potential field and reinforcement learning

Country Status (1)

Country Link
CN (1) CN111515932A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112192614A (en) * 2020-10-09 2021-01-08 西南科技大学 Man-machine cooperation based shaft hole assembling method for nuclear operation and maintenance robot
CN112904848A (en) * 2021-01-18 2021-06-04 长沙理工大学 Mobile robot path planning method based on deep reinforcement learning
CN113043284A (en) * 2021-04-23 2021-06-29 江苏理工学院 Multi-constraint inverse solution method for redundant robot
CN113156940A (en) * 2021-03-03 2021-07-23 河北工业职业技术学院 Robot path planning method based on curiosity-greedy reward function
CN113341706A (en) * 2021-05-06 2021-09-03 东华大学 Man-machine cooperation assembly line system based on deep reinforcement learning
CN113848974A (en) * 2021-09-28 2021-12-28 西北工业大学 Aircraft trajectory planning method and system based on deep reinforcement learning
CN113858196A (en) * 2021-09-26 2021-12-31 中国舰船研究设计中心 Robot disassembly sequence planning method considering robot collision avoidance track
CN113934219A (en) * 2021-12-16 2022-01-14 宏景科技股份有限公司 Robot automatic obstacle avoidance method, system, equipment and medium
CN113970321A (en) * 2021-10-21 2022-01-25 北京房江湖科技有限公司 Method and device for calculating house type dynamic line
CN114589701A (en) * 2022-04-20 2022-06-07 浙江大学 Multi-joint mechanical arm obstacle avoidance inverse kinematics method based on damping least squares
CN114851184A (en) * 2021-01-20 2022-08-05 广东技术师范大学 Industrial robot-oriented reinforcement learning reward value calculation method
WO2023065494A1 (en) * 2021-10-18 2023-04-27 东南大学 Intent-driven reinforcement learning path planning method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103823466A (en) * 2013-05-23 2014-05-28 电子科技大学 Path planning method for mobile robot in dynamic environment
US20160236349A1 (en) * 2015-02-18 2016-08-18 Disney Enterprises, Inc. Control method for floating-base robots including generating feasible motions using time warping
CN107891425A (en) * 2017-11-21 2018-04-10 北方民族大学 The control method of the intelligent man-machine co-melting humanoid robot system of both arms security cooperation
CN110262478A (en) * 2019-05-27 2019-09-20 浙江工业大学 Man-machine safety obstacle-avoiding route planning method based on modified embedded-atom method
CN110253570A (en) * 2019-05-27 2019-09-20 浙江工业大学 The industrial machinery arm man-machine safety system of view-based access control model
CN110900601A (en) * 2019-11-15 2020-03-24 武汉理工大学 Robot operation autonomous control method for human-robot cooperation safety guarantee

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103823466A (en) * 2013-05-23 2014-05-28 电子科技大学 Path planning method for mobile robot in dynamic environment
US20160236349A1 (en) * 2015-02-18 2016-08-18 Disney Enterprises, Inc. Control method for floating-base robots including generating feasible motions using time warping
CN107891425A (en) * 2017-11-21 2018-04-10 北方民族大学 The control method of the intelligent man-machine co-melting humanoid robot system of both arms security cooperation
CN110262478A (en) * 2019-05-27 2019-09-20 浙江工业大学 Man-machine safety obstacle-avoiding route planning method based on modified embedded-atom method
CN110253570A (en) * 2019-05-27 2019-09-20 浙江工业大学 The industrial machinery arm man-machine safety system of view-based access control model
CN110900601A (en) * 2019-11-15 2020-03-24 武汉理工大学 Robot operation autonomous control method for human-robot cooperation safety guarantee

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
LI Jie; SU Jianbo: "Target tracking based on reinforcement learning and intention inference", Proceedings of the 33rd Chinese Control Conference *
LI Yue; SHAO Zhenzhou; ZHAO Zhendong; SHI Zhiping; GUAN Yong: "Reward function design of deep reinforcement learning oriented to trajectory planning", Computer Engineering and Applications *
WANG Benliang et al.: "An end-effector planning algorithm for manipulators based on geometric mechanics", Journal of Dynamics and Control *
HUANG Junjie: "Fundamentals of Robot Technology", 31 August 2018, Huazhong University of Science and Technology Press *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112192614A (en) * 2020-10-09 2021-01-08 西南科技大学 Man-machine cooperation based shaft hole assembling method for nuclear operation and maintenance robot
CN112904848A (en) * 2021-01-18 2021-06-04 长沙理工大学 Mobile robot path planning method based on deep reinforcement learning
CN112904848B (en) * 2021-01-18 2022-08-12 长沙理工大学 Mobile robot path planning method based on deep reinforcement learning
CN114851184A (en) * 2021-01-20 2022-08-05 广东技术师范大学 Industrial robot-oriented reinforcement learning reward value calculation method
CN114851184B (en) * 2021-01-20 2023-05-09 广东技术师范大学 Reinforced learning rewarding value calculating method for industrial robot
CN113156940A (en) * 2021-03-03 2021-07-23 河北工业职业技术学院 Robot path planning method based on curiosity-greedy reward function
CN113043284A (en) * 2021-04-23 2021-06-29 江苏理工学院 Multi-constraint inverse solution method for redundant robot
CN113341706B (en) * 2021-05-06 2022-12-06 东华大学 Man-machine cooperation assembly line system based on deep reinforcement learning
CN113341706A (en) * 2021-05-06 2021-09-03 东华大学 Man-machine cooperation assembly line system based on deep reinforcement learning
CN113858196A (en) * 2021-09-26 2021-12-31 中国舰船研究设计中心 Robot disassembly sequence planning method considering robot collision avoidance track
CN113848974A (en) * 2021-09-28 2021-12-28 西北工业大学 Aircraft trajectory planning method and system based on deep reinforcement learning
CN113848974B (en) * 2021-09-28 2023-08-15 西安因诺航空科技有限公司 Aircraft trajectory planning method and system based on deep reinforcement learning
WO2023065494A1 (en) * 2021-10-18 2023-04-27 东南大学 Intent-driven reinforcement learning path planning method
CN113970321A (en) * 2021-10-21 2022-01-25 北京房江湖科技有限公司 Method and device for calculating house type dynamic line
CN113934219A (en) * 2021-12-16 2022-01-14 宏景科技股份有限公司 Robot automatic obstacle avoidance method, system, equipment and medium
CN114589701A (en) * 2022-04-20 2022-06-07 浙江大学 Multi-joint mechanical arm obstacle avoidance inverse kinematics method based on damping least squares
CN114589701B (en) * 2022-04-20 2024-04-09 浙江大学 Damping least square-based multi-joint mechanical arm obstacle avoidance inverse kinematics method

Similar Documents

Publication Publication Date Title
CN111515932A (en) Man-machine co-fusion assembly line implementation method based on artificial potential field and reinforcement learning
CN110202583B (en) Humanoid manipulator control system based on deep learning and control method thereof
US20230321837A1 (en) Machine learning device, robot system, and machine learning method for learning object picking operation
DE102016015873B3 (en) Machine learning apparatus, robot system, and machine learning system for learning a workpiece pick-up operation
CN112297013B (en) Robot intelligent grabbing method based on digital twin and deep neural network
Huang et al. Design of automatic strawberry harvest robot suitable in complex environments
CN111223141B (en) Automatic pipeline work efficiency optimization system and method based on reinforcement learning
CN113341706A (en) Man-machine cooperation assembly line system based on deep reinforcement learning
Xie et al. Visual tracking control of SCARA robot system based on deep learning and Kalman prediction method
Breyer et al. Closed-loop next-best-view planning for target-driven grasping
CN113664828A (en) Robot grabbing-throwing method based on deep reinforcement learning
Chen et al. Combined task and motion planning for a dual-arm robot to use a suction cup tool
Gonçalves et al. Grasp planning with incomplete knowledge about the object to be grasped
Xie Industrial Robot Assembly Line Design Using Machine Vision
Uçar et al. Determination of Angular Status and Dimensional Properties of Objects for Grasping with Robot Arm
Sebbata et al. An adaptive robotic grasping with a 2-finger gripper based on deep learning network
Domínguez-Vidal et al. Improving Human-Robot Interaction Effectiveness in Human-Robot Collaborative Object Transportation using Force Prediction
Han Trajectory tracking control method for flexible robot of construction machinery based on computer vision
Tian et al. Optimal Path Planning for a Robot Shelf Picking System
Yu et al. A cascaded deep learning framework for real-time and robust grasp planning
Wen et al. Research status and tendency of intelligent industrial robot
Zhang et al. Control method of shaft and hole mating based on convolution neural network in assembly building prefabricated components
Anglani et al. Learning to grasp by using visual information
Poss Applications of Object Detection in Industrial Contexts Based on Logistics Robots
Terasaki et al. Intelligent manipulation of sliding operations with parallel two-fingered grippers

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200811