CN111515932A - Man-machine co-fusion assembly line implementation method based on artificial potential field and reinforcement learning - Google Patents

Man-machine co-fusion assembly line implementation method based on artificial potential field and reinforcement learning Download PDF

Info

Publication number
CN111515932A
CN111515932A (application CN202010328714.0A)
Authority
CN
China
Prior art keywords
mechanical arm
robot
vector
articles
potential field
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010328714.0A
Other languages
Chinese (zh)
Inventor
刘华山
应丰糠
江荣鑫
李祥健
程新
夏玮
蔡明军
陈荣川
梁健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Donghua University
National Dong Hwa University
Original Assignee
Donghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Donghua University
Priority to CN202010328714.0A
Publication of CN111515932A
Legal status: Pending

Links

Images

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/0093Programme-controlled manipulators co-operating with conveyor means
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1602Programme controls characterised by the control system, structure, architecture
    • B25J9/1605Simulation of manipulator lay-out, design, modelling of manipulator
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1628Programme controls characterised by the control loop
    • B25J9/163Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1679Programme controls characterised by the tasks executed

Abstract

The invention discloses a man-machine co-fusion assembly line implementation method based on an artificial potential field and reinforcement learning. A reinforcement-learning A-DPPO algorithm plans the robot's motion trajectory; in actual operation the robot executes actions along the preset optimal trajectory and has a collision-avoidance capability based on the artificial potential field. Meanwhile, a vision sensor acquires the position information of the person; after particle filtering, a metabolic GM(1,1) model infers the person's behavioral intention, and according to the different intentions it is determined whether the repulsive force brought by the person approaching the mechanical arm is taken into account, and hence whether the mechanical arm takes evasive action, thereby realizing human-machine-environment co-fusion.

Description

Man-machine co-fusion assembly line implementation method based on artificial potential field and reinforcement learning
Technical Field
The invention relates to a man-machine co-fusion assembly line implementation method based on an artificial potential field and reinforcement learning, and belongs to the technical field of man-machine co-fusion of industrial assembly lines.
Background
With the continuous development and maturing of robot technology, its applications in the industrial field have become increasingly extensive. Compared with human labor, an industrial robot neither tires nor loses interest during tedious assembly-line work, which greatly improves the accuracy and efficiency of that work and effectively promotes the productivity of enterprises and society. Mechanized, automated plants are therefore developing rapidly around the world and across industrial fields.
However, with the successive arrival of high and new technologies such as artificial intelligence, people are no longer content with robots that perform a single mechanical operation; they want to endow robots with a certain degree of "intelligence", so that a robot can operate without explicit human instructions, perceive its environment, and make decisions through autonomous learning, adapting to complex unstructured environments while continuously improving working efficiency. Reinforcement learning, a branch of machine learning, drives the interaction between robot and environment: through a reward-and-punishment mechanism it pushes the robot toward the decisions that maximize accumulated reward, thereby optimizing the system.
Under these circumstances, people hope to realize cooperation between humans and machines in the same environment, which raises the topic of man-machine co-fusion. Co-fusion further promotes a friendly relationship among humans, machines, and the environment, and pushes manufacturing, services, transportation, and other industries into a brand-new stage of development.
On the premise of guaranteeing the robot's working precision and efficiency, realizing man-machine co-fusion first requires guaranteeing human safety during cooperation with the robot: when a person is near the robot, the robot must be able to distinguish whether the person's intention is "friendly cooperation" or "a collision is about to occur", and take corresponding action. In view of this, an automated assembly-line human-robot cooperation implementation method based on an artificial potential field, intention inference, and reinforcement learning is provided.
Disclosure of Invention
The technical problem to be solved by the invention is how to enable the robot to have the capabilities of recognizing action intentions and avoiding collision.
In order to solve the above technical problem, the technical scheme of the invention provides a man-machine co-fusion assembly line implementation method based on an artificial potential field and reinforcement learning, characterized by comprising the following steps:
step one: to avoid the article loss and equipment damage that improper actions might cause during physical debugging, a 3D model of the equipment is drawn with SolidWorks, and a 1:1 restored digital virtual assembly-line system is built in the V-REP simulation environment;
step two: in a V-REP virtual environment, planning the motion tracks of a first robot and a second robot by adopting an A-DPPO algorithm based on reinforcement learning;
step three: in the V-REP, the optimized motion trajectory found by the A-DPPO algorithm is stored as a path, a dummy is added at the end effector of the mechanical arm and set as the tip, and a tip-target linkage is formed with the path, so that under normal conditions the mechanical arm executes actions along the determined optimal path;
step four: within the time Δt before time t, the vision acquisition system records the person's position-state sequence s_1, s_2, …, s_t, and the sequence is smoothed by a particle-filter algorithm to eliminate noise interference; the smoothed data are processed with a metabolic GM(1,1) model to obtain a predicted value of the position sequence at time t+1, the average rate of change of the position state at time t is obtained by analysis, and after comparison and classification the behavioral intention of the observed object is inferred;
step five: the mechanical arm moves along the preset optimal motion trajectory, and based on the behavioral intention of the observed object it is determined whether the robot avoids collision by the artificial potential field method.
Preferably, step one builds the 1:1 restored digital virtual assembly-line system, which comprises:
two robots:
first robot (1): a KUKA LBR iiwa 7 R800 located at the head end of the conveyor belt, responsible for grabbing articles and placing them on the conveyor belt;
second robot (2): a KUKA KR 6 R900 sixx located at the tail end of the conveyor belt, responsible for grabbing articles off the conveyor belt and placing them at a specified position;
horizontal moving guide rail (3): connected to the base of the second robot, it moves the KR 6 in the horizontal direction, greatly widening the KR 6's workspace; rail and robot can be combined and regarded as a single 7-degree-of-freedom robot;
turntable (4): used for storing parts; 6 articles can be hung on one side;
four-finger cylinder-type clamping jaw (5): a gripper universal enough to grab articles of various shapes, responsible for grabbing the articles;
cylindrical article (6): a cylindrical object is harder to grab than other, regularly shaped objects, and so places stricter demands on the grabbing pose;
conveyor belt (7): 3.5 meters long in total, responsible for conveying articles, and fitted with an infrared sensor that detects the positions of articles on the belt;
the vision acquisition system is located at the center directly above the whole workspace, and an upper computer processes the acquired images; the upper computer is connected to a PLC, which exercises overall control of the system.
Preferably, the assembly line implements the following workflow: the first robot takes an article off the turntable and places it at the head end of the conveyor belt, then returns to its initial pose to prepare to grab the next article; after the article passes the infrared sensor in the middle of the conveyor belt, the second robot, cooperating with the horizontal guide rail, grips the article and hangs it on the other face of the turntable, then returns to its initial position to prepare to grab the next article; this process repeats, and after all articles have been placed on the other face of the turntable, the turntable rotates back to its original position.
Preferably, the second step includes:
step 2-1: determine a position reward function and a direction reward function; the position reward function is composed of an obstacle-avoidance term f_ob and a target-guidance term f_tr and is defined as:
R_pos(D_EO, D_ET) = f_ob(D_EO) + f_tr(D_EO, D_ET)
where D_EO is the relative distance between the mechanical-arm end effector and the obstacle (the smaller its value, the greater the penalty), D_ET is the relative distance between the end effector and the target point, and α is a constant; f_ob and f_tr themselves are given by formulas rendered as images in the original, and the operator [·]+ is defined as:
[x]+ = x for x > 0, and 0 for x ≤ 0;
the directional reward function is defined by a formula rendered as an image in the original; it is a function of φ, the included angle between the current motion vector of the mechanical-arm end effector and the expected relative motion direction, and of τ, a positive compensation parameter;
the azimuth reward function of the A-DPPO is defined as:
R = λ_pos·R_pos + λ_orient·R_orient
where λ_pos and λ_orient are the weights of the position and direction reward functions, respectively;
step 2-2: the relative position and direction information of the mechanical-arm end effector, the obstacle, and the target point is collected and stored as the current environment state S_t, and the torques of the rotary joints of the mechanical arm are denoted a_t; after value evaluation by the evaluation network, the policy network combines S_t with the evaluated value to compute a_t and executes the action, updating the environment state to S_t+1; the azimuth reward function computes the reward value R_t of the current action; this information is stored and recorded as the quadruple (S_t, a_t, R_t, S_t+1); the quadruple information and the KL penalty term train the policy network and the evaluation network respectively, correcting action deviations to seek the optimized motion trajectory. The A-DPPO algorithm flow is shown in FIG. 2.
Preferably, the fourth step includes:
step 4-1: collect images and filter them. Suppose the intention-inference system issues an instruction at time t; within the time Δt before t, the vision acquisition system records the person's position-state sequence s_1, s_2, …, s_t, and the sequence is smoothed by a particle-filter algorithm to eliminate noise interference. The H component is extracted with an HSV model, a kernel-weighted color-distribution histogram is constructed, and each particle's candidate target model is computed. The similarity between the candidate model and the target model can be described by the Bhattacharyya distance:
d(p, q) = √(1 − ρ(p, q)),  ρ(p, q) = Σ_u √(p_u q_u)
where ρ(p, q) represents the similarity between the candidate model and the target model; the update equation of the system is defined by a formula rendered as an image in the original, whose terms are the probability density function of the system measurement model at frame k, the system noise at frame k, and the pixel point at frame k; the target position is then further determined by a further formula (also rendered as an image) that combines the weighted particles into a single position estimate.
step 4-2: processing the smoothed data with the metabolic GM(1,1) model to obtain a predicted value of the position sequence at time t+1; the average rate of change of the position state at time t is obtained by analysis, and after comparison and classification the behavioral intention of the observed object is inferred and classified into the following cases: fast/slow left movement, fast/slow right movement, fast/slow forward movement, stationary.
Preferably, the step five includes:
step 5-1: from the position vector P_end of the end effector and the position vector P_goal of the target point, determine the error vector e between the end effector and the target point; the attraction vector v_att is then defined as:
v_att = −K_p e − K_i ∫ e dt
step 5-2: solving with the damped least squares (DLS) method, which copes well with inverse-kinematics problems having redundant degrees of freedom, the mechanical-arm joint angular velocity q̇_att corresponding to the attraction vector is defined as:
q̇_att = J^T(J J^T + λ²I)^(−1) v_att
step 5-3: from the position vector P_r of the point on the mechanical arm closest to the obstacle and the position vector P_o of the point on the obstacle closest to the arm, define the unit vector s of the repulsion vector; the magnitude f_rep of the repulsion vector is defined by a formula rendered as an image in the original, in which d_0 is a preset safety distance, d_min is the minimum distance between the mechanical arm and the obstacle, and k_rep is the repulsion coefficient; the precise minimum distance between the mechanical arm and the person or obstacle can be obtained with the minimum-distance calculation module of V-REP;
the repulsion vector is defined as:
v_rep = f_rep s
solving again with the damped least squares (DLS) method, the mechanical-arm joint angular velocity q̇_rep determined by the repulsion vector is defined as:
q̇_rep = J_cp^T(J_cp J_cp^T + λ²I)^(−1) v_rep
where J_cp is the Jacobian matrix associated with the point on the mechanical arm closest to the obstacle;
step 5-4: the joint angular velocity of the mechanical arm is defined as:
q̇ = q̇_att + k·q̇_rep
where k is the repulsion-term weight coefficient; the weight coefficient k of the repulsion vector in the artificial potential field is determined by combining the observed object's position and velocity at the next moment with the current position of the mechanical arm.
Preferably, d_min is modified to d_min − d_cr, where d_cr denotes a critical distance, so that the mechanical arm always keeps no less than the critical distance from the obstacle.
Preferably, the value of k is graded by speed: according to the speed grade at which the predicted object approaches the mechanical arm, k is given a reasonable value between 0 and 1, i.e., the proportion of the repulsion vector is reasonably compressed, making collision avoidance more intelligent. When intention reasoning determines that the predicted object is approaching the mechanical arm at high speed, avoidance is adopted and the weight coefficient k = 1; when intention reasoning determines that the predicted object is approaching the mechanical arm slowly or moving away from it, the mechanical arm continues working and the weight coefficient k = 0. After the person's action intention is judged, it is determined whether the mechanical arm considers the repulsive influence brought by the person approaching the arm, and hence whether it takes evasive action.
The invention adopts a MATLAB platform to bear the system's heavy computation, and improves the overall running speed through joint V-REP and MATLAB simulation. The algorithm framework of the whole system is shown in FIG. 4.
Drawings
FIG. 1 is the 1:1 restored digital virtual assembly-line system built with V-REP;
FIG. 2 is a flow chart of a reinforcement learning A-DPPO algorithm;
FIG. 3 is a schematic diagram of an artificial potential field method based on a predetermined optimal trajectory;
FIG. 4 is a schematic diagram of an algorithm composition framework of the overall system of this example.
Detailed Description
In order to make the invention more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.
Examples
In this method, the robot's motion path is planned in advance through reinforcement learning; in actual operation the robot moves along the established path, judges the action intention of humans or other robots through a grey prediction model, and introduces an artificial potential field to avoid possible collisions. The method not only improves the efficiency of the assembly line but also gives the robot the abilities to recognize action intentions and avoid collisions, providing a solution for realizing man-machine cooperation.
The invention combines an intention-inference method based on grey prediction with a collision-avoidance method based on an artificial potential field. As shown in FIG. 1, the system comprises:
two robots:
preferably, the first robot 1 is selected as KUKA LBR iiiwa 7R800, located at the head end of the conveyor belt. The iwa is a 7-degree-of-freedom robot capable of realizing man-machine interaction, is one of representatives of cooperative robots, and is responsible for grabbing and placing articles on a conveying belt;
preferably, the second robot 2 is selected from the group consisting of KUKA KR 6R 900 sixx, which is located at the end of the conveyor belt. The robot has the advantages that the space is saved, the functions of efficient assembly, carrying, packaging, detection and the like can be realized, and the robot is responsible for grabbing articles on a conveyor belt and placing the articles to a specified position;
Horizontal movement guide rail 3: connected to the base of the second robot, it moves the KR 6 in the horizontal direction, greatly widening the KR 6's workspace; rail and robot can be combined and regarded as a single 7-degree-of-freedom robot.
Rotary table 4: used for storing parts; 6 articles can be hung on one side.
Four-finger cylinder-type clamping jaw 5: a gripper universal enough to grab articles of various shapes, responsible for grabbing the articles.
Cylindrical article 6: a cylindrical object is harder to grab than other, regularly shaped objects, and so places stricter demands on the grabbing pose.
Conveyor belt 7: 3.5 meters long in total, responsible for conveying articles, and fitted with an infrared sensor that detects the positions of articles on the belt.
The vision acquisition system is located at the center directly above the whole workspace, and an upper computer processes the acquired images.
Preferably, the upper computer is connected to a PLC, which exercises overall control of the system.
The technical scheme of the invention is as follows:
the method comprises the following steps: in order to avoid the possible article loss and equipment damage caused by improper actions in the real object debugging, a digital virtual system of a production line is built.
Preferably, a 3D model of the equipment is drawn with SolidWorks, and a 1:1 restored digital virtual assembly-line system is built in the V-REP simulation environment.
The following workflow is realized in this assembly line: the first robot takes an article off the turntable and places it at the head end of the conveyor belt, then returns to its initial pose to prepare to grab the next article; after the article passes the infrared sensor in the middle of the conveyor belt, the second robot, cooperating with the horizontal guide rail, grips the article and hangs it on the other face of the turntable, then returns to its initial position to prepare to grab the next article; this process repeats, and after all articles have been placed on the other face of the turntable, the turntable rotates back to its original position.
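To make this cycle concrete, the following is a minimal control-loop sketch in Python. It is illustrative only: every method on the robot1, robot2, conveyor, and turntable objects is a hypothetical placeholder standing in for the PLC-coordinated actions described above, not an interface defined by the invention.

def run_line(n_articles, robot1, robot2, conveyor, turntable):
    """One full pass of the assembly line as described above (hypothetical API)."""
    for _ in range(n_articles):
        robot1.pick_from_turntable()      # robot 1 takes an article off the turntable
        robot1.place_at_conveyor_head()   # ...and places it at the belt's head end
        robot1.go_home()                  # back to the initial pose for the next article
        conveyor.wait_for_ir_sensor()     # block until the mid-belt IR sensor fires
        robot2.pick_from_conveyor()       # robot 2 + horizontal rail grip the article
        robot2.hang_on_turntable_back()   # hang it on the other face of the turntable
        robot2.go_home()                  # back to the initial position
    turntable.rotate_home()               # all articles transferred; rotate back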
Step two: in the V-REP virtual environment, the motion trajectories of the first robot and the second robot are planned with the reinforcement-learning-based A-DPPO algorithm.
First, a position reward function and a direction reward function are determined.
The position reward function consists of an obstacle-avoidance term and a target-guidance term. The obstacle-avoidance term f_ob(D_EO) is defined by equation (1) (rendered as an image in the original), in which D_EO is the relative distance between the mechanical-arm end effector and the obstacle; the smaller its value, the greater the penalty.
The target-guidance term f_tr(D_EO, D_ET) is defined by equation (2) (also rendered as an image), in which D_ET is the relative distance between the end effector and the target point and α is a constant; the operator [·]+ is defined as:
[x]+ = x for x > 0, and 0 for x ≤ 0    (3)
Combining equations (1) and (2), the position reward function is defined as:
R_pos(D_EO, D_ET) = f_ob(D_EO) + f_tr(D_EO, D_ET)    (4)
The directional reward function is defined by equation (5) (rendered as an image in the original); it is a function of φ, the included angle between the current motion vector of the mechanical-arm end effector and the expected relative motion direction, and of τ, a positive compensation parameter.
Let λ_pos and λ_orient be the weights of the position and direction reward functions, respectively; the azimuth reward function of the A-DPPO is then defined as:
R = λ_pos·R_pos + λ_orient·R_orient    (6)
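A small sketch of how the combined reward could be computed is given below. The exact forms of equations (1), (2), and (5) appear only as images in the original, so the expressions used here are plausible stand-ins consistent with the text: f_ob penalizes a small end-effector/obstacle distance D_EO, f_tr guides the end effector toward the target via D_ET and the clipped operator [·]+, and the directional term rewards alignment between the current motion vector and the expected direction, offset by the positive compensation parameter τ.

import numpy as np

def plus(x):
    """Clipped operator of equation (3): [x]+ = x if x > 0, else 0."""
    return x if x > 0.0 else 0.0

def position_reward(d_eo, d_et, alpha=1.0):
    # Assumed stand-ins for equations (1) and (2):
    f_ob = -1.0 / max(d_eo, 1e-6)   # smaller D_EO -> larger penalty
    f_tr = -alpha * plus(d_et)      # guide the end effector toward the target
    return f_ob + f_tr              # equation (4)

def direction_reward(motion_vec, desired_vec, tau=0.1):
    # Assumed stand-in for equation (5): reward alignment plus compensation tau.
    cos_phi = np.dot(motion_vec, desired_vec) / (
        np.linalg.norm(motion_vec) * np.linalg.norm(desired_vec) + 1e-9)
    return cos_phi + tau

def azimuth_reward(d_eo, d_et, motion_vec, desired_vec,
                   lam_pos=1.0, lam_orient=0.5):
    # Equation (6): R = lambda_pos * R_pos + lambda_orient * R_orient.
    return (lam_pos * position_reward(d_eo, d_et)
            + lam_orient * direction_reward(motion_vec, desired_vec))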
The relative position and direction information of the mechanical-arm end effector, the obstacle, and the target point is collected and stored as the current environment state S_t, and the torques of the rotary joints of the mechanical arm are denoted a_t. After value evaluation by the evaluation network, the policy network combines S_t with the evaluated value to compute a_t and executes the action, updating the environment state to S_t+1; the azimuth reward function computes the reward value R_t of the current action. This information is stored and recorded as the quadruple (S_t, a_t, R_t, S_t+1). The quadruple information and the KL penalty term train the policy network and the evaluation network respectively, correcting action deviations to seek the optimized motion trajectory. The A-DPPO algorithm flow is shown in FIG. 2.
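The bookkeeping of this step can be sketched as follows. This is a generic illustration of storing quadruples and evaluating a KL-penalized policy-gradient surrogate (the family to which distributed PPO variants such as A-DPPO belong); it is not the patent's actual A-DPPO implementation, and the buffer size and penalty coefficient β are arbitrary.

import numpy as np
from collections import deque

buffer = deque(maxlen=10_000)   # stores quadruples (S_t, a_t, R_t, S_t+1)

def store(s, a, r, s_next):
    buffer.append((s, a, r, s_next))

def kl_penalized_surrogate(logp_new, logp_old, advantages, beta=1.0):
    """Surrogate L = E[ratio * A] - beta * KL_est, with the KL divergence
    between old and new policies estimated from sampled log-probabilities."""
    logp_new, logp_old = np.asarray(logp_new), np.asarray(logp_old)
    ratio = np.exp(logp_new - logp_old)            # pi_new(a|s) / pi_old(a|s)
    kl_est = float(np.mean(logp_old - logp_new))   # sample estimate of KL(old||new)
    return float(np.mean(ratio * np.asarray(advantages))) - beta * kl_est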
Step three: in the V-REP, the optimized motion trajectory found by the A-DPPO algorithm is stored as a path; a virtual object (dummy) is added at the end effector of the mechanical arm and set as the tip, and a tip-target linkage is formed with a target dummy on the path, creating a kinematic chain so that under normal conditions the mechanical arm executes actions along the determined optimal path.
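A hedged sketch of driving such a tip-target pair from outside the simulator, using V-REP's legacy remote-API Python bindings (the vrep.py module shipped with V-REP), might look as follows. The scene object name 'target' and the waypoint list are assumptions of this sketch; the tip-target IK link itself is configured inside the V-REP scene as described above.

import vrep  # legacy remote-API bindings copied from the V-REP installation

client = vrep.simxStart('127.0.0.1', 19997, True, True, 5000, 5)
assert client != -1, 'could not connect to a running V-REP instance'

# Handle of the target dummy linked to the arm's tip in the scene.
_, target = vrep.simxGetObjectHandle(client, 'target', vrep.simx_opmode_blocking)

# Hypothetical stored waypoints of the optimized A-DPPO trajectory (x, y, z in m).
optimal_path = [(0.40, 0.10, 0.30), (0.41, 0.07, 0.29), (0.42, 0.05, 0.28)]
for waypoint in optimal_path:
    vrep.simxSetObjectPosition(client, target, -1, waypoint,
                               vrep.simx_opmode_oneshot)  # -1: absolute frame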
Step four: intention-inference capability is incorporated into the assembly-line system. Suppose the intention-inference system issues an instruction at time t; within the time Δt before t, the vision acquisition system records the person's position-state sequence s_1, s_2, …, s_t and smooths it with a particle-filter algorithm to eliminate noise interference. Preferably an HSV model is adopted, which is computationally light and reflects the color characteristics of the image well; the H component is extracted, a kernel-weighted color-distribution histogram is constructed, and each particle's candidate target model is computed. The similarity between the candidate model and the target model can be described by the Bhattacharyya distance:
d(p, q) = √(1 − ρ(p, q)),  ρ(p, q) = Σ_u √(p_u q_u)    (7)
where ρ(p, q) represents the similarity between the candidate model and the target model. The update equation of the system is defined by equation (8) (rendered as an image in the original), whose terms are the probability density function of the system measurement model at frame k, the system noise at frame k, and the pixel point at frame k. The target position is then further determined by equation (9) (also rendered as an image), which combines the weighted particles into a single position estimate.
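A compact particle-filter sketch for this tracking step is shown below. Each particle carries a candidate position; its candidate histogram (supplied here by a user-provided hist_at callback, an assumption of this sketch) is compared with the target model through the Bhattacharyya coefficient, and the position estimate is the weighted mean of the particles. The random-walk prediction stands in for the system update equation (8), whose exact form is an image in the original.

import numpy as np

def bhattacharyya_coeff(p, q):
    """rho(p, q) = sum_u sqrt(p_u * q_u); 1 means identical histograms."""
    return float(np.sum(np.sqrt(p * q)))

def pf_step(particles, weights, target_hist, hist_at, sigma=0.2, motion_noise=2.0):
    # Predict: random-walk motion model standing in for the system update equation.
    particles = particles + np.random.normal(0.0, motion_noise, particles.shape)
    # Update: weight each particle by the Bhattacharyya distance between its
    # candidate histogram and the target model (equation (7)).
    d = np.array([np.sqrt(max(1.0 - bhattacharyya_coeff(hist_at(p), target_hist), 0.0))
                  for p in particles])
    weights = weights * np.exp(-d ** 2 / (2.0 * sigma ** 2))
    weights = weights / np.sum(weights)
    estimate = np.sum(particles * weights[:, None], axis=0)  # weighted-mean position
    return particles, weights, estimate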
Preferably, the smoothed data are processed with the metabolic GM(1,1) model to obtain a predicted value of the position sequence at time t+1; the average rate of change of the position state at time t is obtained by analysis, and after comparison and classification the behavioral intention of the observed object is inferred and classified into the following cases: fast/slow left movement, fast/slow right movement, fast/slow forward movement, stationary.
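The metabolic GM(1,1) predictor can be sketched as follows: fit the grey model on the current window of smoothed positions, predict one step ahead, then "metabolize" the window by dropping the oldest sample and appending the newest observation. The window length is an assumption of this sketch.

import numpy as np

def gm11_predict(x0):
    """One-step-ahead GM(1,1) prediction from a 1-D sequence x0 (length >= 4)."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)                            # accumulated (AGO) series
    z1 = 0.5 * (x1[1:] + x1[:-1])                 # background values
    B = np.column_stack((-z1, np.ones(len(z1))))
    Y = x0[1:]
    a, b = np.linalg.lstsq(B, Y, rcond=None)[0]   # development / grey-input coefficients
    n = len(x0)
    x1_next = (x0[0] - b / a) * np.exp(-a * n) + b / a
    x1_curr = (x0[0] - b / a) * np.exp(-a * (n - 1)) + b / a
    return x1_next - x1_curr                      # restored (inverse-AGO) prediction

def metabolic_update(window, new_obs):
    """Slide the window: drop the oldest sample, append the newest observation."""
    return np.append(window[1:], new_obs)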
Step five: deciding whether the robot takes avoidance based on the behavioral intention of the observed object. Preferably, the obstacle is avoided by means of an artificial potential field. As shown in FIG. 3, the mechanical arm follows the predetermined optimal motion trajectory while an attraction vector acting on the end effector directs it toward the target; this vector is defined through an error function:
e = P_end − P_goal    (10)
where e is the error vector between the end effector and the target point, P_end is the position vector of the end effector, and P_goal is the position vector of the target point. The attraction vector v_att is then defined as:
v_att = −K_p e − K_i ∫ e dt    (11)
where K_p and K_i denote the proportional-coefficient matrix and the integral-coefficient matrix, respectively.
The inverse kinematics of the mechanical arm is solved, preferably with the damped least squares (DLS) method, which copes well with inverse-kinematics problems having redundant degrees of freedom; the mechanical-arm joint angular velocity q̇_att defined by the attraction vector is:
q̇_att = J^T(J J^T + λ²I)^(−1) v_att    (12)
where J is the Jacobian matrix associated with the tool center point at the end of the mechanical arm, λ is the damping coefficient, and I is the identity matrix.
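Equations (10)-(12) can be sketched directly in code: a PI term on the Cartesian error produces v_att, and the damped-least-squares inverse maps it to joint velocities. The gains, damping coefficient, and time step below are illustrative values, not the patent's.

import numpy as np

def attraction_velocity(p_end, p_goal, integral_e, kp=1.0, ki=0.1, dt=0.01):
    """Equations (10)-(11): e = P_end - P_goal, v_att = -Kp*e - Ki*integral(e)dt."""
    e = p_end - p_goal
    integral_e = integral_e + e * dt
    return -kp * e - ki * integral_e, integral_e

def dls_joint_velocity(J, v, lam=0.1):
    """Equation (12): q_dot = J^T (J J^T + lam^2 I)^-1 v."""
    JJt = J @ J.T
    return J.T @ np.linalg.solve(JJt + lam ** 2 * np.eye(JJt.shape[0]), v)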
The unit vector s of the repulsion vector is defined as:
s = (P_r − P_o)/‖P_r − P_o‖    (13)
where P_r denotes the position vector of the point on the mechanical arm closest to the obstacle, and P_o denotes the position vector of the point on the obstacle closest to the mechanical arm.
The magnitude f_rep of the repulsion vector is defined by equation (14) (rendered as an image in the original), in which d_0 is a preset safety distance, d_min is the minimum distance between the mechanical arm and the obstacle, and k_rep is the repulsion coefficient. Preferably, the precise minimum distance between the mechanical arm and the person or obstacle is obtained with the minimum-distance calculation module of V-REP. More preferably, d_min can be modified to d_min − d_cr, where d_cr denotes a critical distance, so that the mechanical arm always keeps no less than the critical distance from the obstacle.
The repulsion vector is defined as:
v_rep = f_rep s    (15)
the angular velocity of the mechanical arm joint determined by the repulsive vector
Figure BDA0002464180780000103
Is defined as:
Figure BDA0002464180780000104
where J_cp is the Jacobian matrix associated with the point on the mechanical arm closest to the obstacle.
Finally, the joint angular velocity of the mechanical arm is defined as:
q̇ = q̇_att + k·q̇_rep    (17)
where k is the repulsion-term weight coefficient, determined by combining the observed object's predicted position and velocity at the next moment with the current position of the mechanical arm. When intention reasoning determines that the predicted object is approaching the mechanical arm at high speed, avoidance is adopted and the weight coefficient k = 1; when intention reasoning determines that the predicted object is approaching slowly or moving away, the mechanical arm continues working and the weight coefficient k = 0. Preferably, k can be graded by speed: according to the speed grade at which the predicted object approaches the arm, k is given a reasonable value between 0 and 1, i.e., the proportion of the repulsion vector is reasonably compressed, making collision avoidance more intelligent.
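The repulsion branch and the intention-weighted blend of equations (13)-(17) can be sketched as below. Because the magnitude law of equation (14) appears only as an image in the original, a common inverse-distance form that vanishes beyond d_0 is assumed; the speed thresholds used to grade k are likewise illustrative.

import numpy as np

def repulsion_vector(p_r, p_o, d_min, d0=0.3, k_rep=1.0):
    """Equations (13)-(15), with an assumed magnitude law standing in for (14)."""
    s = (p_r - p_o) / (np.linalg.norm(p_r - p_o) + 1e-9)  # unit vector, obstacle -> arm
    d_min = max(d_min, 1e-6)
    f_rep = k_rep * (1.0 / d_min - 1.0 / d0) if d_min < d0 else 0.0
    return f_rep * s                                       # v_rep = f_rep * s

def blended_joint_velocity(qd_att, qd_rep, approach_speed, fast=0.5, slow=0.1):
    """Equation (17) with a speed-graded repulsion weight k in [0, 1]."""
    if approach_speed >= fast:
        k = 1.0                                   # evade: full repulsion weight
    elif approach_speed <= slow:
        k = 0.0                                   # keep working: ignore repulsion
    else:
        k = (approach_speed - slow) / (fast - slow)   # compressed proportion
    return qd_att + k * qd_rep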
After the person's action intention is judged, it is determined whether the mechanical arm considers the repulsive influence brought by the person approaching the arm, and hence whether it takes evasive action. Preferably, a MATLAB platform bears the system's heavy computation, and the overall running speed is improved through joint V-REP and MATLAB simulation. The algorithm framework of the whole system is shown in FIG. 4.
In conclusion, the invention integrates reinforcement learning, intention inference, and the artificial potential field into assembly-line operation. On the premise of using reinforcement learning to find the optimal motion trajectory and guarantee the line's working efficiency, it judges whether a person's action intention amounts to friendly cooperation or an imminent collision, and thereby decides whether to use the potential-field method to avoid the collision. Combining these methods, the invention realizes human-machine-environment co-fusion for the automated assembly line. Meanwhile, a 1:1 restored digital virtual system is built with V-REP; the A-DPPO and artificial-potential-field components are trained in this virtual system, and the optimal path found is then applied to the actual production line for testing, avoiding the equipment damage and article loss that debugging on a physical platform might cause.

Claims (8)

1. A man-machine co-fusion assembly line implementation method based on artificial potential field and reinforcement learning is characterized by comprising the following steps:
step one: to avoid the article loss and equipment damage that improper actions might cause during physical debugging, a 3D model of the equipment is drawn with SolidWorks, and a 1:1 restored digital virtual assembly-line system is built in the V-REP simulation environment;
step two: in a V-REP virtual environment, planning the motion tracks of a first robot and a second robot by adopting an A-DPPO algorithm based on reinforcement learning;
step three: in the V-REP, the optimized motion trajectory found by the A-DPPO algorithm is stored as a path, a dummy is added at the end effector of the mechanical arm and set as the tip, and a tip-target linkage is formed with the path, so that under normal conditions the mechanical arm executes actions along the determined optimal path;
step four: within the time Δt before time t, the vision acquisition system records the person's position-state sequence s_1, s_2, …, s_t, and the sequence is smoothed by a particle-filter algorithm to eliminate noise interference; the smoothed data are processed with a metabolic GM(1,1) model to obtain a predicted value of the position sequence at time t+1, the average rate of change of the position state at time t is obtained by analysis, and after comparison and classification the behavioral intention of the observed object is inferred;
step five: the mechanical arm moves along the preset optimal motion trajectory, and based on the behavioral intention of the observed object it is determined whether the robot avoids collision by the artificial potential field method.
2. The man-machine co-fusion assembly line implementation method based on artificial potential field and reinforcement learning according to claim 1, characterized in that:
in step one, the 1:1 restored digital virtual assembly-line system that is built comprises:
two robots:
first robot (1): a KUKA LBR iiwa 7 R800 located at the head end of the conveyor belt, responsible for grabbing articles and placing them on the conveyor belt;
second robot (2): a KUKA KR 6 R900 sixx located at the tail end of the conveyor belt, responsible for grabbing articles off the conveyor belt and placing them at a specified position;
horizontal moving guide rail (3): connected to the base of the second robot, it moves the KR 6 in the horizontal direction, greatly widening the KR 6's workspace; rail and robot can be combined and regarded as a single 7-degree-of-freedom robot;
turntable (4): used for storing parts; 6 articles can be hung on one side;
four-finger cylinder-type clamping jaw (5): a gripper universal enough to grab articles of various shapes, responsible for grabbing the articles;
cylindrical article (6): a cylindrical object is harder to grab than other, regularly shaped objects, and so places stricter demands on the grabbing pose;
conveyor belt (7): 3.5 meters long in total, responsible for conveying articles, and fitted with an infrared sensor that detects the positions of articles on the belt;
the vision acquisition system is located at the center directly above the whole workspace, and an upper computer processes the acquired images; the upper computer is connected to a PLC, which exercises overall control of the system.
3. The man-machine co-fusion assembly line implementation method based on artificial potential field and reinforcement learning according to claim 2, characterized in that the assembly line implements the following workflow: the first robot takes an article off the turntable and places it at the head end of the conveyor belt, then returns to its initial pose to prepare to grab the next article; after the article passes the infrared sensor in the middle of the conveyor belt, the second robot, cooperating with the horizontal guide rail, grips the article and hangs it on the other face of the turntable, then returns to its initial position to prepare to grab the next article; this process repeats, and after all articles have been placed on the other face of the turntable, the turntable rotates back to its original position.
4. The man-machine co-fusion assembly line implementation method based on artificial potential field and reinforcement learning according to claim 1, characterized in that the second step comprises:
step 2-1: determine a position reward function and a direction reward function; the position reward function is composed of an obstacle-avoidance term f_ob and a target-guidance term f_tr and is defined as:
R_pos(D_EO, D_ET) = f_ob(D_EO) + f_tr(D_EO, D_ET)
where D_EO is the relative distance between the mechanical-arm end effector and the obstacle (the smaller its value, the greater the penalty), D_ET is the relative distance between the end effector and the target point, and α is a constant; f_ob and f_tr themselves are given by formulas rendered as images in the original, and the operator [·]+ is defined as:
[x]+ = x for x > 0, and 0 for x ≤ 0;
the directional reward function is defined by a formula rendered as an image in the original; it is a function of φ, the included angle between the current motion vector of the mechanical-arm end effector and the expected relative motion direction, and of τ, a positive compensation parameter;
the azimuth reward function of the A-DPPO is defined as:
R = λ_pos·R_pos + λ_orient·R_orient
where λ_pos and λ_orient are the weights of the position and direction reward functions, respectively;
step 2-2: the relative position and direction information of the mechanical-arm end effector, the obstacle, and the target point is collected and stored as the current environment state S_t, and the torques of the rotary joints of the mechanical arm are denoted a_t; after value evaluation by the evaluation network, the policy network combines S_t with the evaluated value to compute a_t and executes the action, updating the environment state to S_t+1; the azimuth reward function computes the reward value R_t of the current action; this information is stored and recorded as the quadruple (S_t, a_t, R_t, S_t+1); the quadruple information and the KL penalty term train the policy network and the evaluation network respectively, correcting action deviations to seek the optimized motion trajectory. The A-DPPO algorithm flow is shown in FIG. 2.
5. The man-machine co-fusion assembly line implementation method based on artificial potential field and reinforcement learning according to claim 1, characterized in that the fourth step comprises:
step 4-1: collect images and filter them. Suppose the intention-inference system issues an instruction at time t; within the time Δt before t, the vision acquisition system records the person's position-state sequence s_1, s_2, …, s_t, and the sequence is smoothed by a particle-filter algorithm to eliminate noise interference. The H component is extracted with an HSV model, a kernel-weighted color-distribution histogram is constructed, and each particle's candidate target model is computed. The similarity between the candidate model and the target model can be described by the Bhattacharyya distance:
d(p, q) = √(1 − ρ(p, q)),  ρ(p, q) = Σ_u √(p_u q_u)
where ρ(p, q) represents the similarity between the candidate model and the target model; the update equation of the system is defined by a formula rendered as an image in the original, whose terms are the probability density function of the system measurement model at frame k, the system noise at frame k, and the pixel point at frame k; the target position is then further determined by a further formula (also rendered as an image) that combines the weighted particles into a single position estimate.
step 4-2: processing the smoothed data with the metabolic GM(1,1) model to obtain a predicted value of the position sequence at time t+1; the average rate of change of the position state at time t is obtained by analysis, and after comparison and classification the behavioral intention of the observed object is inferred and classified into the following cases: fast/slow left movement, fast/slow right movement, fast/slow forward movement, stationary.
6. The man-machine co-fusion assembly line implementation method based on artificial potential field and reinforcement learning according to claim 1, characterized in that the fifth step comprises:
step 5-1: from the position vector P_end of the end effector and the position vector P_goal of the target point, determine the error vector e between the end effector and the target point; the attraction vector v_att is then defined as:
v_att = −K_p e − K_i ∫ e dt
step 5-2: solving with the damped least squares (DLS) method, which copes well with inverse-kinematics problems having redundant degrees of freedom, the mechanical-arm joint angular velocity q̇_att corresponding to the attraction vector is defined as:
q̇_att = J^T(J J^T + λ²I)^(−1) v_att
step 5-3: from the position vector P_r of the point on the mechanical arm closest to the obstacle and the position vector P_o of the point on the obstacle closest to the arm, define the unit vector s of the repulsion vector; the magnitude f_rep of the repulsion vector is defined by a formula rendered as an image in the original, in which d_0 is a preset safety distance, d_min is the minimum distance between the mechanical arm and the obstacle, and k_rep is the repulsion coefficient; the precise minimum distance between the mechanical arm and the person or obstacle can be obtained with the minimum-distance calculation module of V-REP;
the repulsion vector is defined as:
v_rep = f_rep s
solving again with the damped least squares (DLS) method, the mechanical-arm joint angular velocity q̇_rep determined by the repulsion vector is defined as:
q̇_rep = J_cp^T(J_cp J_cp^T + λ²I)^(−1) v_rep
where J_cp is the Jacobian matrix associated with the point on the mechanical arm closest to the obstacle;
step 5-4: the joint angular velocity of the mechanical arm is defined as:
q̇ = q̇_att + k·q̇_rep
where k is the repulsion-term weight coefficient; the weight coefficient k of the repulsion vector in the artificial potential field is determined by combining the observed object's position and velocity at the next moment with the current position of the mechanical arm.
7. The man-machine co-fusion assembly line implementation method based on artificial potential field and reinforcement learning according to claim 6, characterized in that: d_min is modified to d_min − d_cr, where d_cr denotes a critical distance, so that the mechanical arm always keeps no less than the critical distance from the obstacle.
8. The man-machine co-fusion assembly line implementation method based on artificial potential field and reinforcement learning according to claim 6, characterized in that: the value of k is graded by speed; according to the speed grade at which the predicted object approaches the mechanical arm, k is given a reasonable value between 0 and 1, i.e., the proportion of the repulsion vector is reasonably compressed, making collision avoidance more intelligent; when intention reasoning determines that the predicted object is approaching the mechanical arm at high speed, avoidance is adopted and the weight coefficient k = 1; when intention reasoning determines that the predicted object is approaching the mechanical arm slowly or moving away from it, the mechanical arm continues working and the weight coefficient k = 0; after the person's action intention is judged, it is determined whether the mechanical arm considers the repulsive influence brought by the person approaching the arm, and hence whether it takes evasive action.
CN202010328714.0A 2020-04-23 2020-04-23 Man-machine co-fusion assembly line implementation method based on artificial potential field and reinforcement learning Pending CN111515932A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010328714.0A CN111515932A (en) 2020-04-23 2020-04-23 Man-machine co-fusion assembly line implementation method based on artificial potential field and reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010328714.0A CN111515932A (en) 2020-04-23 2020-04-23 Man-machine co-fusion assembly line implementation method based on artificial potential field and reinforcement learning

Publications (1)

Publication Number Publication Date
CN111515932A true CN111515932A (en) 2020-08-11

Family

ID=71904534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010328714.0A Pending CN111515932A (en) 2020-04-23 2020-04-23 Man-machine co-fusion assembly line implementation method based on artificial potential field and reinforcement learning

Country Status (1)

Country Link
CN (1) CN111515932A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112192614A (en) * 2020-10-09 2021-01-08 西南科技大学 Man-machine cooperation based shaft hole assembling method for nuclear operation and maintenance robot
CN112904848A (en) * 2021-01-18 2021-06-04 长沙理工大学 Mobile robot path planning method based on deep reinforcement learning
CN113043284A (en) * 2021-04-23 2021-06-29 江苏理工学院 Multi-constraint inverse solution method for redundant robot
CN113156940A (en) * 2021-03-03 2021-07-23 河北工业职业技术学院 Robot path planning method based on curiosity-greedy reward function
CN113341706A (en) * 2021-05-06 2021-09-03 东华大学 Man-machine cooperation assembly line system based on deep reinforcement learning
CN113848974A (en) * 2021-09-28 2021-12-28 西北工业大学 Aircraft trajectory planning method and system based on deep reinforcement learning
CN113858196A (en) * 2021-09-26 2021-12-31 中国舰船研究设计中心 Robot disassembly sequence planning method considering robot collision avoidance track
CN113934219A (en) * 2021-12-16 2022-01-14 宏景科技股份有限公司 Robot automatic obstacle avoidance method, system, equipment and medium
CN113970321A (en) * 2021-10-21 2022-01-25 北京房江湖科技有限公司 Method and device for calculating house type dynamic line
CN114589701A (en) * 2022-04-20 2022-06-07 浙江大学 Multi-joint mechanical arm obstacle avoidance inverse kinematics method based on damping least squares
CN114851184A (en) * 2021-01-20 2022-08-05 广东技术师范大学 Industrial robot-oriented reinforcement learning reward value calculation method
WO2023065494A1 (en) * 2021-10-18 2023-04-27 东南大学 Intent-driven reinforcement learning path planning method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103823466A (en) * 2013-05-23 2014-05-28 电子科技大学 Path planning method for mobile robot in dynamic environment
US20160236349A1 (en) * 2015-02-18 2016-08-18 Disney Enterprises, Inc. Control method for floating-base robots including generating feasible motions using time warping
CN107891425A (en) * 2017-11-21 2018-04-10 北方民族大学 The control method of the intelligent man-machine co-melting humanoid robot system of both arms security cooperation
CN110262478A (en) * 2019-05-27 2019-09-20 浙江工业大学 Man-machine safety obstacle-avoiding route planning method based on modified embedded-atom method
CN110253570A (en) * 2019-05-27 2019-09-20 浙江工业大学 The industrial machinery arm man-machine safety system of view-based access control model
CN110900601A (en) * 2019-11-15 2020-03-24 武汉理工大学 Robot operation autonomous control method for human-robot cooperation safety guarantee

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103823466A (en) * 2013-05-23 2014-05-28 电子科技大学 Path planning method for mobile robot in dynamic environment
US20160236349A1 (en) * 2015-02-18 2016-08-18 Disney Enterprises, Inc. Control method for floating-base robots including generating feasible motions using time warping
CN107891425A (en) * 2017-11-21 2018-04-10 北方民族大学 The control method of the intelligent man-machine co-melting humanoid robot system of both arms security cooperation
CN110262478A (en) * 2019-05-27 2019-09-20 浙江工业大学 Man-machine safety obstacle-avoiding route planning method based on modified embedded-atom method
CN110253570A (en) * 2019-05-27 2019-09-20 浙江工业大学 The industrial machinery arm man-machine safety system of view-based access control model
CN110900601A (en) * 2019-11-15 2020-03-24 武汉理工大学 Robot operation autonomous control method for human-robot cooperation safety guarantee

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
LI Jie; SU Jianbo: "Target tracking based on reinforcement learning and intention inference", Proceedings of the 33rd Chinese Control Conference *
LI Yue; SHAO Zhenzhou; ZHAO Zhendong; SHI Zhiping; GUAN Yong: "Reward function design of deep reinforcement learning oriented to trajectory planning", Computer Engineering and Applications *
WANG Benliang et al.: "An end-effector planning algorithm for manipulators based on geometric mechanics", Journal of Dynamics and Control *
HUANG Junjie: "Fundamentals of Robot Technology", 31 August 2018, Huazhong University of Science and Technology Press *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112192614A (en) * 2020-10-09 2021-01-08 西南科技大学 Man-machine cooperation based shaft hole assembling method for nuclear operation and maintenance robot
CN112904848A (en) * 2021-01-18 2021-06-04 长沙理工大学 Mobile robot path planning method based on deep reinforcement learning
CN112904848B (en) * 2021-01-18 2022-08-12 长沙理工大学 Mobile robot path planning method based on deep reinforcement learning
CN114851184A (en) * 2021-01-20 2022-08-05 广东技术师范大学 Industrial robot-oriented reinforcement learning reward value calculation method
CN114851184B (en) * 2021-01-20 2023-05-09 广东技术师范大学 Reinforced learning rewarding value calculating method for industrial robot
CN113156940A (en) * 2021-03-03 2021-07-23 河北工业职业技术学院 Robot path planning method based on curiosity-greedy reward function
CN113043284A (en) * 2021-04-23 2021-06-29 江苏理工学院 Multi-constraint inverse solution method for redundant robot
CN113341706B (en) * 2021-05-06 2022-12-06 东华大学 Man-machine cooperation assembly line system based on deep reinforcement learning
CN113341706A (en) * 2021-05-06 2021-09-03 东华大学 Man-machine cooperation assembly line system based on deep reinforcement learning
CN113858196A (en) * 2021-09-26 2021-12-31 中国舰船研究设计中心 Robot disassembly sequence planning method considering robot collision avoidance track
CN113848974A (en) * 2021-09-28 2021-12-28 西北工业大学 Aircraft trajectory planning method and system based on deep reinforcement learning
CN113848974B (en) * 2021-09-28 2023-08-15 西安因诺航空科技有限公司 Aircraft trajectory planning method and system based on deep reinforcement learning
WO2023065494A1 (en) * 2021-10-18 2023-04-27 东南大学 Intent-driven reinforcement learning path planning method
CN113970321A (en) * 2021-10-21 2022-01-25 北京房江湖科技有限公司 Method and device for calculating house type dynamic line
CN113934219A (en) * 2021-12-16 2022-01-14 宏景科技股份有限公司 Robot automatic obstacle avoidance method, system, equipment and medium
CN114589701A (en) * 2022-04-20 2022-06-07 浙江大学 Multi-joint mechanical arm obstacle avoidance inverse kinematics method based on damping least squares
CN114589701B (en) * 2022-04-20 2024-04-09 浙江大学 Damping least square-based multi-joint mechanical arm obstacle avoidance inverse kinematics method

Similar Documents

Publication Publication Date Title
CN111515932A (en) Man-machine co-fusion assembly line implementation method based on artificial potential field and reinforcement learning
CN110202583B (en) Humanoid manipulator control system based on deep learning and control method thereof
US20230321837A1 (en) Machine learning device, robot system, and machine learning method for learning object picking operation
DE102016015873B3 (en) Machine learning apparatus, robot system, and machine learning system for learning a workpiece pick-up operation
CN112297013B (en) Robot intelligent grabbing method based on digital twin and deep neural network
Huang et al. Design of automatic strawberry harvest robot suitable in complex environments
CN111223141B (en) Automatic pipeline work efficiency optimization system and method based on reinforcement learning
CN113341706A (en) Man-machine cooperation assembly line system based on deep reinforcement learning
Xie et al. Visual tracking control of SCARA robot system based on deep learning and Kalman prediction method
Breyer et al. Closed-loop next-best-view planning for target-driven grasping
CN113664828A (en) Robot grabbing-throwing method based on deep reinforcement learning
Chen et al. Combined task and motion planning for a dual-arm robot to use a suction cup tool
Gonçalves et al. Grasp planning with incomplete knowledge about the object to be grasped
Xie Industrial Robot Assembly Line Design Using Machine Vision
Uçar et al. Determination of Angular Status and Dimensional Properties of Objects for Grasping with Robot Arm
Sebbata et al. An adaptive robotic grasping with a 2-finger gripper based on deep learning network
Domínguez-Vidal et al. Improving Human-Robot Interaction Effectiveness in Human-Robot Collaborative Object Transportation using Force Prediction
Han Trajectory tracking control method for flexible robot of construction machinery based on computer vision
Tian et al. Optimal Path Planning for a Robot Shelf Picking System
Yu et al. A cascaded deep learning framework for real-time and robust grasp planning
Wen et al. Research status and tendency of intelligent industrial robot
Zhang et al. Control method of shaft and hole mating based on convolution neural network in assembly building prefabricated components
Anglani et al. Learning to grasp by using visual information
Poss Applications of Object Detection in Industrial Contexts Based on Logistics Robots
Terasaki et al. Intelligent manipulation of sliding operations with parallel two-fingered grippers

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200811