CN111633647A - Multi-mode fusion robot sewing method and system based on deep reinforcement learning - Google Patents

Multi-mode fusion robot sewing method and system based on deep reinforcement learning

Info

Publication number
CN111633647A
Authority
CN
China
Prior art keywords
sewing
robot
fabric
network
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010453893.0A
Other languages
Chinese (zh)
Other versions
CN111633647B (en)
Inventor
宋锐
付天宇
李凤鸣
李贻斌
田新诚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN202010453893.0A priority Critical patent/CN111633647B/en
Publication of CN111633647A publication Critical patent/CN111633647A/en
Application granted granted Critical
Publication of CN111633647B publication Critical patent/CN111633647B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1628 Programme controls characterised by the control loop
    • B25J9/163 Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1602 Programme controls characterised by the control system, structure, architecture
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1679 Programme controls characterised by the tasks executed
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1694 Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • D TEXTILES; PAPER
    • D05 SEWING; EMBROIDERING; TUFTING
    • D05B SEWING
    • D05B19/00 Programme-controlled sewing machines
    • D05B19/02 Sewing machines having electronic memory or microprocessor control unit
    • D05B19/04 Sewing machines having electronic memory or microprocessor control unit characterised by memory aspects
    • D05B19/08 Arrangements for inputting stitch or pattern data to memory; Editing stitch or pattern data
    • D TEXTILES; PAPER
    • D05 SEWING; EMBROIDERING; TUFTING
    • D05B SEWING
    • D05B19/00 Programme-controlled sewing machines
    • D05B19/02 Sewing machines having electronic memory or microprocessor control unit
    • D05B19/12 Sewing machines having electronic memory or microprocessor control unit characterised by control of operation of machine
    • D05B19/16 Control of workpiece movement, e.g. modulation of travel of feed dog

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Computer Hardware Design (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Textile Engineering (AREA)
  • Automation & Control Theory (AREA)
  • Manipulator (AREA)

Abstract

The invention discloses a multi-mode fusion robot sewing method and system based on deep reinforcement learning, comprising the following steps: respectively acquiring fabric state image information, stitch state image information and fabric tension state information in the sewing process; constructing and training a robot sewing operation skill learning network; and inputting the collected sewing-process state information into the sewing operation skill learning network, which outputs the joint angles of the mechanical arm so as to control the mechanical arm's action. The invention fuses image information and force-sense information to jointly represent the fabric state during sewing, so that the robot's motion is characterized more accurately. By learning and mastering the operation skill, the robot can actively adapt to environmental changes, and the training result has generalization capability, thereby realizing autonomous sewing of different fabrics.

Description

Multi-mode fusion robot sewing method and system based on deep reinforcement learning
Technical Field
The invention relates to the technical field of industrial robots, in particular to a multi-mode fusion robot sewing method based on deep reinforcement learning.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Flexible fabric material handling is one of the most challenging problems in the field of robotic operation skills in recent years. In addition to the problems of geometric uncertainty and obstacle avoidance encountered when handling rigid materials, the anisotropy and non-uniformity of flexible fabric materials make robotic sewing operations difficult. Most existing robot sewing systems geometrically model the fabric to be sewn through machine vision and complete sewing actions with a visual-servo-controlled robot; once the fabric deforms, the operation is greatly affected.
In addition, the interaction information of existing robot collaborative sewing systems mostly comes from a single sensor: the data is one-sided, the amount of information is limited, and it is strongly affected by environmental noise.
Disclosure of Invention
In view of the above, the invention provides a multi-mode fusion robot sewing method and system based on deep reinforcement learning, which build on a deep reinforcement learning framework and fuse visual and force-sense modal information, improving the robot's decision-making capability for autonomously operating flexible fabrics.
In order to achieve the above purpose, in some embodiments, the following technical solutions are adopted:
a multimode fusion robot sewing method based on deep reinforcement learning comprises the following steps:
respectively acquiring fabric state image information, stitch state image information and fabric tension state information in the sewing process;
constructing and training a sewing operation skill learning network of the robot, wherein the sewing operation skill learning network comprises a strategy network and an evaluation network; the input of the strategy network is fabric state image information and fabric tension state information, and the output is the action value of the mechanical arm; the input of the evaluation network is fabric state image information and a mechanical arm action value, and the output is a Q function value;
and inputting the collected state information in the sewing process into the sewing operation skill learning network, and outputting the joint angle of the mechanical arm so as to control the action of the mechanical arm.
In other embodiments, the following technical solutions are adopted:
a multimode fusion robot sewing system based on deep reinforcement learning comprises:
the state perception module is used for respectively acquiring fabric state image information, stitch state image information and fabric tension state information in the sewing process;
the fusion decision module is used for processing the information acquired by the state sensing module into the input of a robot sewing operation skill learning network and applying mechanical arm sewing actions output by the network to the sewing environment module;
and the sewing environment module is used for receiving and executing the actions of the mechanical arm, and simultaneously feeding back the fabric state image and fabric tension information of the changed sewing environment to the state sensing module.
In other embodiments, the following technical solutions are adopted:
a robot controller comprising a processor and a computer readable storage medium, the processor for implementing instructions; the computer readable storage medium is used for storing a plurality of instructions, and the instructions are suitable for being loaded by a processor and executing the multimode fusion robot sewing method based on the deep reinforcement learning.
A robot comprises a robot controller, wherein the robot controller adopts the above multi-mode fusion robot sewing method based on deep reinforcement learning to realize sewing of fabrics.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a scheme for solving the problem that a robot operates a flexible deformation object by combining a deep reinforcement learning method.
The invention fuses the image information and the force sense information to jointly represent the state of the fabric in the sewing process, thereby representing the motion of the robot more accurately. The robot can actively adapt to the change of the environment by learning and mastering the operation skills, and the training result has generalization capability, thereby realizing the independent sewing operation of different fabrics.
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
FIG. 1 is a schematic view of a sewing process of a multimode fusion robot based on deep reinforcement learning according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a sewing operation skill learning network of a robot according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a policy network according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating an evaluation network according to an embodiment of the present invention;
FIG. 5 is a schematic view of a multi-mode fusion robot sewing system based on deep reinforcement learning according to an embodiment of the present invention.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
The embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
Example one
In one or more embodiments, a multimode fusion robot sewing method based on deep reinforcement learning is disclosed, and with reference to fig. 1, the method specifically includes the following processes:
step (1): respectively acquiring fabric state image information, stitch state image information and fabric tension state information in the sewing process;
specifically, stitch state image information in the fabric sewing process is acquired through a local camera, fabric state image information in the fabric sewing process is acquired through a global camera, and fabric tension state information in the fabric sewing process is acquired through a six-dimensional force sensor.
The sewing state of the mechanical arm is defined as s = (s_I, s_F);
where s_I is the 640 × 480 × 4 RGB-D fabric state image during sewing, and s_F = (f_x, f_y, f_z, τ_x, τ_y, τ_z) is the fabric tension state during sewing, in which f_x, f_y, f_z are forces and τ_x, τ_y, τ_z are moments.
The sewing action of the mechanical arm is defined as a = (θ_1, θ_2, θ_3, θ_4, θ_5, θ_6);
where θ_1, θ_2, θ_3, θ_4, θ_5, θ_6 are the angles of each joint of the six-axis mechanical arm.
Step (2): determine a reward function for the mechanical arm action based on the stitch state image information, used as a sparse sewing-quality reward function r to evaluate how good the current mechanical arm sewing action is.
Specifically, the sewing stitch images captured by the local camera are processed sequentially by image filtering, image binarization, connected-domain merging, Hough line-segment detection and sewing stitch extraction, and the slope l_1 of the extracted stitch is calculated; the local fabric boundary is extracted with the Canny operator and the slope l_2 of the extracted boundary is calculated; the stitch straightness is l = l_1 − l_2, and the perpendicular distance between the sewing stitch and the local fabric boundary is taken as the stitch translation amount d.
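As a concrete illustration of this measurement pipeline, the following is a minimal OpenCV sketch; the blur kernel, Hough parameters, Canny thresholds and the choice of the longest detected segment are assumptions, not values given in the patent.

```python
# Hedged sketch of the stitch-quality metrics l and d described above.
# Parameter values (kernel sizes, Hough/Canny thresholds) are assumptions.
import cv2
import numpy as np

def longest_segment(lines):
    """Pick the longest Hough segment (x1, y1, x2, y2)."""
    return max(lines[:, 0], key=lambda s: np.hypot(s[2] - s[0], s[3] - s[1]))

def stitch_metrics(gray):
    """Return stitch straightness l = l1 - l2 and translation amount d (pixels)."""
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)                     # image filtering
    _, binary = cv2.threshold(blurred, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # binarization
    stitch = cv2.HoughLinesP(binary, 1, np.pi / 180, 50,
                             minLineLength=40, maxLineGap=10)       # stitch segments
    edges = cv2.Canny(blurred, 50, 150)                             # fabric boundary
    bound = cv2.HoughLinesP(edges, 1, np.pi / 180, 50,
                            minLineLength=40, maxLineGap=10)
    if stitch is None or bound is None:
        return None
    x1, y1, x2, y2 = longest_segment(stitch)
    bx1, by1, bx2, by2 = longest_segment(bound)
    l1 = (y2 - y1) / (x2 - x1 + 1e-6)                               # stitch slope
    l2 = (by2 - by1) / (bx2 - bx1 + 1e-6)                           # boundary slope
    l = l1 - l2                                                     # straightness
    # Perpendicular distance from the stitch midpoint to the boundary line = d
    mx, my = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    num = abs((by2 - by1) * mx - (bx2 - bx1) * my + bx2 * by1 - by2 * bx1)
    d = num / (np.hypot(bx2 - bx1, by2 - by1) + 1e-6)
    return l, d
```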
Thus, at time t, in state s_t, the reward function of action a_t is:
[Formula: sparse reward r(s_t, a_t), taking its success value when the stitch straightness satisfies l < l_0 and the stitch translation satisfies d_min ≤ d ≤ d_max, and its failure value otherwise]
where l_0 is the maximum threshold of stitch straightness, d_min is the minimum threshold of stitch translation, and d_max is the maximum threshold of stitch translation. The state s_t refers to the current sewing state of the fabric, s_t = (s_I, s_F), where s_I is the fabric image and s_F is the fabric tension.
When the sewing stitches are nearly parallel to the fabric boundary and are at a proper position away from the fabric boundary, the sewing is considered to be successful, otherwise, the sewing fails.
In this embodiment, sewing is considered successful when the stitch straightness is less than the maximum threshold l_0 and the stitch translation from the fabric boundary lies between d_min and d_max.
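Under these success criteria, the sparse reward can be sketched as follows; the ±1 reward values and the absolute value on l are assumptions, since the exact formula image is not reproduced above.

```python
# Hedged sketch of the sparse sewing-quality reward; +1/-1 values are assumed.
def sewing_reward(l, d, l0, d_min, d_max):
    """Success if the stitch is nearly parallel to the boundary (|l| < l0)
    and at a proper distance from it (d_min <= d <= d_max)."""
    if abs(l) < l0 and d_min <= d <= d_max:
        return 1.0   # sewing success
    return -1.0      # sewing failure
```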
And (3): constructing and training a sewing operation skill learning network of the robot;
specifically, referring to fig. 2, the sewing operation skill learning network includes a strategy network and an evaluation network; the input of the strategy network is fabric state image information and fabric tension state information, and the output is the action value of the mechanical arm; the input of the evaluation network is fabric state image information and a mechanical arm action value, and the output is a Q function value.
The Q function is a state-action value function that refers to the cumulative reward of the mechanical arm's actions over a period of time, defined as:
Q^μ(s_t, a_t) = E[r(s_t, a_t) + γ Q^μ(s_{t+1}, μ(s_{t+1}))]
where s_t is the current sewing state, a_t is the sewing action under execution policy μ, and s_{t+1} is the next sewing state after performing sewing action a_t.
The policy network and the evaluation network are the basic network models of the deep reinforcement learning algorithm under the "policy-evaluation" (actor-critic) framework; the evaluation network is also commonly called the value network. The target policy network corresponds to the current policy network, and the target evaluation network (target value network) corresponds to the current evaluation network (current value network).
The policy network is a network model constructed to fit the robot's sewing-action selection strategy μ.
The evaluation network is a network model constructed to fit the sewing action value function (the Q function).
Since a single network is unstable during training and learning, for the current policy network (parameters θ^μ) and the current evaluation network (parameters θ^Q), a target policy network (parameters θ^{μ'}) and a target evaluation network (parameters θ^{Q'}) are set, collectively called the target network. The target network parameters are updated during network learning and training as follows:
θ^{Q'} ← τθ^Q + (1 − τ)θ^{Q'}
θ^{μ'} ← τθ^μ + (1 − τ)θ^{μ'}
where τ is usually set to 0.001.
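A minimal sketch of this soft update, assuming a PyTorch implementation (the patent does not name a framework):

```python
# Hedged sketch of the soft target-network update with tau = 0.001.
import torch

@torch.no_grad()
def soft_update(target_net, current_net, tau=0.001):
    """theta' <- tau * theta + (1 - tau) * theta' for every parameter pair."""
    for p_t, p in zip(target_net.parameters(), current_net.parameters()):
        p_t.mul_(1.0 - tau).add_(tau * p)
```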
The network structure of the policy network μ(s|θ^μ) is shown in fig. 3. The network input is the sewing state s = (s_I, s_F) and the output is the mechanical arm action value a = (θ_1, θ_2, θ_3, θ_4, θ_5, θ_6); the network parameters are θ^μ. The fabric state image s_I passes through two convolutional layers and one max-pooling layer, is then fused with the fabric tension state s_F, and the output action value is obtained through 3 fully connected layers; each convolutional, pooling and fully connected layer has the same size as the corresponding layer in the evaluation network structure. The final fully connected layer uses the tanh activation function, as shown in formula (1):
tanh(x) = (e^x − e^{−x}) / (e^x + e^{−x})   (1)
The tanh function is zero-mean, which helps improve training efficiency.
A target policy network μ'(s|θ^{μ'}) is constructed with the same network structure as the policy network μ(s|θ^μ), and the same weights are initialized.
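A hedged PyTorch sketch of this policy network follows. The 6 × 6 convolutions with 32 kernels, the 4 × 4 max-pooling and the 512-unit fully connected layers are taken from the evaluation network description below; the convolution stride of 2 is an assumption made to keep the flattened feature size tractable.

```python
# Hedged sketch of the policy (actor) network of fig. 3; stride is an assumption.
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    def __init__(self, force_dim=6, action_dim=6):
        super().__init__()
        # Two convolutional layers + one max-pooling layer for the RGB-D image s_I
        self.conv = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=6, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=6, stride=2), nn.ReLU(),
            nn.MaxPool2d(4),
            nn.Flatten(),
        )
        with torch.no_grad():  # infer flattened size for a 640 x 480 x 4 input
            n_feat = self.conv(torch.zeros(1, 4, 480, 640)).shape[1]
        # Fuse the fabric tension state s_F, then 3 fully connected layers;
        # the last layer uses tanh, as in formula (1)
        self.fc = nn.Sequential(
            nn.Linear(n_feat + force_dim, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, action_dim), nn.Tanh(),
        )

    def forward(self, s_img, s_force):
        return self.fc(torch.cat([self.conv(s_img), s_force], dim=1))
```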
The network structure of the evaluation network Q(s, a|θ^Q) is shown in fig. 4. The network inputs are the sewing state s = (s_I, s_F) and the action a = (θ_1, θ_2, θ_3, θ_4, θ_5, θ_6); the evaluation network (value network) outputs the reward value obtained after the mechanical arm sewing action is taken in that fabric state, i.e. the corresponding Q function value; the network parameters are θ^Q.
The sewn fabric state image s_I passes through two convolutional layers and one max-pooling layer and is then fused with the fabric tension state s_F; the fused state passes through fully connected layer 1, the action a passes through fully connected layer 2, the outputs of fully connected layers 1 and 2 are concatenated and fed to fully connected layer 3, and the output action Q value is finally obtained through fully connected layer 4. The convolutional layers use 6 × 6 convolutions with 32 kernels; the pooling layer is 4 × 4 max-pooling; each fully connected layer contains 512 units and uses the ReLU activation function, as shown in formula (2):
ReLU(x) = max(0, x)   (2)
constructing a target evaluation network Q' (s, a | θ)Q′) The network structure is the same as the evaluation network structure and the same weight is initialized.
The policy network selects the sewing action according to the evaluation result of the value network; the policy μ'(s_{i+1}) obtained by the target policy network is fed back to the target evaluation network, which is updated in combination with the current value network parameters.
The evaluation network (value network) is updated by continually optimizing a loss function defined as:
L(θ^Q) = (1/N) Σ_i (y_i − Q(s_i, a_i | θ^Q))²
where the predicted Q value is y_i = r_i + γ Q'(s_{i+1}, μ'(s_{i+1}|θ^{μ'}) | θ^{Q'}), and N represents the number of quadruples sampled from the experience pool.
The policy network computes the policy gradient using a Monte Carlo method and updates as:
∇_{θ^μ} J ≈ (1/N) Σ_i ∇_a Q(s, a|θ^Q)|_{s=s_i, a=μ(s_i)} ∇_{θ^μ} μ(s|θ^μ)|_{s=s_i}
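Together, this loss and gradient correspond to one standard deep-deterministic-policy-gradient update, sketched below using the Adam optimizers named in the training steps and the soft_update helper sketched earlier; γ = 0.99 and the batch layout are assumptions.

```python
# Hedged sketch of one critic/actor update on a sampled minibatch.
import torch
import torch.nn.functional as F

def update(policy, q_net, target_policy, target_q, batch, q_opt, pi_opt,
           gamma=0.99, tau=0.001):
    s_img, s_force, a, r, s2_img, s2_force = batch   # assumed batch layout

    # Critic: minimize L = mean_i (y_i - Q(s_i, a_i | theta_Q))^2
    with torch.no_grad():
        y = r + gamma * target_q(s2_img, s2_force,
                                 target_policy(s2_img, s2_force))
    q_loss = F.mse_loss(q_net(s_img, s_force, a), y)
    q_opt.zero_grad(); q_loss.backward(); q_opt.step()

    # Actor: ascend grad_a Q * grad_theta mu, i.e. maximize Q(s, mu(s))
    pi_loss = -q_net(s_img, s_force, policy(s_img, s_force)).mean()
    pi_opt.zero_grad(); pi_loss.backward(); pi_opt.step()

    # Formula (7): soft-update both target networks
    soft_update(target_q, q_net, tau)
    soft_update(target_policy, policy, tau)
```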
constructing an experience pool R ═ s, a, R, s ', wherein s is the current sewing state of the mechanical arm, a is the action selected by the mechanical arm in the current sewing state, R is the reward obtained after the action a is executed, and s' is the sewing state after the mechanical arm executes the action a; the experience pool is used for storing the collected network training samples(s)t,at,rt,st+1)。
The training process for the robot sewing operation skill learning network is as follows:
step (3-1): initializing evaluation network parameter thetaQPolicy network parameter θμAnd copying the parameters to the corresponding target network parameters thetaQ′←θQ,θμ′←θμ
Step (3-2): the experience pool R memory space is initialized.
Step (3-3): start T periods of network training. Since the training is based on a Markov process, each training period includes N rounds of single-step training; the number of trained periods t and the number of trained rounds n are set to 0 before training starts.
Step (3-4): select an action a_t according to formula (3) and transmit it to the sewing environment for execution:
a_t = μ(s_t | θ^μ) + N_t   (3)
where N_t is a random process used to generate random noise that improves the exploration of the policy model; the noise is generated by an Ornstein-Uhlenbeck (OU) process, as shown in formula (4):
dx_t = θ(μ − x_t) + σW_t   (4)
where x_t is the data to be generated, μ is the designed expectation of the random variable, and W_t is a random variable generated by a Wiener process, which can be replaced by a simple random function.
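A small sketch of this OU noise generator; the θ = 0.15 and σ = 0.2 values are conventional assumptions, not taken from the patent:

```python
# Hedged sketch of the OU exploration noise of formula (4).
import numpy as np

class OUNoise:
    def __init__(self, dim=6, mu=0.0, theta=0.15, sigma=0.2):
        self.mu, self.theta, self.sigma = mu, theta, sigma
        self.x = np.full(dim, mu)

    def sample(self):
        # dx_t = theta * (mu - x_t) + sigma * W_t, with W_t a Wiener increment
        dx = self.theta * (self.mu - self.x) \
             + self.sigma * np.random.randn(self.x.size)
        self.x = self.x + dx
        return self.x
```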
Step (3-5): the mechanical arm performs action a_t, obtaining the reward r and the next-time state s_{t+1} in the sewing environment; (s_t, a_t, r_t, s_{t+1}) is then represented as one transition datum and stored in the experience pool R.
Step (3-6): in the experience pool R, randomly sample N transition data as a group of training data; with formula (5) as the objective function, optimize the evaluation network parameters θ^Q using the Adam algorithm:
L(θ^Q) = (1/N) Σ_i (y_i − Q(s_i, a_i | θ^Q))²   (5)
Step (3-7): with formula (6) as the gradient of the objective function, optimize the policy network parameters θ^μ using the Adam algorithm:
∇_{θ^μ} J ≈ (1/N) Σ_i ∇_a Q(s, a|θ^Q)|_{s=s_i, a=μ(s_i)} ∇_{θ^μ} μ(s|θ^μ)|_{s=s_i}   (6)
Step (3-8): update the target evaluation network parameters θ^{Q'} and the target policy network parameters θ^{μ'} according to formula (7):
θ^{Q'} ← τθ^Q + (1 − τ)θ^{Q'},  θ^{μ'} ← τθ^μ + (1 − τ)θ^{μ'}   (7)
where τ is generally 0.001.
Step (3-9): after the parameters θ^{Q'} and θ^{μ'} are updated, set n = n + 1; the current training round ends and the next round starts, repeating steps (3-4) to (3-8) until n = N.
Step (3-10): when the N rounds of single-step training are completed, set t = t + 1, i.e. start training of the next period; when t = T, the sewing operation skill learning network training is complete, and μ'(s|θ^{μ'}) is the network training result.
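The full procedure of steps (3-1) to (3-10) can be sketched as the skeleton below; `env`, `collate` and the hyperparameter values are hypothetical placeholders, since the patent does not specify the sewing-environment interface:

```python
# Hedged skeleton of the T-period / N-round training loop of steps (3-1)-(3-10).
# `env` (with observe()/step()) and `collate` are hypothetical placeholders;
# T, N, BATCH and the learning rates are assumed hyperparameters.
import torch

T, N, BATCH = 100, 200, 64

policy, q_net = PolicyNet(), QNet()
target_policy, target_q = PolicyNet(), QNet()
target_policy.load_state_dict(policy.state_dict())   # theta_mu' <- theta_mu
target_q.load_state_dict(q_net.state_dict())         # theta_Q'  <- theta_Q
q_opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
pi_opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
pool, noise = ExperiencePool(), OUNoise()

for t in range(T):                      # T training periods
    s_img, s_force = env.observe()      # (s_I, s_F) from cameras + force sensor
    for n in range(N):                  # N rounds of single-step training
        with torch.no_grad():           # formula (3): a_t = mu(s_t) + noise
            a = policy(s_img, s_force) + torch.as_tensor(
                noise.sample(), dtype=torch.float32)
        r, (s2_img, s2_force) = env.step(a)
        pool.store((s_img, s_force), a, r, (s2_img, s2_force))
        if len(pool.buffer) >= BATCH:   # steps (3-6) to (3-8)
            update(policy, q_net, target_policy, target_q,
                   collate(pool.sample(BATCH)), q_opt, pi_opt)
        s_img, s_force = s2_img, s2_force
```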
Step (4): use the trained network μ'(s|θ^{μ'}) as the mechanical arm sewing operation motion controller to control each joint angle of the mechanical arm; input the collected sewing-process state information into the sewing operation skill learning network, which outputs the mechanical arm joint angles so as to control the mechanical arm's action.
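As a usage sketch, the trained policy can then serve directly as the controller, mapping the fused state to joint angles (the unscaling of the tanh output to physical joint limits is an assumption):

```python
# Hedged sketch of step (4): the trained policy network as motion controller.
# joint_low/joint_high are hypothetical joint limits used to unscale tanh output.
import torch

@torch.no_grad()
def sewing_controller(policy, s_img, s_force, joint_low, joint_high):
    """Map the current sewing state (s_I, s_F) to six joint angles."""
    a = policy(s_img, s_force).squeeze(0)          # tanh output in [-1, 1]
    return joint_low + (a + 1.0) * 0.5 * (joint_high - joint_low)
```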
Example two
In one or more embodiments, a multimode fusion robot sewing system based on deep reinforcement learning is disclosed, which, with reference to fig. 5, specifically includes:
the state perception module is used for respectively acquiring fabric state image information, stitch state image information and fabric tension state information in the sewing process;
the fusion decision module is used for processing the information acquired by the state sensing module into the input of a robot sewing operation skill learning network and applying mechanical arm sewing actions output by the network to the sewing environment module;
and the sewing environment module is used for receiving and executing the actions of the mechanical arm and simultaneously feeding back the changed environment state information to the state sensing module.
The state perception module is composed of a local camera, a six-dimensional force sensor and a global camera. The local camera is used for collecting stitch state images in the sewing process, the global camera is used for collecting fabric state images in the sewing process, and the six-dimensional force sensor is used for collecting fabric tension states in the sewing process.
The fusion decision module is used for processing the information collected by the state sensing module into robot sewing operation skill learning network input and applying mechanical arm sewing action output by the network to a sewing environment.
The sewing environment module receives the action of the mechanical arm, changes the state image of the fabric in the sewing environment and the tension state of the fabric, and feeds back the environment information to the state sensing module.
The system is based on the deep deterministic policy gradient, integrates force-sense and visual multi-modal fabric state descriptions, trains and learns in the constructed sewing environment, generates the mechanical arm control quantity from environment feedback, and thereby guides the mechanical arm to complete the sewing action and acquire the sewing skill.
The specific implementation process of each module corresponds to steps (1) to (4) in the first embodiment, and is not described again.
EXAMPLE III
In one or more embodiments, a robot controller is disclosed that includes a processor and a computer-readable storage medium, the processor to implement instructions; the computer-readable storage medium is used for storing a plurality of instructions, and the instructions are suitable for being loaded by the processor and executing the multimode fusion robot sewing method based on the deep reinforcement learning in the first embodiment, and for brevity, the detailed description is omitted.
Those of ordinary skill in the art will appreciate that the various illustrative units and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In other embodiments, a robot is disclosed, which uses the multi-mode fusion robot sewing method based on deep reinforcement learning described in the first embodiment to sew a fabric.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, they are not intended to limit the scope of the present invention; those skilled in the art can make various modifications and variations without inventive effort based on the technical solution of the present invention.

Claims (10)

1. A multimode fusion robot sewing method based on deep reinforcement learning is characterized by comprising the following steps:
respectively acquiring fabric state image information, stitch state image information and fabric tension state information in the sewing process;
constructing and training a sewing operation skill learning network of the robot, wherein the sewing operation skill learning network comprises a strategy network and an evaluation network; the input of the strategy network is fabric state image information and fabric tension state information, and the output is the action value of the mechanical arm; the input of the evaluation network is fabric state image information and a mechanical arm action value, and the output is a Q function value;
and inputting the collected state information in the sewing process into the sewing operation skill learning network, and outputting the joint angle of the mechanical arm so as to control the action of the mechanical arm.
2. The multi-mode fusion robot sewing method based on deep reinforcement learning as claimed in claim 1, wherein a reward function of the mechanical arm action is determined based on the stitch state image information to evaluate the quality of the current mechanical arm sewing action, the specific process comprising:
after image filtering, image binarization and connected domain combination are carried out on the stitch state image information, Hough line segment detection is carried out to extract sewing stitches, and the slope of the extracted stitches is calculated;
extracting a local fabric boundary through a Canny operator, and calculating the slope of the extracted boundary and the stitch straightness;
taking the vertical distance between the sewing stitch and the local boundary of the fabric as the stitch translation amount;
determining, based on the stitch straightness and the range of the stitch translation amount, the reward function of the mechanical arm action in the current fabric sewing state s_t.
3. The multi-mode fusion robot sewing method based on deep reinforcement learning as claimed in claim 2, wherein sewing is considered successful when the sewing stitch straightness is less than the maximum threshold l_0 and the stitch translation from the fabric boundary lies between d_min and d_max.
4. The multi-mode fusion robot sewing method based on deep reinforcement learning as claimed in claim 3, wherein at time t, in state s_t, the reward function of action a_t is:
[Formula: sparse reward r(s_t, a_t), taking its success value when l < l_0 and d_min ≤ d ≤ d_max, and its failure value otherwise]
where l_0 is the maximum threshold of stitch straightness, d_min is the minimum threshold of stitch translation, and d_max is the maximum threshold of stitch translation.
5. The multi-mode fusion robot sewing method based on deep reinforcement learning according to claim 1, wherein in the strategy network:
the fabric state image information passes through two convolutional layers and one max-pooling layer, is fused with the fabric tension state information, and then passes through 3 fully connected layers to obtain the output action value.
6. The multi-mode fusion robot sewing method based on deep reinforcement learning as claimed in claim 1, wherein in the evaluation network:
the sewn fabric state image information passes through two convolutional layers and one max-pooling layer and is fused with the fabric tension state information; the fused state passes through the first fully connected layer, the mechanical arm action a passes through the second fully connected layer, the outputs of the first and second fully connected layers are concatenated and fed to the third fully connected layer, and the output Q value is finally obtained through the fourth fully connected layer.
7. The multimode fusion robot sewing method based on deep reinforcement learning according to claim 1, wherein the training process for the sewing operation skill learning network comprises:
initializing parameters of a sewing operation skill learning network;
setting and executing mechanical arm action at
Awarding a prize r and a next time status s in a sewing environmentt+1Then will(s)t,at,rt,st+1) The data is expressed as a transition data and stored in an experience pool R;
in an experience pool R, randomly sampling N transition data to serve as a group of training data;
respectively optimizing and updating the strategy network parameters and the evaluation network parameters by adopting an Adam algorithm;
when the N rounds of single-step training are completed, starting the training of the next period; the training result is obtained when the set number of training periods is completed.
8. A multi-mode fusion robot sewing system based on deep reinforcement learning, characterized by comprising:
the state perception module is used for respectively acquiring fabric state image information, stitch state image information and fabric tension state information in the sewing process;
the fusion decision module is used for processing the information acquired by the state sensing module into the input of a robot sewing operation skill learning network and applying mechanical arm sewing actions output by the network to the sewing environment module;
and the sewing environment module is used for receiving and executing the actions of the mechanical arm, and simultaneously feeding back the fabric state image and fabric tension information of the changed sewing environment to the state sensing module.
9. A robot controller comprising a processor and a computer-readable storage medium, the processor being configured to implement instructions; the computer-readable storage medium being configured to store a plurality of instructions adapted to be loaded by the processor to perform the multi-mode fusion robot sewing method based on deep reinforcement learning according to any one of claims 1-7.
10. A robot comprising a robot controller, wherein the robot controller adopts the multi-mode fusion robot sewing method based on deep reinforcement learning according to any one of claims 1 to 7 to realize sewing of fabrics.
CN202010453893.0A 2020-05-26 2020-05-26 Multi-mode fusion robot sewing method and system based on deep reinforcement learning Active CN111633647B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010453893.0A CN111633647B (en) 2020-05-26 2020-05-26 Multi-mode fusion robot sewing method and system based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010453893.0A CN111633647B (en) 2020-05-26 2020-05-26 Multi-mode fusion robot sewing method and system based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN111633647A true CN111633647A (en) 2020-09-08
CN111633647B CN111633647B (en) 2021-06-22

Family

ID=72324996

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010453893.0A Active CN111633647B (en) 2020-05-26 2020-05-26 Multi-mode fusion robot sewing method and system based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN111633647B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329571A (en) * 2020-10-27 2021-02-05 同济大学 Self-adaptive human body posture optimization method based on posture quality evaluation
CN112894808A (en) * 2021-01-15 2021-06-04 山东大学 Robot screwing valve system and method based on deep reinforcement learning
CN113011526A (en) * 2021-04-23 2021-06-22 华南理工大学 Robot skill learning method and system based on reinforcement learning and unsupervised learning
CN113151989A (en) * 2021-04-19 2021-07-23 山东大学 Cloth processing method, cloth processing system and sewing robot
CN114660934A (en) * 2022-03-03 2022-06-24 西北工业大学 Mechanical arm autonomous operation strategy learning method based on vision-touch fusion
CN114723831A (en) * 2022-03-25 2022-07-08 山东大学 Heuristic-based robot flexible fabric flattening method and system
WO2023041022A1 (en) * 2021-09-17 2023-03-23 Huawei Technologies Co., Ltd. System and method for computer-assisted design of inductor for voltage-controlled oscillator

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018053187A1 (en) * 2016-09-15 2018-03-22 Google Inc. Deep reinforcement learning for robotic manipulation
CN109457398A (en) * 2018-12-05 2019-03-12 郑州轻工业学院 Sweater automatic sewing method based on machine vision perception
CN109543823A (en) * 2018-11-30 2019-03-29 山东大学 A kind of flexible assembly system and method based on multimodal information description
CN109629122A (en) * 2018-12-25 2019-04-16 珞石(山东)智能科技有限公司 A kind of robot method of sewing based on machine vision
CN109840552A (en) * 2019-01-14 2019-06-04 湖北工业大学 A kind of dynamic image classification method
CN111005163A (en) * 2019-12-30 2020-04-14 深圳市越疆科技有限公司 Automatic leather sewing method, device, equipment and computer readable storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018053187A1 (en) * 2016-09-15 2018-03-22 Google Inc. Deep reinforcement learning for robotic manipulation
CN109543823A (en) * 2018-11-30 2019-03-29 山东大学 A kind of flexible assembly system and method based on multimodal information description
CN109457398A (en) * 2018-12-05 2019-03-12 郑州轻工业学院 Sweater automatic sewing method based on machine vision perception
CN109629122A (en) * 2018-12-25 2019-04-16 珞石(山东)智能科技有限公司 A kind of robot method of sewing based on machine vision
CN109840552A (en) * 2019-01-14 2019-06-04 湖北工业大学 A kind of dynamic image classification method
CN111005163A (en) * 2019-12-30 2020-04-14 深圳市越疆科技有限公司 Automatic leather sewing method, device, equipment and computer readable storage medium

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329571A (en) * 2020-10-27 2021-02-05 同济大学 Self-adaptive human body posture optimization method based on posture quality evaluation
CN112329571B (en) * 2020-10-27 2022-12-16 同济大学 Self-adaptive human body posture optimization method based on posture quality evaluation
CN112894808A (en) * 2021-01-15 2021-06-04 山东大学 Robot screwing valve system and method based on deep reinforcement learning
CN113151989A (en) * 2021-04-19 2021-07-23 山东大学 Cloth processing method, cloth processing system and sewing robot
CN113011526A (en) * 2021-04-23 2021-06-22 华南理工大学 Robot skill learning method and system based on reinforcement learning and unsupervised learning
CN113011526B (en) * 2021-04-23 2024-04-26 华南理工大学 Robot skill learning method and system based on reinforcement learning and unsupervised learning
WO2023041022A1 (en) * 2021-09-17 2023-03-23 Huawei Technologies Co., Ltd. System and method for computer-assisted design of inductor for voltage-controlled oscillator
CN114660934A (en) * 2022-03-03 2022-06-24 西北工业大学 Mechanical arm autonomous operation strategy learning method based on vision-touch fusion
CN114660934B (en) * 2022-03-03 2024-03-01 西北工业大学 Mechanical arm autonomous operation strategy learning method based on vision-touch fusion
CN114723831A (en) * 2022-03-25 2022-07-08 山东大学 Heuristic-based robot flexible fabric flattening method and system

Also Published As

Publication number Publication date
CN111633647B (en) 2021-06-22

Similar Documents

Publication Publication Date Title
CN111633647B (en) Multi-mode fusion robot sewing method and system based on deep reinforcement learning
CN109543823B (en) Flexible assembly system and method based on multi-mode information description
JP6810087B2 (en) Machine learning device, robot control device and robot vision system using machine learning device, and machine learning method
CN111881772B (en) Multi-mechanical arm cooperative assembly method and system based on deep reinforcement learning
Meyes et al. Motion planning for industrial robots using reinforcement learning
CN107403426B (en) Target object detection method and device
CN111144580B (en) Hierarchical reinforcement learning training method and device based on imitation learning
Kartoun et al. A human-robot collaborative reinforcement learning algorithm
US10807234B2 (en) Component supply device and machine learning device
CN110253577B (en) Weak-rigidity part assembling system and method based on robot operation technology
US11897066B2 (en) Simulation apparatus
JP7458741B2 (en) Robot control device and its control method and program
US11059180B2 (en) Control device and machine learning device
US10549422B2 (en) Robot controller, machine learning device and machine learning method
Moosmann et al. Separating entangled workpieces in random bin picking using deep reinforcement learning
Li et al. Navigation of mobile robots based on deep reinforcement learning: Reward function optimization and knowledge transfer
CN115761905A (en) Diver action identification method based on skeleton joint points
CN116460843A (en) Multi-robot collaborative grabbing method and system based on meta heuristic algorithm
CN116702872A (en) Reinforced learning method and device based on offline pre-training state transition transducer model
CN114571456B (en) Electric connector assembling method and system based on robot skill learning
CN113977583B (en) Robot rapid assembly method and system based on near-end strategy optimization algorithm
CN114131149B (en) Laser vision weld joint tracking system, equipment and storage medium based on CenterNet
Paudel Learning for robot decision making under distribution shift: A survey
Konidaris et al. Sensorimotor abstraction selection for efficient, autonomous robot skill acquisition
SunWoo et al. Comparison of deep reinforcement learning algorithms: Path Search in Grid World

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant