CN111633647A - Multi-mode fusion robot sewing method and system based on deep reinforcement learning - Google Patents

Multi-mode fusion robot sewing method and system based on deep reinforcement learning

Info

Publication number
CN111633647A
Authority
CN
China
Prior art keywords
sewing
robot
fabric
network
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010453893.0A
Other languages
Chinese (zh)
Other versions
CN111633647B (en)
Inventor
宋锐
付天宇
李凤鸣
李贻斌
田新诚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN202010453893.0A priority Critical patent/CN111633647B/en
Publication of CN111633647A publication Critical patent/CN111633647A/en
Application granted granted Critical
Publication of CN111633647B publication Critical patent/CN111633647B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1628 Programme controls characterised by the control loop
    • B25J9/163 Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1602 Programme controls characterised by the control system, structure, architecture
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1679 Programme controls characterised by the tasks executed
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1694 Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • D TEXTILES; PAPER
    • D05 SEWING; EMBROIDERING; TUFTING
    • D05B SEWING
    • D05B19/00 Programme-controlled sewing machines
    • D05B19/02 Sewing machines having electronic memory or microprocessor control unit
    • D05B19/04 Sewing machines having electronic memory or microprocessor control unit characterised by memory aspects
    • D05B19/08 Arrangements for inputting stitch or pattern data to memory; Editing stitch or pattern data
    • D TEXTILES; PAPER
    • D05 SEWING; EMBROIDERING; TUFTING
    • D05B SEWING
    • D05B19/00 Programme-controlled sewing machines
    • D05B19/02 Sewing machines having electronic memory or microprocessor control unit
    • D05B19/12 Sewing machines having electronic memory or microprocessor control unit characterised by control of operation of machine
    • D05B19/16 Control of workpiece movement, e.g. modulation of travel of feed dog

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Computer Hardware Design (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Textile Engineering (AREA)
  • Automation & Control Theory (AREA)
  • Manipulator (AREA)

Abstract

The invention discloses a multi-mode fusion robot sewing method and system based on deep reinforcement learning, comprising the following steps: respectively acquiring fabric state image information, stitch state image information and fabric tension state information in the sewing process; constructing and training a robot sewing operation skill learning network; and inputting the collected sewing-process state information into the sewing operation skill learning network, which outputs the joint angles of the mechanical arm so as to control the mechanical arm's action. The invention fuses image information and force-sense information to jointly represent the fabric state during sewing, so that the robot's motion is characterized more accurately. By learning and mastering the operation skill, the robot can actively adapt to environmental changes, and the training result has generalization capability, thereby realizing autonomous sewing of different fabrics.

Description

Multi-mode fusion robot sewing method and system based on deep reinforcement learning
Technical Field
The invention relates to the technical field of industrial robots, in particular to a multi-mode fusion robot sewing method based on deep reinforcement learning.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Flexible fabric material handling is one of the most challenging problems in the field of robotic operation skills in recent years. In addition to the problems of geometric uncertainty and obstacle avoidance encountered when handling rigid materials, the anisotropy and non-uniformity of flexible fabric materials make robotic sewing operations difficult. Most existing robot sewing systems geometrically model the fabric to be sewn through machine vision and complete sewing actions with a visual-servo-controlled robot; once the fabric deforms, the operation is greatly affected.
In addition, the interaction information of existing robot collaborative sewing systems mostly comes from a single sensor: the data is one-sided, the amount of information is limited, and it is strongly affected by environmental noise.
Disclosure of Invention
In view of the above, the invention provides a multi-mode fusion robot sewing method and system based on deep reinforcement learning, which build on a deep reinforcement learning framework and fuse visual and force-sense modal information, improving the robot's decision-making capability for autonomously operating flexible fabrics.
In order to achieve the above purpose, in some embodiments, the following technical solutions are adopted:
a multimode fusion robot sewing method based on deep reinforcement learning comprises the following steps:
respectively acquiring fabric state image information, stitch state image information and fabric tension state information in the sewing process;
constructing and training a sewing operation skill learning network of the robot, wherein the sewing operation skill learning network comprises a strategy network and an evaluation network; the input of the strategy network is fabric state image information and fabric tension state information, and the output is the action value of the mechanical arm; the input of the evaluation network is fabric state image information and a mechanical arm action value, and the output is a Q function value;
and inputting the collected state information in the sewing process into the sewing operation skill learning network, and outputting the joint angle of the mechanical arm so as to control the action of the mechanical arm.
In other embodiments, the following technical solutions are adopted:
a multimode fusion robot sewing system based on deep reinforcement learning comprises:
the state perception module is used for respectively acquiring fabric state image information, stitch state image information and fabric tension state information in the sewing process;
the fusion decision module is used for processing the information acquired by the state sensing module into the input of a robot sewing operation skill learning network and applying mechanical arm sewing actions output by the network to the sewing environment module;
and the sewing environment module is used for receiving and executing the actions of the mechanical arm, and simultaneously feeding back the fabric state image and fabric tension information of the changed sewing environment to the state sensing module.
In other embodiments, the following technical solutions are adopted:
a robot controller comprising a processor and a computer readable storage medium, the processor for implementing instructions; the computer readable storage medium is used for storing a plurality of instructions, and the instructions are suitable for being loaded by a processor and executing the multimode fusion robot sewing method based on the deep reinforcement learning.
A robot comprises a robot controller, wherein the robot controller adopts the above multi-mode fusion robot sewing method based on deep reinforcement learning to realize sewing of fabrics.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a scheme for solving the problem that a robot operates a flexible deformation object by combining a deep reinforcement learning method.
The invention fuses the image information and the force sense information to jointly represent the state of the fabric in the sewing process, thereby representing the motion of the robot more accurately. The robot can actively adapt to the change of the environment by learning and mastering the operation skills, and the training result has generalization capability, thereby realizing the independent sewing operation of different fabrics.
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
FIG. 1 is a schematic view of a sewing process of a multimode fusion robot based on deep reinforcement learning according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a sewing operation skill learning network of a robot according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a policy network according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating an evaluation network according to an embodiment of the present invention;
FIG. 5 is a schematic view of a multi-mode fusion robot sewing system based on deep reinforcement learning according to an embodiment of the present invention.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
The embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
Example one
In one or more embodiments, a multimode fusion robot sewing method based on deep reinforcement learning is disclosed, and with reference to fig. 1, the method specifically includes the following processes:
step (1): respectively acquiring fabric state image information, stitch state image information and fabric tension state information in the sewing process;
specifically, stitch state image information in the fabric sewing process is acquired through a local camera, fabric state image information in the fabric sewing process is acquired through a global camera, and fabric tension state information in the fabric sewing process is acquired through a six-dimensional force sensor.
The sewing state of the mechanical arm is defined as s = (s_I, s_F);
where s_I is the 640 × 480 × 4 RGB-D fabric state image during sewing, and s_F = (f_x, f_y, f_z, τ_x, τ_y, τ_z) is the fabric tension state during sewing, in which f_x, f_y, f_z are forces and τ_x, τ_y, τ_z are moments.
The sewing action of the mechanical arm is defined as a = (θ_1, θ_2, θ_3, θ_4, θ_5, θ_6);
where θ_1, θ_2, θ_3, θ_4, θ_5, θ_6 are the angles of each joint of the six-axis mechanical arm.
Step (2): determine a reward function for the mechanical arm action based on the stitch state image information, used as a sparse sewing-quality reward function r to evaluate how good the current mechanical arm sewing action is.
Specifically, the sewing stitch images captured by the local camera are processed sequentially by image filtering, image binarization, connected-domain merging, Hough line-segment detection and sewing stitch extraction, and the slope l_1 of the extracted stitch is calculated; the local fabric boundary is extracted with the Canny operator and the slope l_2 of the extracted boundary is calculated; the stitch straightness is l = l_1 − l_2, and the perpendicular distance between the sewing stitch and the local fabric boundary is taken as the stitch translation amount d.
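As a concrete illustration of this measurement pipeline, the following is a minimal OpenCV sketch; the blur kernel, Hough parameters, Canny thresholds and the choice of the longest detected segment are assumptions, not values given in the patent.

```python
# Hedged sketch of the stitch-quality metrics l and d described above.
# Parameter values (kernel sizes, Hough/Canny thresholds) are assumptions.
import cv2
import numpy as np

def longest_segment(lines):
    """Pick the longest Hough segment (x1, y1, x2, y2)."""
    return max(lines[:, 0], key=lambda s: np.hypot(s[2] - s[0], s[3] - s[1]))

def stitch_metrics(gray):
    """Return stitch straightness l = l1 - l2 and translation amount d (pixels)."""
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)                     # image filtering
    _, binary = cv2.threshold(blurred, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # binarization
    stitch = cv2.HoughLinesP(binary, 1, np.pi / 180, 50,
                             minLineLength=40, maxLineGap=10)       # stitch segments
    edges = cv2.Canny(blurred, 50, 150)                             # fabric boundary
    bound = cv2.HoughLinesP(edges, 1, np.pi / 180, 50,
                            minLineLength=40, maxLineGap=10)
    if stitch is None or bound is None:
        return None
    x1, y1, x2, y2 = longest_segment(stitch)
    bx1, by1, bx2, by2 = longest_segment(bound)
    l1 = (y2 - y1) / (x2 - x1 + 1e-6)                               # stitch slope
    l2 = (by2 - by1) / (bx2 - bx1 + 1e-6)                           # boundary slope
    l = l1 - l2                                                     # straightness
    # Perpendicular distance from the stitch midpoint to the boundary line = d
    mx, my = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    num = abs((by2 - by1) * mx - (bx2 - bx1) * my + bx2 * by1 - by2 * bx1)
    d = num / (np.hypot(bx2 - bx1, by2 - by1) + 1e-6)
    return l, d
```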
Thus, at time t, in state s_t, the reward function of action a_t is:
[Formula: sparse reward r(s_t, a_t), taking its success value when the stitch straightness satisfies l < l_0 and the stitch translation satisfies d_min ≤ d ≤ d_max, and its failure value otherwise]
where l_0 is the maximum threshold of stitch straightness, d_min is the minimum threshold of stitch translation, and d_max is the maximum threshold of stitch translation. The state s_t refers to the current sewing state of the fabric, s_t = (s_I, s_F), where s_I is the fabric image and s_F is the fabric tension.
When the sewing stitches are nearly parallel to the fabric boundary and are at a proper position away from the fabric boundary, the sewing is considered to be successful, otherwise, the sewing fails.
In this embodiment, sewing is considered successful when the stitch straightness is less than the maximum threshold l_0 and the stitch translation from the fabric boundary lies between d_min and d_max.
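Under these success criteria, the sparse reward can be sketched as follows; the ±1 reward values and the absolute value on l are assumptions, since the exact formula image is not reproduced above.

```python
# Hedged sketch of the sparse sewing-quality reward; +1/-1 values are assumed.
def sewing_reward(l, d, l0, d_min, d_max):
    """Success if the stitch is nearly parallel to the boundary (|l| < l0)
    and at a proper distance from it (d_min <= d <= d_max)."""
    if abs(l) < l0 and d_min <= d <= d_max:
        return 1.0   # sewing success
    return -1.0      # sewing failure
```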
And (3): constructing and training a sewing operation skill learning network of the robot;
specifically, referring to fig. 2, the sewing operation skill learning network includes a strategy network and an evaluation network; the input of the strategy network is fabric state image information and fabric tension state information, and the output is the action value of the mechanical arm; the input of the evaluation network is fabric state image information and a mechanical arm action value, and the output is a Q function value.
The Q function is a state-action value function that refers to the cumulative reward of the mechanical arm's actions over a period of time, defined as:
Q^μ(s_t, a_t) = E[r(s_t, a_t) + γ Q^μ(s_{t+1}, μ(s_{t+1}))]
where s_t is the current sewing state, a_t is the sewing action under execution policy μ, and s_{t+1} is the next sewing state after performing sewing action a_t.
The policy network and the evaluation network are the basic network models of the deep reinforcement learning algorithm under the "policy-evaluation" (actor-critic) framework; the evaluation network is also commonly called the value network. The target policy network corresponds to the current policy network, and the target evaluation network (target value network) corresponds to the current evaluation network (current value network).
The policy network is a network model constructed to fit the robot's sewing-action selection strategy μ.
The evaluation network is a network model constructed to fit the sewing action value function (the Q function).
Since a single network is unstable during training and learning, for the current policy network (parameters θ^μ) and the current evaluation network (parameters θ^Q), a target policy network (parameters θ^{μ'}) and a target evaluation network (parameters θ^{Q'}) are set, collectively called the target network. The target network parameters are updated during network learning and training as follows:
θ^{Q'} ← τθ^Q + (1 − τ)θ^{Q'}
θ^{μ'} ← τθ^μ + (1 − τ)θ^{μ'}
where τ is usually set to 0.001.
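A minimal sketch of this soft update, assuming a PyTorch implementation (the patent does not name a framework):

```python
# Hedged sketch of the soft target-network update with tau = 0.001.
import torch

@torch.no_grad()
def soft_update(target_net, current_net, tau=0.001):
    """theta' <- tau * theta + (1 - tau) * theta' for every parameter pair."""
    for p_t, p in zip(target_net.parameters(), current_net.parameters()):
        p_t.mul_(1.0 - tau).add_(tau * p)
```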
The network structure of the policy network μ(s|θ^μ) is shown in fig. 3. The network input is the sewing state s = (s_I, s_F) and the output is the mechanical arm action value a = (θ_1, θ_2, θ_3, θ_4, θ_5, θ_6); the network parameters are θ^μ. The fabric state image s_I passes through two convolutional layers and one max-pooling layer, is then fused with the fabric tension state s_F, and the output action value is obtained through 3 fully connected layers; each convolutional, pooling and fully connected layer has the same size as the corresponding layer in the evaluation network structure. The final fully connected layer uses the tanh activation function, as shown in formula (1):
tanh(x) = (e^x − e^{−x}) / (e^x + e^{−x})   (1)
The tanh function is zero-mean, which helps improve training efficiency.
A target policy network μ'(s|θ^{μ'}) is constructed with the same network structure as the policy network μ(s|θ^μ), and the same weights are initialized.
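A hedged PyTorch sketch of this policy network follows. The 6 × 6 convolutions with 32 kernels, the 4 × 4 max-pooling and the 512-unit fully connected layers are taken from the evaluation network description below; the convolution stride of 2 is an assumption made to keep the flattened feature size tractable.

```python
# Hedged sketch of the policy (actor) network of fig. 3; stride is an assumption.
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    def __init__(self, force_dim=6, action_dim=6):
        super().__init__()
        # Two convolutional layers + one max-pooling layer for the RGB-D image s_I
        self.conv = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=6, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=6, stride=2), nn.ReLU(),
            nn.MaxPool2d(4),
            nn.Flatten(),
        )
        with torch.no_grad():  # infer flattened size for a 640 x 480 x 4 input
            n_feat = self.conv(torch.zeros(1, 4, 480, 640)).shape[1]
        # Fuse the fabric tension state s_F, then 3 fully connected layers;
        # the last layer uses tanh, as in formula (1)
        self.fc = nn.Sequential(
            nn.Linear(n_feat + force_dim, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, action_dim), nn.Tanh(),
        )

    def forward(self, s_img, s_force):
        return self.fc(torch.cat([self.conv(s_img), s_force], dim=1))
```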
The network structure of the evaluation network Q(s, a|θ^Q) is shown in fig. 4. The network inputs are the sewing state s = (s_I, s_F) and the action a = (θ_1, θ_2, θ_3, θ_4, θ_5, θ_6); the evaluation network (value network) outputs the reward value obtained after the mechanical arm sewing action is taken in that fabric state, i.e. the corresponding Q function value; the network parameters are θ^Q.
The sewn fabric state image s_I passes through two convolutional layers and one max-pooling layer and is then fused with the fabric tension state s_F; the fused state passes through fully connected layer 1, the action a passes through fully connected layer 2, the outputs of fully connected layers 1 and 2 are concatenated and fed to fully connected layer 3, and the output action Q value is finally obtained through fully connected layer 4. The convolutional layers use 6 × 6 convolutions with 32 kernels; the pooling layer is 4 × 4 max-pooling; each fully connected layer contains 512 units and uses the ReLU activation function, as shown in formula (2):
ReLU(x) = max(0, x)   (2)
constructing a target evaluation network Q' (s, a | θ)Q′) The network structure is the same as the evaluation network structure and the same weight is initialized.
The policy network selects the sewing action according to the evaluation result of the value network; the policy μ'(s_{i+1}) obtained by the target policy network is fed back to the target evaluation network, which is updated in combination with the current value network parameters.
The evaluation network (value network) is updated by continually optimizing a loss function defined as:
L(θ^Q) = (1/N) Σ_i (y_i − Q(s_i, a_i | θ^Q))²
where the predicted Q value is y_i = r_i + γ Q'(s_{i+1}, μ'(s_{i+1}|θ^{μ'}) | θ^{Q'}), and N represents the number of quadruples sampled from the experience pool.
The policy network computes the policy gradient using a Monte Carlo method and updates as:
∇_{θ^μ} J ≈ (1/N) Σ_i ∇_a Q(s, a|θ^Q)|_{s=s_i, a=μ(s_i)} ∇_{θ^μ} μ(s|θ^μ)|_{s=s_i}
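Together, this loss and gradient correspond to one standard deep-deterministic-policy-gradient update, sketched below using the Adam optimizers named in the training steps and the soft_update helper sketched earlier; γ = 0.99 and the batch layout are assumptions.

```python
# Hedged sketch of one critic/actor update on a sampled minibatch.
import torch
import torch.nn.functional as F

def update(policy, q_net, target_policy, target_q, batch, q_opt, pi_opt,
           gamma=0.99, tau=0.001):
    s_img, s_force, a, r, s2_img, s2_force = batch   # assumed batch layout

    # Critic: minimize L = mean_i (y_i - Q(s_i, a_i | theta_Q))^2
    with torch.no_grad():
        y = r + gamma * target_q(s2_img, s2_force,
                                 target_policy(s2_img, s2_force))
    q_loss = F.mse_loss(q_net(s_img, s_force, a), y)
    q_opt.zero_grad(); q_loss.backward(); q_opt.step()

    # Actor: ascend grad_a Q * grad_theta mu, i.e. maximize Q(s, mu(s))
    pi_loss = -q_net(s_img, s_force, policy(s_img, s_force)).mean()
    pi_opt.zero_grad(); pi_loss.backward(); pi_opt.step()

    # Formula (7): soft-update both target networks
    soft_update(target_q, q_net, tau)
    soft_update(target_policy, policy, tau)
```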
constructing an experience pool R ═ s, a, R, s ', wherein s is the current sewing state of the mechanical arm, a is the action selected by the mechanical arm in the current sewing state, R is the reward obtained after the action a is executed, and s' is the sewing state after the mechanical arm executes the action a; the experience pool is used for storing the collected network training samples(s)t,at,rt,st+1)。
The training process for the robot sewing operation skill learning network is as follows:
step (3-1): initializing evaluation network parameter thetaQPolicy network parameter θμAnd copying the parameters to the corresponding target network parameters thetaQ′←θQ,θμ′←θμ
Step (3-2): the experience pool R memory space is initialized.
Step (3-3): start T periods of network training. Since the training is based on a Markov process, each training period includes N rounds of single-step training; the number of trained periods t and the number of trained rounds n are set to 0 before training starts.
Step (3-4): select an action a_t according to formula (3) and transmit it to the sewing environment for execution:
a_t = μ(s_t | θ^μ) + N_t   (3)
where N_t is a random process used to generate random noise that improves the exploration of the policy model; the noise is generated by an Ornstein-Uhlenbeck (OU) process, as shown in formula (4):
dx_t = θ(μ − x_t) + σW_t   (4)
where x_t is the data to be generated, μ is the designed expectation of the random variable, and W_t is a random variable generated by a Wiener process, which can be replaced by a simple random function.
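A small sketch of this OU noise generator; the θ = 0.15 and σ = 0.2 values are conventional assumptions, not taken from the patent:

```python
# Hedged sketch of the OU exploration noise of formula (4).
import numpy as np

class OUNoise:
    def __init__(self, dim=6, mu=0.0, theta=0.15, sigma=0.2):
        self.mu, self.theta, self.sigma = mu, theta, sigma
        self.x = np.full(dim, mu)

    def sample(self):
        # dx_t = theta * (mu - x_t) + sigma * W_t, with W_t a Wiener increment
        dx = self.theta * (self.mu - self.x) \
             + self.sigma * np.random.randn(self.x.size)
        self.x = self.x + dx
        return self.x
```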
Step (3-5): the mechanical arm performs action a_t, obtaining the reward r and the next-time state s_{t+1} in the sewing environment; (s_t, a_t, r_t, s_{t+1}) is then represented as one transition datum and stored in the experience pool R.
Step (3-6): in the experience pool R, randomly sample N transition data as a group of training data; with formula (5) as the objective function, optimize the evaluation network parameters θ^Q using the Adam algorithm:
L(θ^Q) = (1/N) Σ_i (y_i − Q(s_i, a_i | θ^Q))²   (5)
Step (3-7): with formula (6) as the gradient of the objective function, optimize the policy network parameters θ^μ using the Adam algorithm:
∇_{θ^μ} J ≈ (1/N) Σ_i ∇_a Q(s, a|θ^Q)|_{s=s_i, a=μ(s_i)} ∇_{θ^μ} μ(s|θ^μ)|_{s=s_i}   (6)
Step (3-8): update the target evaluation network parameters θ^{Q'} and the target policy network parameters θ^{μ'} according to formula (7):
θ^{Q'} ← τθ^Q + (1 − τ)θ^{Q'},  θ^{μ'} ← τθ^μ + (1 − τ)θ^{μ'}   (7)
where τ is generally 0.001.
Step (3-9): after the parameters θ^{Q'} and θ^{μ'} are updated, set n = n + 1; the current training round ends and the next round starts, repeating steps (3-4) to (3-8) until n = N.
Step (3-10): when the N rounds of single-step training are completed, set t = t + 1, i.e. start training of the next period; when t = T, the sewing operation skill learning network training is complete, and μ'(s|θ^{μ'}) is the network training result.
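The full procedure of steps (3-1) to (3-10) can be sketched as the skeleton below; `env`, `collate` and the hyperparameter values are hypothetical placeholders, since the patent does not specify the sewing-environment interface:

```python
# Hedged skeleton of the T-period / N-round training loop of steps (3-1)-(3-10).
# `env` (with observe()/step()) and `collate` are hypothetical placeholders;
# T, N, BATCH and the learning rates are assumed hyperparameters.
import torch

T, N, BATCH = 100, 200, 64

policy, q_net = PolicyNet(), QNet()
target_policy, target_q = PolicyNet(), QNet()
target_policy.load_state_dict(policy.state_dict())   # theta_mu' <- theta_mu
target_q.load_state_dict(q_net.state_dict())         # theta_Q'  <- theta_Q
q_opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
pi_opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
pool, noise = ExperiencePool(), OUNoise()

for t in range(T):                      # T training periods
    s_img, s_force = env.observe()      # (s_I, s_F) from cameras + force sensor
    for n in range(N):                  # N rounds of single-step training
        with torch.no_grad():           # formula (3): a_t = mu(s_t) + noise
            a = policy(s_img, s_force) + torch.as_tensor(
                noise.sample(), dtype=torch.float32)
        r, (s2_img, s2_force) = env.step(a)
        pool.store((s_img, s_force), a, r, (s2_img, s2_force))
        if len(pool.buffer) >= BATCH:   # steps (3-6) to (3-8)
            update(policy, q_net, target_policy, target_q,
                   collate(pool.sample(BATCH)), q_opt, pi_opt)
        s_img, s_force = s2_img, s2_force
```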
Step (4): use the trained network μ'(s|θ^{μ'}) as the mechanical arm sewing operation motion controller to control each joint angle of the mechanical arm; input the collected sewing-process state information into the sewing operation skill learning network, which outputs the mechanical arm joint angles so as to control the mechanical arm's action.
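As a usage sketch, the trained policy can then serve directly as the controller, mapping the fused state to joint angles (the unscaling of the tanh output to physical joint limits is an assumption):

```python
# Hedged sketch of step (4): the trained policy network as motion controller.
# joint_low/joint_high are hypothetical joint limits used to unscale tanh output.
import torch

@torch.no_grad()
def sewing_controller(policy, s_img, s_force, joint_low, joint_high):
    """Map the current sewing state (s_I, s_F) to six joint angles."""
    a = policy(s_img, s_force).squeeze(0)          # tanh output in [-1, 1]
    return joint_low + (a + 1.0) * 0.5 * (joint_high - joint_low)
```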
Example two
In one or more embodiments, a multimode fusion robot sewing system based on deep reinforcement learning is disclosed, which, with reference to fig. 5, specifically includes:
the state perception module is used for respectively acquiring fabric state image information, stitch state image information and fabric tension state information in the sewing process;
the fusion decision module is used for processing the information acquired by the state sensing module into the input of a robot sewing operation skill learning network and applying mechanical arm sewing actions output by the network to the sewing environment module;
and the sewing environment module is used for receiving and executing the actions of the mechanical arm and simultaneously feeding back the changed environment state information to the state sensing module.
The state perception module is composed of a local camera, a six-dimensional force sensor and a global camera. The local camera is used for collecting stitch state images in the sewing process, the global camera is used for collecting fabric state images in the sewing process, and the six-dimensional force sensor is used for collecting fabric tension states in the sewing process.
The fusion decision module is used for processing the information collected by the state sensing module into robot sewing operation skill learning network input and applying mechanical arm sewing action output by the network to a sewing environment.
The sewing environment module receives the action of the mechanical arm, changes the state image of the fabric in the sewing environment and the tension state of the fabric, and feeds back the environment information to the state sensing module.
The system is based on the deep deterministic policy gradient, integrates force-sense and visual multi-modal fabric state descriptions, trains and learns in the constructed sewing environment, generates the mechanical arm control quantity from environment feedback, and thereby guides the mechanical arm to complete the sewing action and acquire the sewing skill.
The specific implementation process of each module corresponds to steps (1) to (4) in the first embodiment, and is not described again.
EXAMPLE III
In one or more embodiments, a robot controller is disclosed that includes a processor and a computer-readable storage medium, the processor to implement instructions; the computer-readable storage medium is used for storing a plurality of instructions, and the instructions are suitable for being loaded by the processor and executing the multimode fusion robot sewing method based on the deep reinforcement learning in the first embodiment, and for brevity, the detailed description is omitted.
Those of ordinary skill in the art will appreciate that the various illustrative units and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In other embodiments, a robot is disclosed, which uses the multi-mode fusion robot sewing method based on deep reinforcement learning described in the first embodiment to sew a fabric.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, they are not intended to limit the scope of the present invention; those skilled in the art can make various modifications and variations without inventive effort based on the technical solution of the present invention.

Claims (10)

1. A multimode fusion robot sewing method based on deep reinforcement learning is characterized by comprising the following steps:
respectively acquiring fabric state image information, stitch state image information and fabric tension state information in the sewing process;
constructing and training a sewing operation skill learning network of the robot, wherein the sewing operation skill learning network comprises a strategy network and an evaluation network; the input of the strategy network is fabric state image information and fabric tension state information, and the output is the action value of the mechanical arm; the input of the evaluation network is fabric state image information and a mechanical arm action value, and the output is a Q function value;
and inputting the collected state information in the sewing process into the sewing operation skill learning network, and outputting the joint angle of the mechanical arm so as to control the action of the mechanical arm.
2. The multi-mode fusion robot sewing method based on deep reinforcement learning as claimed in claim 1, wherein a reward function of the mechanical arm action is determined based on the stitch state image information to evaluate the quality of the current mechanical arm sewing action, the specific process comprising:
after image filtering, image binarization and connected domain combination are carried out on the stitch state image information, Hough line segment detection is carried out to extract sewing stitches, and the slope of the extracted stitches is calculated;
extracting a local fabric boundary through a Canny operator, and calculating the slope of the extracted boundary and the stitch straightness;
taking the vertical distance between the sewing stitch and the local boundary of the fabric as the stitch translation amount;
determining, based on the stitch straightness and the range of the stitch translation amount, the reward function of the mechanical arm action in the current fabric sewing state s_t.
3. The multi-mode fusion robot sewing method based on deep reinforcement learning as claimed in claim 2, wherein sewing is considered successful when the sewing stitch straightness is less than the maximum threshold l_0 and the stitch translation from the fabric boundary lies between d_min and d_max.
4. The multi-mode fusion robot sewing method based on deep reinforcement learning as claimed in claim 3, wherein at time t, in state s_t, the reward function of action a_t is:
[Formula: sparse reward r(s_t, a_t), taking its success value when l < l_0 and d_min ≤ d ≤ d_max, and its failure value otherwise]
where l_0 is the maximum threshold of stitch straightness, d_min is the minimum threshold of stitch translation, and d_max is the maximum threshold of stitch translation.
5. The multi-mode fusion robot sewing method based on deep reinforcement learning according to claim 1, wherein in the strategy network:
the fabric state image information passes through two convolutional layers and one max-pooling layer, is fused with the fabric tension state information, and then passes through 3 fully connected layers to obtain the output action value.
6. The multi-mode fusion robot sewing method based on deep reinforcement learning as claimed in claim 1, wherein in the evaluation network:
the sewn fabric state image information passes through two convolutional layers and one max-pooling layer and is fused with the fabric tension state information; the fused state passes through the first fully connected layer, the mechanical arm action a passes through the second fully connected layer, the outputs of the first and second fully connected layers are concatenated and fed to the third fully connected layer, and the output Q value is finally obtained through the fourth fully connected layer.
7. The multimode fusion robot sewing method based on deep reinforcement learning according to claim 1, wherein the training process for the sewing operation skill learning network comprises:
initializing parameters of a sewing operation skill learning network;
setting and executing mechanical arm action at
Awarding a prize r and a next time status s in a sewing environmentt+1Then will(s)t,at,rt,st+1) The data is expressed as a transition data and stored in an experience pool R;
in an experience pool R, randomly sampling N transition data to serve as a group of training data;
respectively optimizing and updating the strategy network parameters and the evaluation network parameters by adopting an Adam algorithm;
when the N rounds of single-step training are completed, starting the training of the next period; the training result is obtained when the set number of training periods is completed.
8. A multi-mode fusion robot sewing system based on deep reinforcement learning, characterized by comprising:
the state perception module is used for respectively acquiring fabric state image information, stitch state image information and fabric tension state information in the sewing process;
the fusion decision module is used for processing the information acquired by the state sensing module into the input of a robot sewing operation skill learning network and applying mechanical arm sewing actions output by the network to the sewing environment module;
and the sewing environment module is used for receiving and executing the actions of the mechanical arm, and simultaneously feeding back the fabric state image and fabric tension information of the changed sewing environment to the state sensing module.
9. A robot controller comprising a processor and a computer-readable storage medium, the processor being configured to implement instructions; the computer-readable storage medium being configured to store a plurality of instructions adapted to be loaded by the processor to perform the multi-mode fusion robot sewing method based on deep reinforcement learning according to any one of claims 1-7.
10. A robot comprising a robot controller, wherein the robot controller adopts the multi-mode fusion robot sewing method based on deep reinforcement learning according to any one of claims 1 to 7 to realize sewing of fabrics.
CN202010453893.0A 2020-05-26 2020-05-26 Multi-mode fusion robot sewing method and system based on deep reinforcement learning Active CN111633647B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010453893.0A CN111633647B (en) 2020-05-26 2020-05-26 Multi-mode fusion robot sewing method and system based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010453893.0A CN111633647B (en) 2020-05-26 2020-05-26 Multi-mode fusion robot sewing method and system based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN111633647A true CN111633647A (en) 2020-09-08
CN111633647B CN111633647B (en) 2021-06-22

Family

ID=72324996

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010453893.0A Active CN111633647B (en) 2020-05-26 2020-05-26 Multi-mode fusion robot sewing method and system based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN111633647B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329571A (en) * 2020-10-27 2021-02-05 同济大学 Self-adaptive human body posture optimization method based on posture quality evaluation
CN112894808A (en) * 2021-01-15 2021-06-04 山东大学 Robot screwing valve system and method based on deep reinforcement learning
CN113011526A (en) * 2021-04-23 2021-06-22 华南理工大学 Robot skill learning method and system based on reinforcement learning and unsupervised learning
CN113151989A (en) * 2021-04-19 2021-07-23 山东大学 Cloth processing method, cloth processing system and sewing robot
CN114660934A (en) * 2022-03-03 2022-06-24 西北工业大学 Mechanical arm autonomous operation strategy learning method based on vision-touch fusion
CN114723831A (en) * 2022-03-25 2022-07-08 山东大学 Heuristic-based robot flexible fabric flattening method and system
WO2023041022A1 (en) * 2021-09-17 2023-03-23 Huawei Technologies Co., Ltd. System and method for computer-assisted design of inductor for voltage-controlled oscillator

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018053187A1 (en) * 2016-09-15 2018-03-22 Google Inc. Deep reinforcement learning for robotic manipulation
CN109457398A (en) * 2018-12-05 2019-03-12 郑州轻工业学院 Sweater automatic sewing method based on machine vision perception
CN109543823A (en) * 2018-11-30 2019-03-29 山东大学 A kind of flexible assembly system and method based on multimodal information description
CN109629122A (en) * 2018-12-25 2019-04-16 珞石(山东)智能科技有限公司 A kind of robot method of sewing based on machine vision
CN109840552A (en) * 2019-01-14 2019-06-04 湖北工业大学 A kind of dynamic image classification method
CN111005163A (en) * 2019-12-30 2020-04-14 深圳市越疆科技有限公司 Automatic leather sewing method, device, equipment and computer readable storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018053187A1 (en) * 2016-09-15 2018-03-22 Google Inc. Deep reinforcement learning for robotic manipulation
CN109543823A (en) * 2018-11-30 2019-03-29 山东大学 A kind of flexible assembly system and method based on multimodal information description
CN109457398A (en) * 2018-12-05 2019-03-12 郑州轻工业学院 Sweater automatic sewing method based on machine vision perception
CN109629122A (en) * 2018-12-25 2019-04-16 珞石(山东)智能科技有限公司 A kind of robot method of sewing based on machine vision
CN109840552A (en) * 2019-01-14 2019-06-04 湖北工业大学 A kind of dynamic image classification method
CN111005163A (en) * 2019-12-30 2020-04-14 深圳市越疆科技有限公司 Automatic leather sewing method, device, equipment and computer readable storage medium

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329571A (en) * 2020-10-27 2021-02-05 同济大学 Self-adaptive human body posture optimization method based on posture quality evaluation
CN112329571B (en) * 2020-10-27 2022-12-16 同济大学 Self-adaptive human body posture optimization method based on posture quality evaluation
CN112894808A (en) * 2021-01-15 2021-06-04 山东大学 Robot screwing valve system and method based on deep reinforcement learning
CN113151989A (en) * 2021-04-19 2021-07-23 山东大学 Cloth processing method, cloth processing system and sewing robot
CN113011526A (en) * 2021-04-23 2021-06-22 华南理工大学 Robot skill learning method and system based on reinforcement learning and unsupervised learning
CN113011526B (en) * 2021-04-23 2024-04-26 华南理工大学 Robot skill learning method and system based on reinforcement learning and unsupervised learning
WO2023041022A1 (en) * 2021-09-17 2023-03-23 Huawei Technologies Co., Ltd. System and method for computer-assisted design of inductor for voltage-controlled oscillator
CN114660934A (en) * 2022-03-03 2022-06-24 西北工业大学 Mechanical arm autonomous operation strategy learning method based on vision-touch fusion
CN114660934B (en) * 2022-03-03 2024-03-01 西北工业大学 Mechanical arm autonomous operation strategy learning method based on vision-touch fusion
CN114723831A (en) * 2022-03-25 2022-07-08 山东大学 Heuristic-based robot flexible fabric flattening method and system

Also Published As

Publication number Publication date
CN111633647B (en) 2021-06-22

Similar Documents

Publication Publication Date Title
CN111633647B (en) Multi-mode fusion robot sewing method and system based on deep reinforcement learning
CN109543823B (en) Flexible assembly system and method based on multi-mode information description
JP6810087B2 (en) Machine learning device, robot control device and robot vision system using machine learning device, and machine learning method
CN111881772B (en) Multi-mechanical arm cooperative assembly method and system based on deep reinforcement learning
Meyes et al. Motion planning for industrial robots using reinforcement learning
CN107403426B (en) Target object detection method and device
CN111144580B (en) Hierarchical reinforcement learning training method and device based on imitation learning
Kartoun et al. A human-robot collaborative reinforcement learning algorithm
US10807234B2 (en) Component supply device and machine learning device
CN110253577B (en) Weak-rigidity part assembling system and method based on robot operation technology
US11897066B2 (en) Simulation apparatus
JP7458741B2 (en) Robot control device and its control method and program
US11059180B2 (en) Control device and machine learning device
US10549422B2 (en) Robot controller, machine learning device and machine learning method
Moosmann et al. Separating entangled workpieces in random bin picking using deep reinforcement learning
Li et al. Navigation of mobile robots based on deep reinforcement learning: Reward function optimization and knowledge transfer
CN115761905A (en) Diver action identification method based on skeleton joint points
CN116460843A (en) Multi-robot collaborative grabbing method and system based on meta heuristic algorithm
CN116702872A (en) Reinforced learning method and device based on offline pre-training state transition transducer model
CN114571456B (en) Electric connector assembling method and system based on robot skill learning
CN113977583B (en) Robot rapid assembly method and system based on near-end strategy optimization algorithm
CN114131149B (en) Laser vision weld joint tracking system, equipment and storage medium based on CenterNet
Paudel Learning for robot decision making under distribution shift: A survey
Konidaris et al. Sensorimotor abstraction selection for efficient, autonomous robot skill acquisition
SunWoo et al. Comparison of deep reinforcement learning algorithms: Path Search in Grid World

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant