WO2023020129A1 - Control method and apparatus for work machine, and work machine - Google Patents

Control method and apparatus for work machine, and work machine

Info

Publication number
WO2023020129A1
Authority
WO
WIPO (PCT)
Prior art keywords
decision
behavior
making
state
sample
Prior art date
Application number
PCT/CN2022/102918
Other languages
French (fr)
Chinese (zh)
Inventor
王传宇
胡立辛
曾超
Original Assignee
上海三一重机股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海三一重机股份有限公司
Priority to US18/065,804 (US20230112014A1)
Publication of WO2023020129A1

Classifications

    • E: FIXED CONSTRUCTIONS
    • E02: HYDRAULIC ENGINEERING; FOUNDATIONS; SOIL SHIFTING
    • E02F: DREDGING; SOIL-SHIFTING
    • E02F9/00: Component parts of dredgers or soil-shifting machines, not restricted to one of the kinds covered by groups E02F3/00 - E02F7/00
    • E02F9/20: Drives; Control devices
    • E: FIXED CONSTRUCTIONS
    • E02: HYDRAULIC ENGINEERING; FOUNDATIONS; SOIL SHIFTING
    • E02F: DREDGING; SOIL-SHIFTING
    • E02F9/00: Component parts of dredgers or soil-shifting machines, not restricted to one of the kinds covered by groups E02F3/00 - E02F7/00
    • E02F9/20: Drives; Control devices
    • E02F9/2025: Particular purposes of control systems not otherwise provided for
    • E: FIXED CONSTRUCTIONS
    • E02: HYDRAULIC ENGINEERING; FOUNDATIONS; SOIL SHIFTING
    • E02F: DREDGING; SOIL-SHIFTING
    • E02F9/00: Component parts of dredgers or soil-shifting machines, not restricted to one of the kinds covered by groups E02F3/00 - E02F7/00
    • E02F9/26: Indicating devices
    • E02F9/264: Sensors and their calibration for indicating the position of the work tool
    • E02F9/265: Sensors and their calibration for indicating the position of the work tool with follow-up actions (e.g. control signals sent to actuate the work tool)
    • E: FIXED CONSTRUCTIONS
    • E02: HYDRAULIC ENGINEERING; FOUNDATIONS; SOIL SHIFTING
    • E02F: DREDGING; SOIL-SHIFTING
    • E02F3/00: Dredgers; Soil-shifting machines
    • E02F3/04: Dredgers; Soil-shifting machines mechanically-driven
    • E02F3/28: Dredgers; Soil-shifting machines mechanically-driven with digging tools mounted on a dipper- or bucket-arm, i.e. there is either one arm or a pair of arms, e.g. dippers, buckets
    • E02F3/36: Component parts
    • E02F3/42: Drives for dippers, buckets, dipper-arms or bucket-arms
    • E02F3/43: Control of dipper or bucket position; Control of sequence of drive operations
    • E: FIXED CONSTRUCTIONS
    • E02: HYDRAULIC ENGINEERING; FOUNDATIONS; SOIL SHIFTING
    • E02F: DREDGING; SOIL-SHIFTING
    • E02F3/00: Dredgers; Soil-shifting machines
    • E02F3/04: Dredgers; Soil-shifting machines mechanically-driven
    • E02F3/28: Dredgers; Soil-shifting machines mechanically-driven with digging tools mounted on a dipper- or bucket-arm, i.e. there is either one arm or a pair of arms, e.g. dippers, buckets
    • E02F3/36: Component parts
    • E02F3/42: Drives for dippers, buckets, dipper-arms or bucket-arms
    • E02F3/43: Control of dipper or bucket position; Control of sequence of drive operations
    • E02F3/435: Control of dipper or bucket position; Control of sequence of drive operations for dipper-arms, backhoes or the like
    • E02F3/437: Control of dipper or bucket position; Control of sequence of drive operations for dipper-arms, backhoes or the like providing automatic sequences of movements, e.g. linear excavation, keeping dipper angle constant
    • E: FIXED CONSTRUCTIONS
    • E02: HYDRAULIC ENGINEERING; FOUNDATIONS; SOIL SHIFTING
    • E02F: DREDGING; SOIL-SHIFTING
    • E02F9/00: Component parts of dredgers or soil-shifting machines, not restricted to one of the kinds covered by groups E02F3/00 - E02F7/00
    • E02F9/20: Drives; Control devices
    • E02F9/2025: Particular purposes of control systems not otherwise provided for
    • E02F9/2045: Guiding machines along a predetermined path
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning

Definitions

  • the present application relates to the technical field of mechanical engineering, in particular to a method and device for controlling an operating machine and an operating machine.
  • the operating machine control method, device and operating machine provided by this application address a technical problem in the prior art: intelligent control of an operating machine requires establishing an accurate control model for each operating state and performing extensive debugging, which is time-consuming and costly.
  • the present application provides a method for controlling an operating machine, including:
  • based on the control signal corresponding to the current decision-making behavior, controlling the working machine to perform construction work;
  • the state-behavior decision-making model is obtained by training based on the sample operation state of the operation machine, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior; the reward value is determined based on the actual position curve and the target position curve of the working part of the operation machine; the actual position curve is determined based on the sample decision-making behavior.
  • the reward value is determined based on the following steps:
  • the reward value is determined based on the distance between each location point and the corresponding location point, and the location weight of each location point.
  • the determination of the reward value based on the distance between each position point and the corresponding position point and the position weight of each position point includes:
  • the reward value is determined based on the coincidence degree and the moving speed.
  • the state behavior decision-making model is trained based on the following steps:
  • using the last job state, the last decision-making behavior, and the reward value corresponding to the last decision-making behavior as the sample job state, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior, respectively;
  • an initial model is trained to obtain the state-behavior decision-making model.
  • training the initial model based on the sample operation state, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior to obtain the state-behavior decision-making model includes:
  • the training is stopped, and the trained initial model is used as the state behavior decision model.
  • the working machine is an excavator
  • the current working state includes the attitude parameters of the mechanical arm, the attitude parameters of the upper body and the turning angle of the upper body.
  • the present application also provides a working machine control device, including:
  • an acquisition unit configured to acquire the current working state of the working machine
  • a decision-making unit configured to determine the current decision-making behavior of the working machine based on the current working state and the state-behavior decision-making model
  • a control unit configured to control the working machine to perform construction work based on the control signal corresponding to the current decision-making behavior
  • the state-behavior decision-making model is obtained through reinforcement training based on the sample operation state of the operation machine, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior; the reward value is determined based on the actual position curve and the target position curve of the working part of the operation machine; the actual position curve is determined based on the sample decision-making behavior.
  • the present application also provides an operating machine, including the above-mentioned operating machine control device.
  • the present application also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the steps of the operating machine control method are implemented.
  • the present application also provides a non-transitory computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps of the working machine control method are realized.
  • the operating machine control method, device, and operating machine provided in this application perform reinforcement learning using the sample operating state of the operating machine, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior. The resulting state-behavior decision-making model can determine the current decision-making behavior of the working machine from its current working state, and the working machine is controlled to carry out construction work according to the control signal corresponding to the current decision-making behavior.
  • because the reward value is determined according to the actual position curve and the target position curve of the working part of the working machine, the working part can work along the set target position curve, and there is no need to establish an accurate control model for the working machine in each working state. This reduces the engineers' debugging workload, shortens the debugging time, reduces the debugging cost, and raises the level of intelligent construction of the working machine.
  • FIG. 1 is a schematic flow chart of the operation machine control method provided by the present application.
  • FIG. 2 is a schematic diagram of the training of the excavator leveling and slope-brushing control model provided by the present application.
  • FIG. 3 is a schematic diagram of the deployment of the excavator leveling and slope-brushing control model provided by the present application.
  • Fig. 4 is a schematic structural diagram of the operating machine control device provided by the present application.
  • FIG. 5 is a schematic structural diagram of an electronic device provided by the present application.
  • Reinforcement learning is an intelligent algorithm that trains artificial intelligence models through continuous "trial and error" and by rewarding superior strategies.
  • the technical solution in the embodiment of this application lets the excavator start automatic work from some random initial inputs, defines rewards based on the difference between the actual trajectory and the expected leveling path or slope path, and iteratively optimizes the control strategy until an artificial intelligence model (that is, the control algorithm) with a leveling-control or slope-control function is obtained. This process replaces manual debugging or calibration of the control algorithm.
  • Fig. 1 is a schematic flow chart of the operation machine control method provided by the present application. As shown in Fig. 1, the method includes:
  • Step 110: acquire the current working state of the working machine.
  • the working machine is a construction machine capable of carrying out construction work, including an excavator, a crane, a concrete pump truck, a concrete mixer truck, and the like.
  • the current working state is a state parameter that can represent the state of the working machine when it is performing construction work at the current moment.
  • the current working state can be represented by the telescopic length and extension angle of the bucket, stick, boom, etc. It can also include attitude signals and turning angle signals of the upper body.
  • Step 120: determine the current decision-making behavior of the operation machine based on the current operation state and the state-behavior decision-making model, wherein the state-behavior decision-making model is obtained by training based on the sample operation state of the operation machine, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior; the reward value is determined based on the actual position curve and the target position curve of the operating part in the operation machine; the actual position curve is determined based on the sample decision-making behavior.
  • the current decision-making behavior of the work machine is the construction action performed by the work machine at the current moment.
  • the working machine may have multiple candidate decision-making behaviors at the current moment, and the working machine needs to determine a candidate decision-making behavior as the current decision-making behavior.
  • its candidate decision-making behavior at the current moment may be that the bucket retracts inward, the bucket extends outward, and so on.
  • using reinforcement learning, the current operating state of the operating machine is input into the state-behavior decision-making model, which analyzes each parameter in the current operating state to determine the current decision-making behavior of the operating machine.
  • sample operation state, sample decision-making behavior, and reward value corresponding to the sample decision-making behavior of the operation machine can be collected, and the state-behavior decision-making model can be obtained after training the initial model.
  • the operating principle of the state-behavior decision-making model is: if the operating machine makes a certain decision-making behavior according to the current operating state, and the decision-making behavior leads to an increase in its corresponding reward value, the tendency of the operating machine to adopt this decision-making behavior in the future will be enhanced.
  • the purpose of the state-behavior decision-making model is to find the optimal decision-making behavior at each moment, so that the operating machine can obtain the maximum reward value after adopting the optimal decision-making behavior.
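The reinforcement principle described above (a behavior that raises the reward becomes more likely to be chosen again) can be illustrated with a minimal sketch. This is a simple preference update chosen for illustration only, not the policy network or deep Q-network the application mentions; the names `reinforce_preference`, `greedy_behavior`, `baseline` and `lr` are hypothetical.

```python
from typing import Dict


def reinforce_preference(
    prefs: Dict[str, float],
    behavior: str,
    reward: float,
    baseline: float,
    lr: float = 0.1,
) -> Dict[str, float]:
    """Return updated preferences after observing `reward` for `behavior`.

    A reward above the baseline strengthens the tendency to pick the
    behavior again; a reward below it weakens that tendency.
    (Assumed update rule, not taken from the source.)
    """
    updated = dict(prefs)  # leave the caller's dictionary untouched
    updated[behavior] = updated.get(behavior, 0.0) + lr * (reward - baseline)
    return updated


def greedy_behavior(prefs: Dict[str, float]) -> str:
    """Pick the candidate behavior with the highest preference."""
    return max(prefs, key=prefs.get)
```

For example, starting from equal preferences for two candidate behaviors, rewarding one of them above the baseline makes `greedy_behavior` select it next time.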
  • the working part is the part that works on the working face when the working machine is performing construction work.
  • the bucket is the working part
  • the front hose for outputting concrete is the working part
  • the rammer is the working part.
  • the actual position curve of the working part is the curve formed by the actual position of the working part at each moment in the construction process.
  • the actual position curve of the working part can be determined according to the decision-making behavior, that is, it can be determined after the working machine performs the construction operation according to the control signal corresponding to the current decision-making behavior.
  • the excavator controls each robot arm according to the control signal corresponding to the current decision-making behavior and changes the displacement and inclination angle of each robot arm, so that the actual position of the bucket (the working part) in contact with the working surface changes, thereby obtaining the actual position curve of the working part of the excavator.
  • the target position curve of the working part is the curve formed by the expected position of the working part at each moment in the construction process.
  • the target position curve can be determined depending on the work task of the work machine. For example, for grading operations, the excavator's target position curve could be a straight line.
  • the reward value can be determined according to the actual position curve and the target position curve of the working part in the working machine.
  • the reward value can be determined according to the actual position curve and the target position curve of the tip of the bucket during construction work.
  • the coincidence degree between the actual position curve and the target position curve can be determined according to the distance between corresponding points on the two curves. The smaller the distance between corresponding points, the higher the coincidence degree; the larger the distance, the lower the coincidence degree.
  • a higher coincidence degree of the two curves means that the bucket is leveling or brushing the slope along the target position curve, so a higher reward value should be given.
  • the reward value is proportional to the degree of coincidence, and different reward values can be set according to the degree of coincidence.
  • Step 130: based on the control signal corresponding to the current decision-making behavior, control the working machine to perform construction work.
  • the operating machine is controlled to perform construction work according to the control signal corresponding to the current decision-making behavior.
  • the current decision-making behavior may be in a corresponding relationship with the opening signal of the operating handle of the excavator.
  • once the current decision-making behavior is determined, the opening signal of the operating handle of the excavator is also obtained. According to this opening signal, each mechanical arm of the excavator is controlled to move so as to complete the construction operation at the current moment, and the cycle repeats until the construction work is completed.
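Steps 110 to 130 can be sketched as one control cycle. This is a minimal illustration under assumed interfaces: the application does not define any programming interface, so `control_step` and its four callables (`read_state`, `decide`, `behavior_to_signal`, `send_signal`) are hypothetical names.

```python
from typing import Callable, Sequence

State = Sequence[float]    # current working state as a flat parameter vector
Signal = Sequence[float]   # control signal, e.g. handle opening values


def control_step(
    read_state: Callable[[], State],
    decide: Callable[[State], int],
    behavior_to_signal: Callable[[int], Signal],
    send_signal: Callable[[Signal], None],
) -> int:
    """Run one cycle and return the chosen decision-making behavior."""
    state = read_state()                   # Step 110: acquire the current working state
    behavior = decide(state)               # Step 120: state-behavior decision-making model
    signal = behavior_to_signal(behavior)  # behavior corresponds to a control signal
    send_signal(signal)                    # Step 130: control the machine
    return behavior
```

Calling `control_step` repeatedly until the task is finished corresponds to the repeating cycle described above.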
  • the working machine control method provided in the embodiment of the present application performs reinforcement learning using the sample working state of the working machine, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior. The resulting state-behavior decision-making model can determine the current decision-making behavior of the working machine based on its current working state, and the working machine is controlled to carry out construction work according to the control signal corresponding to the current decision-making behavior.
  • the reward value is determined according to the coincidence degree between the actual position curve and the target position curve of the working part in the working machine, which enables the working part to work along the set target position curve without establishing an accurate control model for the working machine in each working state. This reduces the engineers' debugging workload, shortens the debugging time, reduces the debugging cost, and improves the intelligent construction level of the working machine.
  • the reward value is determined based on the following steps:
  • a reward value is determined based on the distance between each location point and the corresponding location point, and the location weight of each location point.
  • a plurality of position points may be selected on the actual position curve first.
  • the selection of the position point can be the starting point, midpoint, end point, inflection point, etc. of the curve, and can also be segmented according to the shape of the curve, and the segment point can be used as the position point.
  • the embodiment of the present application does not set specific limitations on the selection of the position point.
  • the corresponding location points of each location point are determined on the target location curve.
  • the starting point of the actual position curve corresponds to the starting point of the target position curve
  • the end point of the actual position curve corresponds to the end point of the target position curve
  • the segment point of the actual position curve corresponds to the segment point of the target position curve
  • the position weight of each position point can be determined according to the specific position of each position point on the actual position curve, and the position weight indicates the degree of influence of the position point on the shape of the curve. The greater the position weight, the greater the influence of the position point on the shape of the curve. For example, the location weights of the start point, midpoint, and end point can be set to high weight, and the rest of the location points can be set to low weight.
  • the reward value is determined according to the distance between each location point and the corresponding location point, and the location weight of each location point. For example, the sum of the product of the distance between each position point and the corresponding position point and the position weight of each position point can be calculated first, and then the inverse of the product sum can be used as the reward value.
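The example calculation above (reciprocal of the weighted sum of point distances) can be sketched as follows. Treating positions as 2-D points, using Euclidean distance, and adding a small `eps` to avoid division by zero are assumptions of this sketch; the function name `weighted_distance_reward` is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def weighted_distance_reward(
    actual: Sequence[Point],
    target: Sequence[Point],
    weights: Sequence[float],
    eps: float = 1e-9,
) -> float:
    """Reciprocal of sum(weight_i * distance(actual_i, target_i)).

    `actual[i]` and `target[i]` are corresponding position points on the
    actual and target position curves. `eps` (an assumption) keeps the
    reward finite when the two curves coincide exactly.
    """
    if not (len(actual) == len(target) == len(weights)):
        raise ValueError("actual, target and weights must have equal length")
    weighted_sum = sum(
        w * math.dist(a, t) for a, t, w in zip(actual, target, weights)
    )
    return 1.0 / (weighted_sum + eps)
```

With high weights on the start point and end point, a deviation at those points lowers the reward more than the same deviation at a low-weight point.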
  • the reward value is determined based on the distance between each location point and the corresponding location point, and the location weight of each location point, including:
  • the reward value is determined.
  • the coincidence degree between the actual position curve and the target position curve may be determined according to the distance between each position point and the corresponding position point, and the position weight of each position point.
  • the coincidence degree is the reciprocal of the sum of the product of the distance between each position point and the corresponding position point and the position weight of each position point.
  • an additional indicator can be determined according to the moving speed of the operating part on the actual position curve, which is used to determine the reward value. The faster the moving speed of the operating part on the actual position curve, the higher the operating efficiency, and the greater the reward value.
  • the moving speed of the working part on the actual position curve can be determined according to the length of the actual position curve and the moving time of the working part.
  • a weighted sum can be computed from the coincidence degree and its weight together with the moving speed and its weight, and the weighted sum is then used as the reward value.
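The combination of coincidence degree and moving speed can be sketched as below. The speed is taken as curve length divided by moving time, as described above; the default weights 0.8 and 0.2 and the function names are assumptions chosen for illustration, since the application does not give concrete values.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def curve_length(points: Sequence[Point]) -> float:
    """Length of the polyline through the sampled actual positions."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def moving_speed(points: Sequence[Point], moving_time: float) -> float:
    """Average speed of the working part along the actual position curve."""
    if moving_time <= 0:
        raise ValueError("moving_time must be positive")
    return curve_length(points) / moving_time


def combined_reward(
    coincidence: float,
    speed: float,
    w_coincidence: float = 0.8,  # assumed weight, not from the source
    w_speed: float = 0.2,        # assumed weight, not from the source
) -> float:
    """Weighted sum of coincidence degree and moving speed."""
    return w_coincidence * coincidence + w_speed * speed
```

In practice the two terms may need to be scaled to comparable ranges before weighting; the application leaves this open.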
  • the state-behavior decision-making model is trained based on the following steps:
  • the last job state, the last decision-making behavior, and the reward value corresponding to the last decision-making behavior are respectively used as the sample job status, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior;
  • the initial model is trained to obtain a state-behavior decision-making model.
  • the initial model of the state-behavior decision-making model may be a policy network, a deep Q-network, or the like; the embodiment of the present application does not specifically limit the model type of the initial model.
  • the state-behavior decision-making model can be trained. Specifically, it can be obtained through the following training methods:
  • the last job state is the job state at the moment immediately before the current moment;
  • the last decision-making behavior is the decision-making behavior at the moment immediately before the current moment.
  • the last job state, the last decision-making behavior, and the reward value corresponding to the last decision-making behavior are respectively used as the sample job status, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior.
  • sample data can also come from historical data when the work machine performs construction work.
  • the initial model is trained to improve the prediction ability of the initial model for the optimal decision-making behavior, and a state-behavior decision-making model is obtained.
  • the operation machine control method provided by the embodiment of the present application obtains the state-behavior decision-making model by training the initial model on real-time data of the work machine, which enables continuous training.
  • because training uses the real-time data produced while the work machine performs the current construction operation, the next action can be adjusted based on that data, which greatly shortens the debugging process.
  • the initial model is trained to obtain a state-behavior decision-making model, including:
  • the initial model is trained to determine the actual position curve of the operation part in the operation machine;
  • the training is stopped, and the trained initial model is used as the state behavior decision model.
  • if the coincidence degree between the actual position curve of the working part and the target position curve is less than the preset coincidence threshold, the training of the current initial model has achieved its purpose, and the training can be stopped.
  • if the coincidence degree between the actual position curve and the target position curve of the operating part is greater than or equal to the preset coincidence threshold, the training of the current initial model has not achieved its purpose, and the training should continue. In that case, the sample decision-making behavior can be updated and training iterated until the coincidence degree is less than the preset coincidence threshold.
  • the preset coincidence threshold can be set according to actual needs.
  • the target position curve of the working part in the working machine is determined based on the construction tasks performed by the working machine.
  • the construction task is an operation item undertaken by the operation machine.
  • its construction tasks may include leveling, slope brushing, and excavation.
  • the target position curve of the working part is the curve formed by the expected position of the working part at each moment in the construction process.
  • the target position curve can be determined depending on the work task of the work machine. For example, for leveling operations, the target position curve of the excavator can be a straight line in the horizontal plane; for slope brushing operations, it can be a straight line inclined to the horizontal plane; for other operations, it can be a curve.
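The two straight-line cases above can be sketched with one helper that samples points along a line. Representing the curve as 2-D (horizontal distance, height) points and the name `straight_target_curve` with its parameters are assumptions of this sketch.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def straight_target_curve(
    start: Point,
    length: float,
    slope_deg: float = 0.0,
    n_points: int = 5,
) -> List[Point]:
    """Sample `n_points` evenly spaced points on a straight target line.

    slope_deg = 0 gives a horizontal line (leveling); a non-zero value
    gives a line inclined to the horizontal plane (slope brushing).
    """
    if n_points < 2:
        raise ValueError("need at least two points")
    x0, y0 = start
    angle = math.radians(slope_deg)
    dx, dy = math.cos(angle), math.sin(angle)
    curve = []
    for i in range(n_points):
        s = length * i / (n_points - 1)  # distance travelled along the line
        curve.append((x0 + s * dx, y0 + s * dy))
    return curve
```

For a leveling task one might call `straight_target_curve((0.0, 1.0), 4.0)`, and for a 30-degree slope `straight_target_curve((0.0, 0.0), 2.0, slope_deg=30.0)`.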
  • the state behavior decision model is stored in the memory of the work machine in the form of a computer program, so as to be read and executed by the processor of the work machine.
  • the state-behavior decision-making model can be used as a control algorithm and stored in the memory of the working machine in the form of a computer program.
  • the processor of the work machine can read the computer program in the memory and execute the work machine control method.
  • the working machine is an excavator
  • the current working state includes the attitude parameters of the mechanical arm, the attitude parameters of the upper body and the turning angle of the upper body.
  • the working machine in the embodiment of the present application may be an excavator, and correspondingly, the current working state may include the attitude parameters of the mechanical arm, the attitude parameters of the upper body, and the slewing angle of the upper body.
  • the attitude parameters of the robotic arm include the telescopic length and extension angle of each robotic arm.
  • the robotic arm here includes boom, stick and bucket.
  • the telescopic length of each mechanical arm can be obtained through the corresponding cylinder length sensor, and the extension angle of each mechanical arm can be obtained through the corresponding inclination sensor.
  • the attitude parameter of the upper body can be the three-dimensional attitude angle of the body part of the excavator, which can be obtained through a gyroscope installed on the slewing platform.
  • the slewing angle of the upper body can be the rotation angle of the body part of the excavator relative to the chassis part, which can be determined from the angle between the extension direction of the boom on the slewing platform and the forward direction of the vehicle.
  • the current working state may also include other parameters that can determine the working state of the excavator, for example, the moving speed and moving direction of the excavator.
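The excavator working state described above can be collected into one record. This is a minimal sketch: the class and field names are hypothetical, and interpreting the three-dimensional attitude angle as roll, pitch and yaw is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ArmLink:
    cylinder_length: float  # telescopic length, from the cylinder length sensor
    angle: float            # extension angle, from the inclination sensor


@dataclass(frozen=True)
class ExcavatorState:
    boom: ArmLink
    stick: ArmLink
    bucket: ArmLink
    body_attitude: Tuple[float, float, float]  # assumed (roll, pitch, yaw) from the gyroscope
    slewing_angle: float                       # upper body relative to the chassis

    def to_vector(self) -> List[float]:
        """Flatten the state into the parameter vector fed to the model."""
        values: List[float] = []
        for link in (self.boom, self.stick, self.bucket):
            values.extend((link.cylinder_length, link.angle))
        values.extend(self.body_attitude)
        values.append(self.slewing_angle)
        return values
```

Further parameters such as moving speed and moving direction could be appended in the same way.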
  • control signal is the handle opening signal of the excavator.
  • controlling each mechanical arm to carry out construction work is mainly realized by controlling the opening of the handle.
  • the handle of an excavator includes a left operating handle and a right operating handle.
  • the left operating handle controls the stick and slewing platform, and the right operating handle controls the boom and bucket.
  • the opening signal of the handle controls the movement of the corresponding mechanical arm.
  • the present application provides an excavator leveling and slope brushing operation control method based on reinforcement learning, the method comprising:
  • Step 1: Define the state parameter group required by the reinforcement learning model, including the robot arm attitude sensor signals (cylinder displacement or inclination sensors), the upper body attitude signal, the upper body rotation angle signal, etc.; that is, a parameter group whose combination uniquely determines the current state of the excavator.
  • Step 2: Define the policy function.
  • the input of the policy function is the current state parameter group (part or all of it), and the output is the corresponding control signal (handle opening signal).
  • the coefficient matrix connecting the input and output parameters is part of the trainable model for reinforcement learning.
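A policy function in which a coefficient matrix connects the state parameters to the control signal can be sketched as a linear map. The linear form, the clamping of each output to the range [-1, 1] as a normalized handle opening, and the name `linear_policy` are assumptions of this sketch; the application only states that the coefficient matrix is trainable.

```python
from typing import List, Sequence


def linear_policy(
    coefficients: Sequence[Sequence[float]],
    state: Sequence[float],
) -> List[float]:
    """Map a state vector to handle opening signals.

    Each row of `coefficients` produces one output channel (for example
    boom, stick, bucket or slewing). Outputs are clamped to [-1, 1],
    an assumed normalization of the handle opening.
    """
    outputs = []
    for row in coefficients:
        if len(row) != len(state):
            raise ValueError("each coefficient row must match the state length")
        raw = sum(c * s for c, s in zip(row, state))
        outputs.append(max(-1.0, min(1.0, raw)))
    return outputs
```

Training then amounts to adjusting the entries of `coefficients` so that the resulting tooth tip curve earns a higher reward.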
  • Step 3: Define the reward function. The smaller the distance between each point of the actual tooth tip position curve and the corresponding point of the expected curve, that is, the higher the degree of overlap between the two curves, the greater the reward value.
  • Step 4: Develop the corresponding automatic development and debugging program.
  • Fig. 2 is the training schematic diagram of the excavator brush slope control model provided by the application.
  • the training and debugging process of the control model is: real-time collection of sensor signals such as handle, digital oil cylinder and IMU (inertial sensor), and store them in Current state array; output control signal through measurement function; calculate the tooth tip position curve through the real-time sensor return signal; calculate the reward value by combining the obtained curve with the expected tooth tip curve; judge based on the reward value: a) the goal is reached, and the training stops; b ) does not reach the goal, update the strategy function, and iterate repeatedly until the goal is achieved.
  • FIG. 3 is a schematic diagram of the deployment of the excavator leveling and slope brushing control model provided by this application. As shown in FIG. 3, after training of the reinforcement learning model is completed, it can be embedded directly in the controller, where it functions like a control algorithm: the state parameters collected in real time are used as input to output real-time control signals.
  • the excavator can automatically debug the control algorithm without human intervention, and traverse all state points for optimization.
  • the workload of debugging the control algorithm is greatly reduced, and the cost of debugging is reduced.
  • the developed control program can accelerate the development of control algorithms for subsequent excavator models.
  • a model obtained through artificial intelligence training has one characteristic: it can be migrated to similar application scenarios and can match a new application scenario with only simpler training, that is, transfer learning. Therefore, it can greatly speed up the development of leveling and slope brushing control algorithms for new excavator models.
  • the model developed based on reinforcement learning is a black box model rather than a logical mechanism model, and is not easily copied or reverse engineered.
  • Fig. 4 is a schematic structural view of the operating machine control device provided by the present application. As shown in Fig. 4, the device includes:
  • An acquisition unit 410 configured to acquire the current working state of the working machine
  • a decision-making unit 420 configured to determine the current decision-making behavior of the working machine based on the current working state and the state-behavior decision-making model
  • the control unit 430 is configured to control the working machine to perform construction work based on the control signal corresponding to the current decision-making behavior;
  • the state behavior decision model is obtained through reinforcement training based on the sample operation state of the operation machine, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior; the reward value is determined based on the coincidence degree between the actual position curve and the target position curve of the operation part in the operation machine, and the actual position curve is determined based on the sample decision-making behavior.
  • the operating machine control device performs reinforcement learning using the sample operating state of the operating machine, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior; the resulting state behavior decision-making model can determine the current decision-making behavior of the working machine from its current operating state, and the working machine is controlled to carry out construction work according to the control signal corresponding to the current decision-making behavior.
  • the reward value is determined according to the coincidence degree between the actual position curve and the target position curve of the working part in the working machine, which enables the working part of the working machine to work along the set target position curve without establishing an accurate control model for the working machine in each working state; this reduces the debugging workload of the engineer, shortens the debugging time, and reduces the debugging cost.
  • the intelligent construction level of the working machinery has been improved.
  • a reward determination unit, configured to select a plurality of position points on the actual position curve and determine the corresponding position point of each position point on the target position curve; determine the position weight of each position point; and determine the reward value based on the distance between each position point and its corresponding position point and on the position weight of each position point.
  • the reward determination unit is specifically configured to:
  • the reward value is determined.
  • the training unit is used to obtain the last working state of the working machine, the last decision-making behavior, and the reward value corresponding to the last decision-making behavior;
  • the last job state, the last decision-making behavior, and the reward value corresponding to the last decision-making behavior are respectively used as the sample job status, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior;
  • the initial model is trained to obtain a state-behavior decision-making model.
  • the training unit is also used for:
  • the initial model is trained to determine the actual position curve of the operation part in the operation machine;
  • the training is stopped, and the trained initial model is used as the state behavior decision model.
  • the target position curve of the working part in the working machine is determined based on the construction tasks performed by the working machine.
  • the working machine is an excavator
  • the current working state includes the attitude parameters of the mechanical arm, the attitude parameters of the upper body and the turning angle of the upper body.
  • control signal is the handle opening signal of the excavator.
  • an embodiment of the present application further provides a work machine, the work machine includes the above work machine control device.
  • the work machine may include the above work machine control device.
  • the above-mentioned control device is used to control the working machine, so that it replaces manual control, and can adjust the next construction action according to the real-time feedback data, shortening the debugging process.
  • FIG. 5 is a schematic structural diagram of an electronic device provided by the present application.
  • the electronic device may include: a processor (Processor) 510, a communication interface (Communications Interface) 520, a memory (Memory) 530 and a communication bus (Communications Bus) 540, wherein the processor 510, the communication interface 520, and the memory 530 communicate with one another through the communication bus 540.
  • the processor 510 can invoke logic commands in the memory 530 to perform the following methods:
  • the above-mentioned logic commands in the memory 530 may be implemented in the form of software function units and may be stored in a computer-readable storage medium when sold or used as an independent product.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several commands that cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include various media that can store program code, such as a U disk, a mobile hard disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), a magnetic disk or an optical disc.
  • the processor in the electronic device provided by the embodiment of the present application can call the logic instruction in the memory to implement the above method, and its specific implementation mode is consistent with the above method implementation mode, and can achieve the same beneficial effect, and will not be repeated here.
  • An embodiment of the present application also provides a non-transitory computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, it is implemented to perform the methods provided by the above-mentioned embodiments, for example, including:
  • the device embodiments described above are only illustrative; the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed across multiple network elements. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those skilled in the art can understand and implement it without creative effort.
  • each implementation can be implemented by means of software plus a necessary general hardware platform, and of course also by hardware.
  • the essence of the above technical solution, or the part that contributes to the prior art, can be embodied in the form of a software product; the computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk or an optical disc, and includes several commands that cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the various embodiments or in some parts of the embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Mining & Mineral Resources (AREA)
  • General Engineering & Computer Science (AREA)
  • Civil Engineering (AREA)
  • Structural Engineering (AREA)
  • Mechanical Engineering (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Life Sciences & Earth Sciences (AREA)
  • Paleontology (AREA)
  • Operation Control Of Excavators (AREA)

Abstract

Provided in the present application are a control method and apparatus for a work machine, and a work machine. The method comprises: acquiring the current work state of a work machine; determining the current decision-making behavior of the work machine on the basis of the current work state and a state behavior decision-making model; and controlling the work machine to perform construction work on the basis of a control signal corresponding to the current decision-making behavior. The state behavior decision-making model is obtained by means of training on the basis of a sample work state of the work machine, a sample decision-making behavior and a reward value corresponding to the sample decision-making behavior. The reward value is determined on the basis of an actual position curve and a target position curve of a work part in the work machine. The actual position curve is determined on the basis of the sample decision-making behavior. By means of the method and apparatus and the work machine provided in the present application, the debugging workload of an engineer is reduced, the debugging time is shortened, the debugging cost is decreased, and the intelligent construction level of a work machine is raised.

Description

Working Machine Control Method, Device and Working Machine
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202110956947.X, filed on August 19, 2021 and entitled "Working Machine Control Method, Device and Working Machine", which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the technical field of mechanical engineering, and in particular to a working machine control method, a working machine control device and a working machine.
Background
When an excavator performs compound operations such as leveling or slope brushing, the work is usually completed by an experienced operator through combined actions.
In the prior art, the intelligent functions of excavators are usually developed by debugging traditional control algorithms. Many state points of the excavator's operation need to be defined, and the control algorithm must be debugged separately at each state point so that the control program for leveling or slope brushing reaches the expected accuracy. Because the excavator system is complex, debugging this type of control algorithm is very difficult, places high demands on engineers, and is hard to complete. It is also time-consuming and incurs high labor costs.
Summary
The working machine control method, device and working machine provided by the present application are intended to solve the technical problem in the prior art that intelligent control of a working machine requires establishing an accurate control model for each working state of the machine and performing a large amount of debugging, which is time-consuming and costly.
The present application provides a working machine control method, including:
acquiring the current working state of a working machine;
determining the current decision-making behavior of the working machine based on the current working state and a state behavior decision-making model;
controlling the working machine to perform construction work based on the control signal corresponding to the current decision-making behavior;
wherein the state behavior decision-making model is obtained by training based on a sample working state of the working machine, a sample decision-making behavior, and a reward value corresponding to the sample decision-making behavior; the reward value is determined based on an actual position curve and a target position curve of the working part of the working machine; and the actual position curve is determined based on the sample decision-making behavior.
According to the working machine control method provided by the present application, the reward value is determined by the following steps:
selecting a plurality of position points on the actual position curve, and determining, on the target position curve, the corresponding position point of each position point;
determining the position weight of each position point;
determining the reward value based on the distance between each position point and its corresponding position point, and on the position weight of each position point.
According to the working machine control method provided by the present application, determining the reward value based on the distance between each position point and its corresponding position point, and on the position weight of each position point, includes:
determining the coincidence degree between the actual position curve and the target position curve based on the distance between each position point and its corresponding position point, and on the position weight of each position point;
determining the moving speed of the working part along the actual position curve;
determining the reward value based on the coincidence degree and the moving speed.
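The reward computation described above could be sketched as follows. This is only a minimal illustration: the function names, the exponential mapping from weighted distance to a coincidence degree in (0, 1], the additive speed term with `speed_gain`, and the point format are assumptions, not details given in the application.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def coincidence_degree(actual: Sequence[Point], target: Sequence[Point],
                       weights: Sequence[float]) -> float:
    """Map the weighted mean point-to-point distance to a value in (0, 1]."""
    if not (len(actual) == len(target) == len(weights)) or not actual:
        raise ValueError("need equally sized, non-empty inputs")
    total_weight = sum(weights)
    weighted = sum(w * math.dist(a, t) for a, t, w in zip(actual, target, weights))
    mean_distance = weighted / total_weight
    # Assumed mapping: distance 0 gives 1.0, larger distances decay towards 0.
    return math.exp(-mean_distance)


def reward(actual: Sequence[Point], target: Sequence[Point],
           weights: Sequence[float], speed: float, speed_gain: float = 0.1) -> float:
    """Combine coincidence degree and moving speed (assumed additive form)."""
    return coincidence_degree(actual, target, weights) + speed_gain * speed
```

With this form, a path that follows the target more closely at the same speed always scores higher; how strongly speed should count relative to accuracy is a tuning choice the application leaves open.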
According to the working machine control method provided by the present application, the state behavior decision-making model is trained by the following steps:
acquiring the previous working state of the working machine, the previous decision-making behavior, and the reward value corresponding to the previous decision-making behavior;
using the previous working state, the previous decision-making behavior, and the reward value corresponding to the previous decision-making behavior as the sample working state, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior, respectively;
training an initial model based on the sample working state, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior to obtain the state behavior decision-making model.
According to the working machine control method provided by the present application, training the initial model based on the sample working state, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior to obtain the state behavior decision-making model includes:
if the coincidence degree between the actual position curve and the target position curve of the working part of the working machine is less than a preset coincidence threshold, stopping training and using the trained initial model as the state behavior decision-making model.
According to the working machine control method provided by the present application, the working machine is an excavator, and the current working state includes attitude parameters of the mechanical arm, attitude parameters of the upper body, and the rotation angle of the upper body.
The present application also provides a working machine control device, including:
an acquisition unit, configured to acquire the current working state of a working machine;
a decision-making unit, configured to determine the current decision-making behavior of the working machine based on the current working state and a state behavior decision-making model;
a control unit, configured to control the working machine to perform construction work based on the control signal corresponding to the current decision-making behavior;
wherein the state behavior decision-making model is obtained by reinforcement training based on a sample working state of the working machine, a sample decision-making behavior, and a reward value corresponding to the sample decision-making behavior; the reward value is determined based on an actual position curve and a target position curve of the working part of the working machine; and the actual position curve is determined based on the sample decision-making behavior.
The present application also provides a working machine, including the working machine control device described above.
The present application also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the working machine control method when executing the program.
The present application also provides a non-transitory computer-readable storage medium on which a computer program is stored, wherein the steps of the working machine control method are implemented when the computer program is executed by a processor.
The working machine control method, device and working machine provided by the present application perform reinforcement learning using the sample working state of the working machine, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior. The resulting state behavior decision-making model can determine the current decision-making behavior of the working machine from its current working state, and the working machine is controlled to perform construction work according to the control signal corresponding to the current decision-making behavior. Because the reward value is determined from the actual position curve and the target position curve of the working part of the working machine, the working part can work along the set target position curve without an accurate control model being established for each working state of the machine. This reduces the engineer's debugging workload, shortens the debugging time, lowers the debugging cost, and raises the level of intelligent construction of the working machine.
Brief Description of the Drawings
In order to explain the technical solutions of the present application or the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of the working machine control method provided by the present application;
FIG. 2 is a training schematic diagram of the excavator leveling and slope brushing control model provided by the present application;
FIG. 3 is a deployment schematic diagram of the excavator leveling and slope brushing control model provided by the present application;
FIG. 4 is a schematic structural diagram of the working machine control device provided by the present application;
FIG. 5 is a schematic structural diagram of the electronic device provided by the present application.
Detailed Description
In order to make the purpose, technical solutions and advantages of the present application clearer, the technical solutions in the present application are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the protection scope of the present application.
Reinforcement learning is an algorithm that trains an artificial intelligence model through continuous trial and error, rewarding the strategies that perform better. Inspired by this, the technical solution in the embodiments of the present application is to let the excavator start from some random initial inputs and work automatically, define the reward from the size of the difference between the actual trajectory and the expected leveling path or slope brushing path, and iteratively optimize the control strategy until an artificial intelligence model (that is, a control algorithm) with a leveling control function or a slope brushing control function is obtained. This process replaces manual debugging or calibration of the control algorithm.
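The iterate-and-reward idea described above can be illustrated with a toy loop. The sketch below is not the training procedure of the application: it uses a one-dimensional stand-in for the excavator (`simulate`), a two-parameter linear policy, and simple random-perturbation search in place of any particular reinforcement learning algorithm. All names and constants are assumptions.

```python
import random
from typing import List, Tuple

Params = Tuple[float, float]  # (gain, bias) of a linear policy


def simulate(params: Params, steps: int = 50) -> List[float]:
    """Toy plant: tip height drifts down by 0.05 per step unless corrected."""
    gain, bias = params
    height, path = 0.5, []
    for _ in range(steps):
        control = gain * height + bias       # policy: state -> control signal
        height = height + control - 0.05     # assumed plant response
        path.append(height)
    return path


def reward_of(path: List[float]) -> float:
    """Higher (closer to 0) when the path stays near the flat target height 0."""
    return -sum(abs(h) for h in path) / len(path)


def train(goal: float = -0.02, max_iters: int = 5000,
          seed: int = 0) -> Tuple[Params, float]:
    """Start from a random policy and keep perturbations that raise the reward."""
    rng = random.Random(seed)
    best = (rng.uniform(-1, 0), rng.uniform(-0.1, 0.1))   # random initial policy
    best_reward = reward_of(simulate(best))
    for _ in range(max_iters):
        if best_reward >= goal:               # goal reached: stop training
            break
        trial = (best[0] + rng.gauss(0, 0.1), best[1] + rng.gauss(0, 0.02))
        trial_reward = reward_of(simulate(trial))
        if trial_reward > best_reward:        # keep the better policy
            best, best_reward = trial, trial_reward
    return best, best_reward
```

In this toy setting the policy `(-1.0, 0.05)` cancels the drift exactly, so the search has a well-defined optimum to move towards; on a real machine the plant response is unknown and each evaluation is a physical or simulated run.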
FIG. 1 is a schematic flow chart of the working machine control method provided by the present application. As shown in FIG. 1, the method includes:
Step 110: acquire the current working state of the working machine.
Specifically, the working machine is a construction machine capable of carrying out construction work, such as an excavator, a crane, a concrete pump truck or a concrete mixer truck.
The current working state is a set of state parameters that characterizes the working machine while it performs construction work at the current moment. For example, for an excavator, the current working state may be represented by the telescopic lengths and extension angles of parts such as the bucket, the stick and the boom, which can be obtained from cylinder displacement sensors and inclination sensors mounted on each mechanical arm of the excavator. It may also include the attitude signal and the rotation angle signal of the upper body.
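One way to hold such a state is a small record that is flattened into a vector before being passed to the model. The field names, units and ordering below are illustrative assumptions; the application only lists the kinds of signals involved.

```python
from dataclasses import astuple, dataclass
from typing import List


@dataclass(frozen=True)
class WorkingState:
    boom_cylinder_mm: float      # cylinder displacement sensor, boom
    stick_cylinder_mm: float     # cylinder displacement sensor, stick
    bucket_cylinder_mm: float    # cylinder displacement sensor, bucket
    body_pitch_deg: float        # upper body attitude
    body_roll_deg: float         # upper body attitude
    swing_angle_deg: float       # upper body rotation angle

    def as_vector(self) -> List[float]:
        """Flatten the state in field order for use as model input."""
        return list(astuple(self))
```

Keeping the record immutable (`frozen=True`) makes it safe to store each sampled state alongside the decision and reward it led to.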
Step 120: determine the current decision-making behavior of the working machine based on the current working state and the state behavior decision-making model, wherein the state behavior decision-making model is obtained by training based on a sample working state of the working machine, a sample decision-making behavior, and a reward value corresponding to the sample decision-making behavior; the reward value is determined based on an actual position curve and a target position curve of the working part of the working machine; and the actual position curve is determined based on the sample decision-making behavior.
Specifically, the current decision-making behavior of the working machine is the construction action the machine performs at the current moment. The working machine may have several candidate decision-making behaviors at the current moment and needs to select one of them as the current decision-making behavior. For example, when an excavator performs a leveling operation, its candidate decision-making behaviors at the current moment may be retracting the bucket inward, extending the bucket outward, and so on.
A reinforcement learning approach can be used: the current working state of the working machine is input to the state behavior decision-making model, which analyzes the parameters of the current working state and determines the current decision-making behavior of the working machine.
Sample working states of the working machine, sample decision-making behaviors, and the reward values corresponding to the sample decision-making behaviors can be collected, and the state behavior decision-making model is obtained by training an initial model on them.
The state behavior decision-making model works on the following principle: if the working machine makes a certain decision-making behavior in its current working state and that behavior increases the corresponding reward value, the tendency of the working machine to adopt this behavior in the future is strengthened. The purpose of the state behavior decision-making model is to find the optimal decision-making behavior at each moment, so that the working machine obtains the maximum reward value by adopting it.
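The principle that a rewarded behavior becomes more likely can be shown with a generic preference update, similar in spirit to a gradient-bandit rule. This is an assumed textbook-style illustration, not the update rule of the application; the action names, `baseline` and `step` are hypothetical.

```python
import math
from typing import Dict


def action_probabilities(preferences: Dict[str, float]) -> Dict[str, float]:
    """Softmax over action preferences."""
    peak = max(preferences.values())
    exps = {a: math.exp(p - peak) for a, p in preferences.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


def reinforce(preferences: Dict[str, float], action: str, reward: float,
              baseline: float, step: float = 0.5) -> Dict[str, float]:
    """Raise the preference of `action` if its reward beat the baseline."""
    updated = dict(preferences)
    updated[action] += step * (reward - baseline)
    return updated


# Usage: two candidate behaviors start equally likely; rewarding one shifts the odds.
prefs = {"bucket_in": 0.0, "bucket_out": 0.0}
prefs_after = reinforce(prefs, "bucket_in", reward=1.0, baseline=0.2)
```

A reward below the baseline lowers the preference instead, so behaviors that move the tooth tip away from the target become less likely over time.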
The working part is the part of the working machine that acts on the working face during construction work. For example, the bucket is the working part of an excavator, the end hose that delivers concrete is the working part of a concrete pump truck, and the tamping hammer is the working part of a rammer.
The actual position curve of the working part is the curve formed by the actual positions of the working part at each moment during construction. It can be determined from the decision-making behavior, that is, after the working machine has performed construction work according to the control signal corresponding to the current decision-making behavior. For example, the excavator controls each mechanical arm according to the control signal corresponding to the current decision-making behavior and changes the displacement and inclination of each arm, so that the bucket (the working part) in contact with the working face actually changes position, which yields the actual position curve of the excavator's working part.
The target position curve of the working part is the curve formed by the expected positions of the working part at each moment during construction. It can be determined from the work task of the working machine. For example, for a leveling operation, the target position curve of the excavator may be a straight line.
The reward value can be determined from the actual position curve and the target position curve of the working part of the working machine. For example, for an excavator, the reward value can be determined from the actual position curve and the target position curve of the bucket tooth tip during construction work. First, the coincidence degree between the actual position curve and the target position curve is determined; it can be derived from the distances between corresponding points on the two curves. The smaller the distance between corresponding points, the higher the coincidence degree; the larger the distance, the lower the coincidence degree. A higher coincidence degree means the bucket is leveling or slope brushing along the target position curve and should receive a higher reward value; a lower coincidence degree means the bucket is not following the target position curve and should receive a lower reward value. The reward value increases in proportion to the coincidence degree, and reward values of different sizes can be set according to the coincidence degree.
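As a worked illustration of "distance between corresponding points", the sketch below takes a straight target line, as in a leveling or slope brushing task, and pairs each sampled tooth-tip point with the target point at the same horizontal coordinate. The pairing rule, the `1 / (1 + d)` mapping and the coordinate convention are assumptions chosen for readability.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (horizontal x, height z) in metres


def target_height(x: float, slope: float = 0.0, z0: float = 0.0) -> float:
    """Straight target line z = z0 + slope * x (slope 0 for leveling)."""
    return z0 + slope * x


def mean_deviation(actual: Sequence[Point], slope: float = 0.0,
                   z0: float = 0.0) -> float:
    """Mean vertical distance between sampled points and the target line."""
    return sum(abs(z - target_height(x, slope, z0)) for x, z in actual) / len(actual)


def coincidence(actual: Sequence[Point], slope: float = 0.0,
                z0: float = 0.0) -> float:
    """1.0 for a perfect match, decreasing as the deviation grows."""
    return 1.0 / (1.0 + mean_deviation(actual, slope, z0))
```

A nonzero `slope` describes a slope brushing target instead of a level one; other pairing rules, such as nearest point on the target curve, would fit the same description.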
Step 130: control the work machine to perform construction work based on the control signal corresponding to the current decision-making behavior.
Specifically, after the current decision-making behavior output by the state-behavior decision-making model is obtained, the work machine is controlled to perform construction work according to the control signal corresponding to that behavior. For example, the current decision-making behavior may correspond to the opening signals of the excavator's operating handles, so obtaining the current decision-making behavior also yields those opening signals. The mechanical arms of the excavator are then driven according to the handle opening signals to complete the construction operation at the current moment, and this cycle repeats until the construction work is finished.
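The acquire, decide, control cycle described here can be sketched as follows. This is a minimal illustration only: the names `control_step`, `run_job`, `read_state`, `decide`, `send_signal`, and `job_done` are hypothetical and do not appear in the application, and the model is represented by an arbitrary callable.

```python
from typing import Callable, Sequence

State = Sequence[float]    # e.g. arm attitude, body attitude, slewing angle
Action = Sequence[float]   # e.g. handle opening signals


def control_step(read_state: Callable[[], State],
                 decide: Callable[[State], Action],
                 send_signal: Callable[[Action], None]) -> Action:
    """Run one acquire -> decide -> control cycle and return the action taken."""
    state = read_state()      # acquire the current working state
    action = decide(state)    # query the state-behavior decision-making model
    send_signal(action)       # step 130: drive the machine with the control signal
    return action


def run_job(read_state: Callable[[], State],
            decide: Callable[[State], Action],
            send_signal: Callable[[Action], None],
            job_done: Callable[[], bool],
            max_steps: int = 1000) -> int:
    """Repeat the cycle until the job is done or max_steps is hit; return the step count."""
    steps = 0
    while steps < max_steps and not job_done():
        control_step(read_state, decide, send_signal)
        steps += 1
    return steps
```

The `max_steps` guard is an added safety assumption so that the loop terminates even if the completion check never fires.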
The work machine control method provided by this embodiment of the present application performs reinforcement learning on the sample working states of the work machine, the sample decision-making behaviors, and the reward values corresponding to the sample decision-making behaviors. The resulting state-behavior decision-making model can determine the current decision-making behavior of the work machine from its current working state, and the work machine is controlled to perform construction work according to the control signal corresponding to that behavior. Because the reward value is determined from the degree of coincidence between the actual position curve and the target position curve of the working part, the working part can work along the set target position curve, and no accurate control model has to be built for each working state of the machine. This reduces the engineers' debugging workload, shortens debugging time, lowers debugging cost, and raises the level of intelligent construction of the work machine.
Based on the above embodiment, the reward value is determined by the following steps:
selecting a plurality of position points on the actual position curve, and determining, on the target position curve, the position point corresponding to each of them;
determining a position weight for each position point;
determining the reward value based on the distance between each position point and its corresponding position point, and on the position weight of each position point.
Specifically, a plurality of position points may first be selected on the actual position curve. The position points may be the start point, midpoint, end point, inflection points, and so on of the curve; the curve may also be segmented according to its shape, with the segmentation points used as position points. This embodiment of the present application does not specifically limit how the position points are selected.
After the position points are determined, the position point corresponding to each of them is determined on the target position curve. For example, the start point of the actual position curve corresponds to the start point of the target position curve, the end point of the actual position curve corresponds to the end point of the target position curve, and the segmentation points of the actual position curve correspond to the segmentation points of the target position curve.
The position weight of each position point can be determined according to where the point lies on the actual position curve. The position weight expresses how strongly the point influences the shape of the curve: the larger the weight, the greater the influence. For example, the start point, midpoint, and end point may be given high weights, and the remaining position points low weights.
The reward value is determined from the distance between each position point and its corresponding position point, and from the position weight of each position point. For example, the distance between each position point and its corresponding position point can be multiplied by that point's position weight, the products summed, and the reciprocal of the sum taken as the reward value.
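The reciprocal-of-weighted-sum example above can be written out as a short sketch. The function names and the small constant `eps` are assumptions introduced here; `eps` only guards against division by zero when the two curves coincide exactly and is not part of the application.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def weighted_distance_sum(actual: Sequence[Point],
                          target: Sequence[Point],
                          weights: Sequence[float]) -> float:
    """Sum of (position weight * distance) over corresponding position points."""
    if not (len(actual) == len(target) == len(weights)):
        raise ValueError("actual, target and weights must have the same length")
    return sum(w * math.dist(a, t) for a, t, w in zip(actual, target, weights))


def reward_from_curves(actual: Sequence[Point],
                       target: Sequence[Point],
                       weights: Sequence[float],
                       eps: float = 1e-9) -> float:
    """Reciprocal of the weighted distance sum; eps is an assumed guard term."""
    return 1.0 / (weighted_distance_sum(actual, target, weights) + eps)
```

With this form, a trajectory that stays closer to the target at the heavily weighted points receives a larger reward.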
Based on any of the above embodiments, determining the reward value based on the distance between each position point and its corresponding position point, and on the position weight of each position point, includes:
determining the degree of coincidence between the actual position curve and the target position curve based on the distance between each position point and its corresponding position point, and on the position weight of each position point;
determining the moving speed of the working part along the actual position curve;
determining the reward value based on the degree of coincidence and the moving speed.
Specifically, the degree of coincidence between the actual position curve and the target position curve can be determined from the distance between each position point and its corresponding position point, and from the position weight of each position point. For example, the degree of coincidence may be the reciprocal of the sum of the products of each distance and the corresponding position weight.
In addition to the degree of coincidence, an additional indicator can be derived from the moving speed of the working part along the actual position curve and used to determine the reward value. The faster the working part moves along the actual position curve, the higher the working efficiency and, correspondingly, the larger the reward value.
The moving speed of the working part along the actual position curve can be determined from the length of the actual position curve and the time the working part takes to move along it.
For example, a weighted sum can be formed from the degree of coincidence multiplied by a coincidence weight and the moving speed multiplied by a speed weight, and this weighted sum can be used as the reward value.
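A minimal sketch of this combined reward follows. The speed is taken as polyline length divided by elapsed time, as described above; the function names and the default weights 0.8 and 0.2 are purely illustrative assumptions, since the application does not give numeric weights.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def curve_length(points: Sequence[Point]) -> float:
    """Length of the polyline through the sampled points of the actual curve."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def moving_speed(points: Sequence[Point], elapsed_s: float) -> float:
    """Average speed of the working part: curve length divided by moving time."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return curve_length(points) / elapsed_s


def combined_reward(coincidence: float, speed: float,
                    w_coincidence: float = 0.8, w_speed: float = 0.2) -> float:
    """Weighted sum of coincidence and speed; the default weights are assumed values."""
    return w_coincidence * coincidence + w_speed * speed
```

In practice the two terms have different units and ranges, so the weights would need to be chosen (or the terms normalized) so that speed does not outweigh tracking accuracy.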
Based on any of the above embodiments, the state-behavior decision-making model is obtained by training through the following steps:
acquiring the previous working state and the previous decision-making behavior of the work machine, and the reward value corresponding to the previous decision-making behavior;
using the previous working state, the previous decision-making behavior, and the reward value corresponding to the previous decision-making behavior as the sample working state, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior, respectively;
training an initial model based on the sample working state, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior to obtain the state-behavior decision-making model.
Specifically, the initial model of the state-behavior decision-making model may be a policy network, a deep Q-network (DQN), or the like. This embodiment of the present application does not specifically limit the type of the initial model.
The state-behavior decision-making model can be obtained by training, specifically as follows:
First, the previous working state and previous decision-making behavior of the work machine, and the reward value corresponding to the previous decision-making behavior, are collected in real time. The previous working state is the working state at the moment immediately preceding the current moment, and the previous decision-making behavior is the decision-making behavior at that preceding moment. They are used, together with the corresponding reward value, as the sample working state, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior, respectively. These sample data all come from real-time data generated while the work machine performs the current construction work.
In addition, the sample data may also come from historical data recorded while the work machine performed construction work.
Second, the initial model is trained on the sample working state, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior, so as to improve its ability to predict the optimal decision-making behavior, and the state-behavior decision-making model is obtained.
With the work machine control method provided by this embodiment of the present application, the state-behavior decision-making model can be obtained by training the initial model on real-time data from the work machine, so training can proceed continuously. When training uses real-time data from the current construction work, the next action can be adjusted according to that data, which greatly shortens the debugging process.
Based on any of the above embodiments, training the initial model based on the sample working state, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior to obtain the state-behavior decision-making model includes:
training the initial model based on the sample working state, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior, and determining the actual position curve of the working part of the work machine;
if the degree of coincidence between the actual position curve and the target position curve of the working part of the work machine is less than a preset coincidence threshold, stopping training and using the trained initial model as the state-behavior decision-making model.
Specifically, after the work machine has performed construction work according to the control signal corresponding to the current sample decision-making behavior, the actual position curve of the working part can be obtained. If the degree of coincidence between this actual position curve and the target position curve is less than the preset coincidence threshold, training of the current initial model has achieved its purpose and can be stopped.
If the degree of coincidence between the actual position curve and the target position curve of the working part is greater than or equal to the preset coincidence threshold, training of the current initial model has not yet achieved its purpose and should continue. In that case, the current sample decision-making behavior can be updated and training iterated until the degree of coincidence is less than the preset coincidence threshold.
The preset coincidence threshold can be set according to actual needs.
Based on any of the above embodiments, the target position curve of the working part of the work machine is determined based on the construction task performed by the work machine.
Specifically, the construction task is the work item undertaken by the work machine. For an excavator, for example, construction tasks may include leveling, slope brushing, and digging.
The target position curve of the working part is the curve formed by the desired positions of the working part at each moment of the construction process. It can be determined according to the task of the work machine. For example, for a leveling operation, the target position curve of the excavator may be a straight line in a horizontal plane; for a slope brushing operation, it may be a straight line inclined to the horizontal plane; and for a digging operation, it may be a curved line.
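For the two straight-line cases, a target position curve can be sampled as in the sketch below. The function name, the two-dimensional (x, z) representation, and all parameters are assumptions made for illustration; a slope of 0 degrees gives the leveling case and a nonzero slope gives the slope brushing case.

```python
import math
from typing import List, Tuple


def straight_target_curve(start_x: float, end_x: float, start_z: float,
                          slope_deg: float = 0.0,
                          n_points: int = 11) -> List[Tuple[float, float]]:
    """Sample a straight target line as (x, z) points in the vertical working plane.

    slope_deg = 0 yields a horizontal line (leveling); a nonzero value yields
    a line inclined to the horizontal (slope brushing).
    """
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    step = (end_x - start_x) / (n_points - 1)
    grade = math.tan(math.radians(slope_deg))   # rise per unit of horizontal travel
    return [(start_x + i * step, start_z + i * step * grade)
            for i in range(n_points)]
```

A digging task would need a genuinely curved target, which this sketch deliberately does not cover.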
Based on any of the above embodiments, the state-behavior decision-making model is stored in the memory of the work machine in the form of a computer program, to be read and executed by the processor of the work machine.
Specifically, the state-behavior decision-making model can serve as a control algorithm and be stored in the memory of the work machine in the form of a computer program. The processor of the work machine can read the computer program from the memory and execute the work machine control method.
Based on any of the above embodiments, the work machine is an excavator, and the current working state includes the attitude parameters of the mechanical arms, the attitude parameters of the upper body, and the slewing angle of the upper body.
Specifically, the work machine in this embodiment of the present application may be an excavator, and the current working state may accordingly include the attitude parameters of the mechanical arms, the attitude parameters of the upper body, and the slewing angle of the upper body.
The attitude parameters of the mechanical arms include the extension length and extension angle of each arm. The mechanical arms here comprise the boom, the stick, and the bucket. The extension length of each arm can be obtained from the corresponding cylinder length sensor, and the extension angle of each arm from the corresponding inclination sensor.
The attitude parameters of the upper body may be the three-dimensional attitude angles of the excavator body, which can be obtained from a gyroscope mounted on the slewing platform.
The slewing angle of the upper body may be the angle by which the excavator body is rotated relative to the chassis, and can be determined from the angle between the extension direction of the boom on the slewing platform and the forward direction of the vehicle.
The current working state may also include other parameters from which the working state of the excavator can be determined, for example the travel speed and travel direction of the excavator.
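One possible way to bundle these quantities into a single state record is sketched below. All class names, field names, and units are assumptions chosen for illustration; the application only lists which quantities may belong to the working state.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ArmPose:
    """Attitude of one mechanical arm (boom, stick, or bucket)."""
    cylinder_length_m: float   # from the cylinder length sensor
    angle_deg: float           # from the inclination sensor


@dataclass(frozen=True)
class WorkState:
    """Current working state of the excavator; field names are hypothetical."""
    boom: ArmPose
    stick: ArmPose
    bucket: ArmPose
    body_roll_deg: float        # upper body attitude, e.g. from a gyroscope
    body_pitch_deg: float
    body_yaw_deg: float
    slew_angle_deg: float       # upper body rotation relative to the chassis
    travel_speed_mps: float = 0.0    # optional extra parameters
    travel_heading_deg: float = 0.0

    def as_vector(self) -> List[float]:
        """Flatten the state into the numeric vector fed to the decision model."""
        arms = [self.boom, self.stick, self.bucket]
        vec = [v for arm in arms for v in (arm.cylinder_length_m, arm.angle_deg)]
        vec += [self.body_roll_deg, self.body_pitch_deg, self.body_yaw_deg,
                self.slew_angle_deg, self.travel_speed_mps, self.travel_heading_deg]
        return vec
```

Keeping the flattening in one method fixes the ordering of the inputs, which matters because the trained model depends on that ordering.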
Based on any of the above embodiments, the control signal is the handle opening signal of the excavator.
Specifically, for an excavator, the mechanical arms are driven to perform construction work mainly by controlling the opening of the handles. For example, the handles of an excavator include a left operating handle and a right operating handle. The left operating handle controls the stick and the slewing platform, and the right operating handle controls the boom and the bucket. The opening signal of each handle controls the motion of the corresponding mechanical arm.
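The handle-to-actuator relationship can be illustrated with a small lookup table. The text only states which handle controls which actuators; the assignment of a particular axis (fore/aft or left/right) to each actuator, the normalized range of -1 to 1, and all names below are assumptions.

```python
from typing import Dict, Tuple

HandleAxis = Tuple[str, str]   # (handle, axis)

# Assumed axis layout: the source names the handle per actuator, not the axis.
HANDLE_AXIS_TO_ACTUATOR: Dict[HandleAxis, str] = {
    ("left", "fore_aft"): "stick",
    ("left", "left_right"): "slew",
    ("right", "fore_aft"): "boom",
    ("right", "left_right"): "bucket",
}


def clamp(value: float, lo: float = -1.0, hi: float = 1.0) -> float:
    """Limit an opening signal to the assumed normalized range [-1, 1]."""
    return max(lo, min(hi, value))


def actuator_commands(openings: Dict[HandleAxis, float]) -> Dict[str, float]:
    """Translate handle opening signals into one command per actuator."""
    commands = {name: 0.0 for name in HANDLE_AXIS_TO_ACTUATOR.values()}
    for axis, opening in openings.items():
        if axis not in HANDLE_AXIS_TO_ACTUATOR:
            raise KeyError(f"unknown handle axis: {axis!r}")
        commands[HANDLE_AXIS_TO_ACTUATOR[axis]] = clamp(opening)
    return commands
```

Axes that receive no opening signal default to zero, that is, the corresponding actuator is left at rest.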
Based on any of the above embodiments, the present application provides a reinforcement-learning-based method for controlling excavator leveling and slope brushing operations, the method including:
Step 1: define the state parameter group required by the reinforcement learning model. It contains the arm attitude sensor signals (cylinder displacement or inclination sensors), the upper body attitude signal, the upper body slewing angle signal, and so on; that is, a parameter group whose combination uniquely determines the current state of the excavator.
Step 2: define the policy function. Its input is the current set of state parameters (some or all of them), and its output is the corresponding control signal (handle opening signal). The coefficient matrix connecting the input and output parameters is part of the trainable model in reinforcement learning.
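The simplest policy function of this kind is a single coefficient matrix applied to the state vector, sketched below. This linear form, the optional bias, and the clamping of outputs to the range -1 to 1 are assumptions for illustration; the application also allows richer models such as a policy network or a deep Q-network.

```python
from typing import List, Optional, Sequence


def linear_policy(state: Sequence[float],
                  coeffs: Sequence[Sequence[float]],
                  bias: Optional[Sequence[float]] = None) -> List[float]:
    """Map state parameters to handle opening signals via a coefficient matrix.

    coeffs has one row per output signal and one column per state parameter;
    these coefficients are the trainable part of the model.
    """
    if bias is None:
        bias = [0.0] * len(coeffs)
    if len(bias) != len(coeffs):
        raise ValueError("bias must have one entry per output")
    outputs = []
    for row, b in zip(coeffs, bias):
        if len(row) != len(state):
            raise ValueError("each coefficient row must match the state length")
        raw = sum(c * s for c, s in zip(row, state)) + b
        outputs.append(max(-1.0, min(1.0, raw)))   # assumed opening range [-1, 1]
    return outputs
```

For example, `linear_policy([0.5, 0.25], [[1.0, 0.0], [0.0, 2.0]])` yields one opening signal per row of the matrix.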
Step 3: define the reward function. The smaller the distances between the points of the actual tooth tip position curve and the desired curve, that is, the higher the degree of coincidence between the two curves, the larger the reward value.
Step 4: develop the corresponding automated development and debugging program. Fig. 2 is a schematic diagram of training the excavator leveling and slope brushing control model provided by the present application. As shown in Fig. 2, the training and debugging process of the control model is: collect signals from sensors such as the handles, the digital oil cylinders, and the IMU (inertial sensor) in real time and store them in the current state array; output control signals through the policy function; compute the tooth tip position curve from the signals returned by the sensors in real time; compute the reward value from the resulting curve and the desired tooth tip curve; then judge based on the reward value: a) if the goal is reached, stop training; b) if the goal is not reached, update the policy function and iterate until the goal is reached.
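The iterate-until-goal structure of this training process can be sketched as follows. The update rule used here is plain random-search hill climbing, swapped in as a simple stand-in; it is not the reinforcement learning algorithm of the application, which leaves the update method open. The names `train_policy`, `rollout`, `reward_of`, and `goal_reward` are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Params = List[float]
Curve = Sequence[Tuple[float, float]]


def train_policy(params: Params,
                 rollout: Callable[[Params], Curve],
                 reward_of: Callable[[Curve], float],
                 goal_reward: float,
                 step_size: float = 0.1,
                 max_iters: int = 500,
                 seed: int = 0) -> Tuple[Params, float]:
    """Iterate until the reward goal is reached or max_iters is exhausted.

    rollout runs the machine (or a simulator) with the given policy
    parameters and returns the resulting tooth tip position curve.
    """
    rng = random.Random(seed)
    best = list(params)
    best_reward = reward_of(rollout(best))
    for _ in range(max_iters):
        if best_reward >= goal_reward:        # a) goal reached: stop training
            break
        # b) goal not reached: propose an updated policy and evaluate it
        candidate = [p + rng.gauss(0.0, step_size) for p in best]
        candidate_reward = reward_of(rollout(candidate))
        if candidate_reward > best_reward:    # keep only improvements
            best, best_reward = candidate, candidate_reward
    return best, best_reward
```

Because only improving candidates are kept, the returned reward is never lower than the starting reward, although reaching `goal_reward` within `max_iters` is not guaranteed.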
Step 5: Fig. 3 is a schematic diagram of deploying the excavator leveling and slope brushing control model provided by the present application. As shown in Fig. 3, once the reinforcement learning model has been trained, it can be embedded directly in the controller, where it acts much like a control algorithm: it takes the state parameters collected in real time as input and outputs real-time control signals.
The reinforcement-learning-based method for controlling excavator leveling and slope brushing operations provided by this embodiment of the present application has the following advantages:
1. Once the automated reinforcement learning training program has been set up, the excavator can debug the control algorithm automatically, without human intervention, and traverse all state points for optimization. This greatly reduces the workload and cost of debugging the control algorithm.
2. Because debugging can run continuously, its accuracy can match or exceed that of manual debugging, and because the next action is adjusted in real time according to the returned data, the overall debugging time is greatly shortened.
3. The developed control program accelerates the development of control algorithms for subsequent excavator models. A trained artificial intelligence model can be migrated to similar application scenarios and adapted to the new scenario with only simpler training, which is known as transfer learning. This greatly speeds up the development of leveling and slope brushing control algorithms for new excavator models.
4. A model developed with reinforcement learning is a black-box model rather than a logic or mechanism model, so it is not easily copied or reverse engineered.
Based on any of the above embodiments, Fig. 4 is a schematic structural diagram of the work machine control apparatus provided by the present application. As shown in Fig. 4, the apparatus includes:
an acquisition unit 410, configured to acquire the current working state of the work machine;
a decision-making unit 420, configured to determine the current decision-making behavior of the work machine based on the current working state and the state-behavior decision-making model;
a control unit 430, configured to control the work machine to perform construction work based on the control signal corresponding to the current decision-making behavior;
wherein the state-behavior decision-making model is obtained by reinforcement learning based on the sample working states of the work machine, the sample decision-making behaviors, and the reward values corresponding to the sample decision-making behaviors; the reward value is determined based on the degree of coincidence between the actual position curve and the target position curve of the working part of the work machine, and the actual position curve is determined based on the sample decision-making behavior.
The work machine control apparatus provided by this embodiment of the present application performs reinforcement learning on the sample working states of the work machine, the sample decision-making behaviors, and the reward values corresponding to the sample decision-making behaviors. The resulting state-behavior decision-making model can determine the current decision-making behavior of the work machine from its current working state, and the work machine is controlled to perform construction work according to the control signal corresponding to that behavior. Because the reward value is determined from the degree of coincidence between the actual position curve and the target position curve of the working part, the working part can work along the set target position curve, and no accurate control model has to be built for each working state of the machine. This reduces the engineers' debugging workload, shortens debugging time, lowers debugging cost, and raises the level of intelligent construction of the work machine.
Based on any of the above embodiments, the apparatus further includes:
a reward determination unit, configured to select a plurality of position points on the actual position curve and determine, on the target position curve, the position point corresponding to each of them; determine a position weight for each position point; and determine the reward value based on the distance between each position point and its corresponding position point, and on the position weight of each position point.
Based on any of the above embodiments, the reward determination unit is specifically configured to:
determine the degree of coincidence between the actual position curve and the target position curve based on the distance between each position point and its corresponding position point, and on the position weight of each position point;
determine the moving speed of the working part along the actual position curve;
determine the reward value based on the degree of coincidence and the moving speed.
Based on any of the above embodiments, the apparatus further includes:
a training unit, configured to acquire the previous working state and the previous decision-making behavior of the work machine, and the reward value corresponding to the previous decision-making behavior;
use the previous working state, the previous decision-making behavior, and the reward value corresponding to the previous decision-making behavior as the sample working state, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior, respectively;
train an initial model based on the sample working state, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior to obtain the state-behavior decision-making model.
Based on any of the above embodiments, the training unit is further configured to:
train the initial model based on the sample working state, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior, and determine the actual position curve of the working part of the work machine;
if the degree of coincidence between the actual position curve and the target position curve of the working part of the work machine is less than a preset coincidence threshold, stop training and use the trained initial model as the state-behavior decision-making model.
Based on any of the above embodiments, the target position curve of the working part of the work machine is determined based on the construction task performed by the work machine.
Based on any of the above embodiments, the work machine is an excavator, and the current working state includes the attitude parameters of the mechanical arms, the attitude parameters of the upper body, and the slewing angle of the upper body.
Based on any of the above embodiments, the control signal is the handle opening signal of the excavator.
Based on any of the above embodiments, an embodiment of the present application further provides a work machine that includes the above work machine control apparatus.
Specifically, the work machine may include the above work machine control apparatus. The control apparatus controls the work machine in place of manual control and can adjust the next construction action according to data returned in real time, shortening the debugging process.
Based on any of the above embodiments, Fig. 5 is a schematic structural diagram of the electronic device provided by the present application. As shown in Fig. 5, the electronic device may include a processor 510, a communications interface 520, a memory 530, and a communications bus 540, where the processor 510, the communications interface 520, and the memory 530 communicate with one another through the communications bus 540. The processor 510 can invoke logic instructions in the memory 530 to execute the following method:
acquiring the current working state of the work machine; determining the current decision-making behavior of the work machine based on the current working state and the state-behavior decision-making model; and controlling the work machine to perform construction work based on the control signal corresponding to the current decision-making behavior; wherein the state-behavior decision-making model is obtained by training based on the sample working states of the work machine, the sample decision-making behaviors, and the reward values corresponding to the sample decision-making behaviors; the reward value is determined based on the actual position curve and the target position curve of the working part of the work machine, and the actual position curve is determined based on the sample decision-making behavior.
In addition, the logic instructions in the memory 530 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present application. The storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The processor in the electronic device provided by this embodiment of the present application can invoke the logic instructions in the memory to implement the above method. Its specific implementation is consistent with the foregoing method embodiments and achieves the same beneficial effects, so it is not repeated here.
本申请实施例还提供一种非暂态计算机可读存储介质,其上存储有计算机程序,该计算机程序被处理器执行时实现以执行上述各实施例提供的方法,例如包括:An embodiment of the present application also provides a non-transitory computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, it is implemented to perform the methods provided by the above-mentioned embodiments, for example, including:
Obtaining a current working state of a work machine; determining a current decision behavior of the work machine based on the current working state and a state-behavior decision model; and controlling the work machine to perform construction work based on a control signal corresponding to the current decision behavior; wherein the state-behavior decision model is obtained by training based on a sample working state of the work machine, a sample decision behavior, and a reward value corresponding to the sample decision behavior; the reward value is determined based on an actual position curve and a target position curve of a working part of the work machine, and the actual position curve is determined based on the sample decision behavior.
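To make the training relationship concrete, the sketch below fits a small value table from (sample working state, sample decision behavior, reward value) triples. The application does not say which learning algorithm is used, so the tabular, reward-averaging update here is a stand-in chosen only for illustration; `train_value_table`, `greedy_action`, and the learning rate are all assumed names and values.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Sequence, Tuple

# (sample working state, sample decision behavior, reward value)
Sample = Tuple[Hashable, int, float]


def train_value_table(
    samples: Iterable[Sample], learning_rate: float = 0.5
) -> Dict[Tuple[Hashable, int], float]:
    """Move each (state, behavior) estimate toward its observed reward."""
    q: Dict[Tuple[Hashable, int], float] = defaultdict(float)
    for state, action, reward in samples:
        key = (state, action)
        q[key] += learning_rate * (reward - q[key])
    return dict(q)


def greedy_action(
    q: Dict[Tuple[Hashable, int], float],
    state: Hashable,
    actions: Sequence[int],
) -> int:
    """Pick the behavior with the highest estimated reward in this state."""
    return max(actions, key=lambda a: q.get((state, a), 0.0))


# Toy samples: behavior 1 earned higher rewards than behavior 0 in state "s0".
samples = [("s0", 0, 0.2), ("s0", 1, 0.8), ("s0", 1, 1.0)]
q_table = train_value_table(samples)
best = greedy_action(q_table, "s0", [0, 1])
```

A real state for a work machine would be continuous, so a function approximator would likely replace the table; the table only keeps the example short.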
When the computer program stored on the non-transitory computer-readable storage medium provided by the embodiments of the present application is executed, the above method is implemented. Its specific implementation is consistent with the foregoing method embodiments and achieves the same beneficial effects, which are not repeated here.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment may be implemented by software plus a necessary general-purpose hardware platform, or, of course, by hardware. Based on this understanding, the above technical solutions in essence, or the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (10)

  1. A work machine control method, comprising:
    obtaining a current working state of a work machine;
    determining a current decision behavior of the work machine based on the current working state and a state-behavior decision model; and
    controlling the work machine to perform construction work based on a control signal corresponding to the current decision behavior;
    wherein the state-behavior decision model is obtained by training based on a sample working state of the work machine, a sample decision behavior, and a reward value corresponding to the sample decision behavior; the reward value is determined based on an actual position curve and a target position curve of a working part of the work machine; and the actual position curve is determined based on the sample decision behavior.
  2. The work machine control method according to claim 1, wherein the reward value is determined by the following steps:
    selecting a plurality of position points on the actual position curve, and determining, on the target position curve, a corresponding position point for each position point;
    determining a position weight of each position point; and
    determining the reward value based on the distance between each position point and its corresponding position point and on the position weight of each position point.
  3. The work machine control method according to claim 2, wherein determining the reward value based on the distance between each position point and its corresponding position point and on the position weight of each position point comprises:
    determining a degree of coincidence between the actual position curve and the target position curve based on the distance between each position point and its corresponding position point and on the position weight of each position point;
    determining a moving speed of the working part along the actual position curve; and
    determining the reward value based on the degree of coincidence and the moving speed.
  4. The work machine control method according to claim 3, wherein the state-behavior decision model is trained by the following steps:
    obtaining a previous working state of the work machine, a previous decision behavior, and a reward value corresponding to the previous decision behavior;
    using the previous working state, the previous decision behavior, and the reward value corresponding to the previous decision behavior as the sample working state, the sample decision behavior, and the reward value corresponding to the sample decision behavior, respectively; and
    training an initial model based on the sample working state, the sample decision behavior, and the reward value corresponding to the sample decision behavior, to obtain the state-behavior decision model.
  5. The work machine control method according to claim 4, wherein training the initial model based on the sample working state, the sample decision behavior, and the reward value corresponding to the sample decision behavior, to obtain the state-behavior decision model, comprises:
    if the degree of coincidence between the actual position curve and the target position curve of the working part of the work machine is less than a preset coincidence threshold, stopping the training and using the trained initial model as the state-behavior decision model.
  6. The work machine control method according to any one of claims 1 to 5, wherein the work machine is an excavator, and the current working state comprises attitude parameters of a mechanical arm, attitude parameters of an upper vehicle body, and a swing angle of the upper vehicle body.
  7. A work machine control apparatus, comprising:
    an acquisition unit configured to obtain a current working state of a work machine;
    a decision unit configured to determine a current decision behavior of the work machine based on the current working state and a state-behavior decision model; and
    a control unit configured to control the work machine to perform construction work based on a control signal corresponding to the current decision behavior;
    wherein the state-behavior decision model is obtained by reinforcement training based on a sample working state of the work machine, a sample decision behavior, and a reward value corresponding to the sample decision behavior; the reward value is determined based on an actual position curve and a target position curve of a working part of the work machine; and the actual position curve is determined based on the sample decision behavior.
  8. A work machine, comprising the work machine control apparatus according to claim 7.
  9. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the work machine control method according to any one of claims 1 to 6.
  10. A non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the work machine control method according to any one of claims 1 to 6.
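The reward computation recited in claims 2 and 3 can be sketched as follows. The claims fix the ingredients (paired position points, per-point weights, distances, a degree of coincidence, a moving speed) but not the formulas, so everything concrete here is an assumption made for illustration: points are paired by index, the degree of coincidence is taken as 1 / (1 + weighted mean distance), the speed is path length over duration, and the two are combined linearly with a hypothetical `speed_gain`.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def coincidence_degree(
    actual: Sequence[Point], target: Sequence[Point], weights: Sequence[float]
) -> float:
    """Assumed form: 1 / (1 + weighted mean distance); 1.0 means identical."""
    if not actual or not (len(actual) == len(target) == len(weights)):
        raise ValueError("need equally long, non-empty point and weight lists")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("position weights must sum to a positive value")
    weighted = sum(
        w * math.dist(a, t) for a, t, w in zip(actual, target, weights)
    )
    return 1.0 / (1.0 + weighted / total_weight)


def mean_speed(actual: Sequence[Point], duration_s: float) -> float:
    """Path length of the actual curve divided by the time taken."""
    length = sum(math.dist(p, q) for p, q in zip(actual, actual[1:]))
    return length / duration_s


def reward_value(
    actual: Sequence[Point],
    target: Sequence[Point],
    weights: Sequence[float],
    duration_s: float,
    speed_gain: float = 0.1,  # hypothetical trade-off between accuracy and speed
) -> float:
    """Assumed linear combination of degree of coincidence and moving speed."""
    return coincidence_degree(actual, target, weights) + speed_gain * mean_speed(
        actual, duration_s
    )


# Toy curves: the middle point is off target by 1 unit and weighted double.
actual_curve = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
target_curve = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
point_weights = [1.0, 2.0, 1.0]
reward = reward_value(actual_curve, target_curve, point_weights, duration_s=4.0)
```

Larger weights make deviations at those points cost more, which is one plausible reason for weighting positions at all; the application itself does not state how the weights are chosen.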
PCT/CN2022/102918 2021-08-19 2022-06-30 Control method and apparatus for work machine, and work machine WO2023020129A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/065,804 US20230112014A1 (en) 2021-08-19 2022-12-14 Working machine control method, working machine control device and working machine

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110956947.XA CN113684885B (en) 2021-08-19 2021-08-19 Working machine control method and device and working machine
CN202110956947.X 2021-08-19

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/065,804 Continuation US20230112014A1 (en) 2021-08-19 2022-12-14 Working machine control method, working machine control device and working machine

Publications (1)

Publication Number Publication Date
WO2023020129A1 (en) 2023-02-23

Family

ID=78580782

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/102918 WO2023020129A1 (en) 2021-08-19 2022-06-30 Control method and apparatus for work machine, and work machine

Country Status (3)

Country Link
US (1) US20230112014A1 (en)
CN (1) CN113684885B (en)
WO (1) WO2023020129A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113684885B (en) * 2021-08-19 2022-09-02 上海三一重机股份有限公司 Working machine control method and device and working machine
CN114351785B (en) * 2022-01-04 2022-09-23 大连理工大学 Hydraulic excavator system flow matching optimization method based on reinforcement learning

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109778941A (en) * 2019-03-25 2019-05-21 江苏徐工工程机械研究院有限公司 Semi-autonomous excavation system and method based on reinforcement learning
US20190196854A1 (en) * 2017-12-21 2019-06-27 Caterpillar Inc. System and method for using virtual machine operator model
JP2019183421A (en) * 2018-04-03 2019-10-24 清水建設株式会社 Estimation device and estimation method
WO2019222745A1 (en) * 2018-05-18 2019-11-21 Google Llc Sample-efficient reinforcement learning
JP2020082314A (en) * 2018-11-29 2020-06-04 京セラドキュメントソリューションズ株式会社 Learning device, robot control method, and robot control system
CN112299254A (en) * 2019-07-31 2021-02-02 利勃海尔液压挖掘机有限公司 Method for automatically moving a working device and working device
CN113684885A (en) * 2021-08-19 2021-11-23 上海三一重机股份有限公司 Working machine control method and device and working machine

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105259888B (en) * 2015-10-29 2018-04-03 上海华兴数字科技有限公司 A kind of excavator teaching control system, method and excavator
EP3919687A4 (en) * 2019-04-04 2022-11-16 Komatsu Ltd. System including work machinery, computer-executed method, production method for trained attitude estimation models, and learning data
US11624171B2 (en) * 2020-07-31 2023-04-11 Baidu Usa Llc Engineering machinery equipment, and method, system, and storage medium for operation trajectory planning thereof
CN112598150B (en) * 2020-11-09 2024-03-08 西安君能清洁能源有限公司 Method for improving fire detection effect based on federal learning in intelligent power plant
CN112947180B (en) * 2021-02-04 2022-06-24 中国地质大学(武汉) Heavy machinery operation state identification and prediction method, device, equipment and storage medium

Also Published As

Publication number Publication date
US20230112014A1 (en) 2023-04-13
CN113684885A (en) 2021-11-23
CN113684885B (en) 2022-09-02

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 22857442; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)