WO2023020129A1

WO2023020129A1 - Control method and apparatus for work machine, and work machine

Info

Publication number: WO2023020129A1
Application number: PCT/CN2022/102918
Authority: WO
Inventors: 王传宇 (Wang Chuanyu); 胡立辛 (Hu Lixin); 曾超 (Zeng Chao)
Original assignee: 上海三一重机股份有限公司 (Shanghai Sany Heavy Machinery Co., Ltd.)
Priority date: 2021-08-19
Filing date: 2022-06-30
Publication date: 2023-02-23
Also published as: US20230112014A1; CN113684885A; CN113684885B

Abstract

Provided in the present application are a control method and apparatus for a work machine, and a work machine. The method comprises: acquiring the current work state of a work machine; determining the current decision-making behavior of the work machine on the basis of the current work state and a state behavior decision-making model; and controlling the work machine to perform construction work on the basis of a control signal corresponding to the current decision-making behavior. The state behavior decision-making model is obtained by training on the basis of a sample work state of the work machine, a sample decision-making behavior, and a reward value corresponding to the sample decision-making behavior. The reward value is determined on the basis of an actual position curve and a target position curve of a work part of the work machine, and the actual position curve is determined on the basis of the sample decision-making behavior. The method, apparatus, and work machine provided in the present application reduce the debugging workload of engineers, shorten debugging time, lower debugging cost, and raise the level of intelligent construction of the work machine.

Description

Working Machine Control Method and Device, and Working Machine

Cross References to Related Applications

This application claims priority to Chinese patent application No. 202110956947.X, filed on August 19, 2021 and entitled "Working Machine Control Method and Device, and Working Machine", which is incorporated herein by reference in its entirety.

Technical Field

The present application relates to the technical field of mechanical engineering, and in particular to a working machine control method and device, and a working machine.

Background

When an excavator performs compound operations such as leveling ground or trimming slopes, the work is usually done by experienced operators through coordinated combinations of actions.

In the prior art, intelligent functions for excavators are usually developed by debugging traditional control algorithms. Many working-state points of the excavator must be defined, and the control algorithm must be debugged separately at each state point so that the leveling or slope-trimming control program reaches the expected accuracy. Because the excavator system is complex, debugging this type of control algorithm is very difficult: it places high demands on engineers, is hard to complete, takes a long time, and incurs high labor costs.

Summary

The working machine control method and device and the working machine provided by the present application are intended to solve the technical problem in the prior art that intelligent control of a working machine requires establishing an accurate control model for each working state and performing a large amount of debugging, which is time-consuming and costly.

The present application provides a working machine control method, comprising:

acquiring the current working state of the working machine;

determining the current decision-making behavior of the working machine based on the current working state and a state-behavior decision-making model; and

controlling the working machine to perform construction work based on a control signal corresponding to the current decision-making behavior;

wherein the state-behavior decision-making model is obtained by training based on a sample working state of the working machine, a sample decision-making behavior, and a reward value corresponding to the sample decision-making behavior; the reward value is determined based on an actual position curve and a target position curve of a working part of the working machine; and the actual position curve is determined based on the sample decision-making behavior.

According to the working machine control method provided in the present application, the reward value is determined through the following steps:

selecting a plurality of position points on the actual position curve, and determining, on the target position curve, a corresponding position point for each position point;

determining a position weight for each position point; and

determining the reward value based on the distance between each position point and its corresponding position point, and on the position weight of each position point.

According to the working machine control method provided in the present application, determining the reward value based on the distance between each position point and its corresponding position point and on the position weight of each position point comprises:

determining the coincidence degree between the actual position curve and the target position curve based on the distance between each position point and its corresponding position point, and on the position weight of each position point;

determining the moving speed of the working part along the actual position curve; and

determining the reward value based on the coincidence degree and the moving speed.

According to the working machine control method provided in the present application, the state-behavior decision-making model is trained through the following steps:

acquiring the previous working state of the working machine, the previous decision-making behavior, and the reward value corresponding to the previous decision-making behavior;

using the previous working state, the previous decision-making behavior, and the reward value corresponding to the previous decision-making behavior as the sample working state, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior, respectively; and

training an initial model based on the sample working state, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior, to obtain the state-behavior decision-making model.

According to the working machine control method provided in the present application, training the initial model based on the sample working state, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior to obtain the state-behavior decision-making model comprises:

if the coincidence degree between the actual position curve and the target position curve of the working part of the working machine is less than a preset coincidence threshold, stopping the training and using the trained initial model as the state-behavior decision-making model.

According to the working machine control method provided in the present application, the working machine is an excavator, and the current working state includes the attitude parameters of the working arms, the attitude parameters of the upper structure, and the slewing angle of the upper structure.

The present application also provides a working machine control device, comprising:

an acquisition unit, configured to acquire the current working state of the working machine;

a decision-making unit, configured to determine the current decision-making behavior of the working machine based on the current working state and a state-behavior decision-making model; and

a control unit, configured to control the working machine to perform construction work based on a control signal corresponding to the current decision-making behavior;

wherein the state-behavior decision-making model is obtained by reinforcement learning training based on a sample working state of the working machine, a sample decision-making behavior, and a reward value corresponding to the sample decision-making behavior; the reward value is determined based on an actual position curve and a target position curve of a working part of the working machine; and the actual position curve is determined based on the sample decision-making behavior.

The present application also provides a working machine, comprising the working machine control device described above.

The present application also provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the working machine control method described above.

The present application also provides a non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the working machine control method described above.

In the working machine control method and device and the working machine provided in the present application, reinforcement learning is performed using the sample working state of the working machine, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior. The resulting state-behavior decision-making model determines the current decision-making behavior of the working machine from its current working state, and the working machine is controlled to perform construction work according to the control signal corresponding to that behavior. Because the reward value is determined from the actual position curve and the target position curve of the working part, the working part is driven to follow the set target position curve without an accurate control model having to be established for each working state. This reduces the engineer's debugging workload, shortens debugging time, lowers debugging cost, and raises the level of intelligent construction of the working machine.

Brief Description of the Drawings

In order to illustrate the technical solutions of the present application or the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of the working machine control method provided by the present application;

FIG. 2 is a schematic diagram of the training of the excavator slope-trimming control model provided by the present application;

FIG. 3 is a schematic diagram of the deployment of the excavator leveling and slope-trimming control model provided by the present application;

FIG. 4 is a schematic structural diagram of the working machine control device provided by the present application;

FIG. 5 is a schematic structural diagram of an electronic device provided by the present application.

Detailed Description

To make the objectives, technical solutions, and advantages of the present application clearer, the technical solutions of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.

Reinforcement learning is an intelligent algorithm that trains an artificial intelligence model through continual trial and error, rewarding better strategies. Inspired by this, the technical solution in the embodiments of the present application lets the excavator start automatic work from random initial inputs, defines the reward according to the difference between the actual trajectory and the expected leveling path or slope path, and iteratively optimizes the control strategy. The result is an artificial intelligence model (that is, a control algorithm) capable of leveling control or slope control, and this process replaces manual debugging or calibration of the control algorithm.

FIG. 1 is a schematic flowchart of the working machine control method provided by the present application. As shown in FIG. 1, the method includes:

Step 110: acquire the current working state of the working machine.

Specifically, the working machine is a construction machine capable of carrying out construction work, including an excavator, a crane, a concrete pump truck, a concrete mixer truck, and the like.

The current working state consists of state parameters that represent the state of the working machine while it performs construction work at the current moment. For example, for an excavator, the current working state can be represented by the telescopic length and extension angle of the boom, stick, bucket, and so on, and it can also include the attitude signal and slewing angle signal of the upper structure.

Step 120: determine the current decision-making behavior of the working machine based on the current working state and the state-behavior decision-making model, wherein the state-behavior decision-making model is obtained by training based on the sample working state of the working machine, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior; the reward value is determined based on the actual position curve and the target position curve of the working part of the working machine; and the actual position curve is determined based on the sample decision-making behavior.

Specifically, the current decision-making behavior of the working machine is the construction action it performs at the current moment. The working machine may have multiple candidate decision-making behaviors at the current moment, one of which must be selected as the current decision-making behavior. For example, when an excavator performs leveling, its candidate decision-making behaviors at the current moment may include retracting the bucket inward, extending the bucket outward, and so on.

Using reinforcement learning, the current working state of the working machine is input into the state-behavior decision-making model, which analyzes each parameter of the current working state and determines the current decision-making behavior of the working machine.

The state-behavior decision-making model can be obtained by collecting the sample working states, sample decision-making behaviors, and corresponding reward values of the working machine and training an initial model on them.

The state-behavior decision-making model operates on the following principle: if a decision-making behavior made by the working machine in its current working state increases the corresponding reward value, the working machine's tendency to adopt that behavior in the future is strengthened. The purpose of the model is to find the optimal decision-making behavior at each moment, so that the working machine obtains the maximum reward value by adopting it.
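The principle above can be illustrated with a gradient-bandit preference update, a standard reinforcement learning rule swapped in here purely for illustration; the present application does not specify its update rule. In this minimal sketch, each candidate decision-making behavior carries a preference, the preferences are converted into selection probabilities by a softmax, and a reward above a baseline raises the preference (and hence the future selection probability) of the behavior that earned it. The behavior names, step size, and baseline are hypothetical.

```python
import math


def softmax(prefs):
    """Convert behavior preferences into selection probabilities."""
    m = max(prefs.values())  # subtract the max for numerical stability
    exps = {a: math.exp(p - m) for a, p in prefs.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


def update_preferences(prefs, chosen, reward, baseline, step_size=0.1):
    """Gradient-bandit update (illustrative, not the application's method).

    A reward above the baseline raises the chosen behavior's preference and
    lowers the others in proportion to their current probabilities; a reward
    below the baseline does the opposite.
    """
    probs = softmax(prefs)
    advantage = reward - baseline
    new_prefs = {}
    for action, pref in prefs.items():
        if action == chosen:
            new_prefs[action] = pref + step_size * advantage * (1 - probs[action])
        else:
            new_prefs[action] = pref - step_size * advantage * probs[action]
    return new_prefs
```

With two equally preferred hypothetical behaviors, "bucket_in" and "bucket_out", a reward of 1.0 against a baseline of 0.0 for "bucket_in" raises its preference by 0.05 and lowers the other by 0.05, so "bucket_in" becomes more likely to be selected at the next decision.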

The working part is the part of the working machine that acts on the working face during construction work. For example, the bucket is the working part of an excavator, the front-end hose that discharges concrete is the working part of a concrete pump truck, and the ram is the working part of a rammer.

The actual position curve of the working part is the curve formed by the actual positions of the working part at successive moments during construction. It is determined by the decision-making behavior, that is, by the construction work the working machine performs according to the control signal corresponding to its current decision-making behavior. For example, the excavator drives each working arm according to that control signal, changing the displacement and inclination angle of each arm; the position at which the bucket (the working part) contacts the working face changes accordingly, which yields the actual position curve of the excavator's working part.
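How joint angles translate into the actual position curve can be sketched with planar forward kinematics of the boom-stick-bucket chain. This is an illustrative assumption rather than the method of the present application: it supposes that the boom, stick, and bucket angles are already available (for example from the inclination sensors described later), treats the arm as a planar chain of three rigid links in the vertical working plane, ignores upper-structure tilt, and uses hypothetical link lengths. Sampling the tip position at each control moment yields a discrete actual position curve.

```python
import math
from typing import List, Sequence, Tuple


def bucket_tip_position(boom_angle: float, stick_angle: float, bucket_angle: float,
                        boom_len: float = 5.7, stick_len: float = 2.9,
                        bucket_len: float = 1.5) -> Tuple[float, float]:
    """Planar forward kinematics of a boom-stick-bucket chain.

    Angles are in radians: boom_angle is measured from the horizontal, while
    stick_angle and bucket_angle are relative to the preceding link. The
    default link lengths (metres) are hypothetical placeholders.
    Returns (x, z) of the bucket tip relative to the boom foot pin.
    """
    a1 = boom_angle
    a2 = a1 + stick_angle   # absolute stick angle
    a3 = a2 + bucket_angle  # absolute bucket angle
    x = boom_len * math.cos(a1) + stick_len * math.cos(a2) + bucket_len * math.cos(a3)
    z = boom_len * math.sin(a1) + stick_len * math.sin(a2) + bucket_len * math.sin(a3)
    return x, z


def actual_position_curve(angle_samples: Sequence[Tuple[float, float, float]],
                          **link_lengths: float) -> List[Tuple[float, float]]:
    """Turn a time series of (boom, stick, bucket) angles into tip positions."""
    return [bucket_tip_position(b, s, k, **link_lengths) for b, s, k in angle_samples]
```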

The target position curve of the working part is the curve formed by the expected positions of the working part at successive moments during construction, and it can be determined according to the work task of the working machine. For example, for leveling, the excavator's target position curve may be a straight line.

The reward value can be determined from the actual position curve and the target position curve of the working part. For example, for an excavator, the reward value can be determined from the actual and target position curves of the bucket tip during construction work. First, the coincidence degree between the actual position curve and the target position curve is determined from the distances between corresponding points on the two curves: the smaller these distances, the higher the coincidence degree, and the larger they are, the lower the coincidence degree. A high coincidence degree means the bucket is leveling or trimming the slope along the target position curve, which should earn a higher reward value; a low coincidence degree means the bucket is not following the target position curve, which should earn a lower reward value. The reward value is proportional to the coincidence degree, and different reward values can be set for different coincidence degrees.

Step 130: control the working machine to perform construction work based on the control signal corresponding to the current decision-making behavior.

Specifically, after the current decision-making behavior output by the state-behavior decision-making model is obtained, the working machine is controlled to perform construction work according to the control signal corresponding to that behavior. For example, the current decision-making behavior may correspond to an opening signal of the excavator's operating handles, so obtaining the current decision-making behavior also yields the handle opening signal. Each working arm of the excavator is then moved according to this signal to complete the construction operation for the current moment, and the cycle is repeated until the construction work is complete.
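The acquire-decide-control cycle of Steps 110 to 130 can be sketched as a loop. The four callables are hypothetical hooks standing in for the sensor interface, the trained state-behavior decision-making model, the handle-opening output, and a task-completion test; none of them is an interface defined by the present application, and the cycle cap is an added safety assumption.

```python
from typing import Callable, Sequence

State = Sequence[float]
Signal = Sequence[float]


def run_control_loop(read_state: Callable[[], State],
                     decide: Callable[[State], Signal],
                     apply_signal: Callable[[Signal], None],
                     task_done: Callable[[State], bool],
                     max_cycles: int = 10_000) -> int:
    """Repeat acquire -> decide -> control until the task is done.

    Returns the number of control cycles executed. max_cycles is a
    hypothetical safety cap, not part of the described method.
    """
    cycles = 0
    while cycles < max_cycles:
        state = read_state()       # Step 110: current working state
        if task_done(state):
            break
        signal = decide(state)     # Step 120: current decision-making behavior
        apply_signal(signal)       # Step 130: e.g. handle opening signal to the arms
        cycles += 1
    return cycles
```

For example, with a one-dimensional toy state that advances by 1.0 per cycle and a completion test of reaching 3.0, the loop executes three control cycles before stopping.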

In the working machine control method provided in the embodiments of the present application, reinforcement learning is performed using the sample working state of the working machine, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior. The resulting state-behavior decision-making model determines the current decision-making behavior of the working machine from its current working state, and the working machine is controlled to perform construction work according to the corresponding control signal. Because the reward value is determined from the coincidence degree between the actual position curve and the target position curve of the working part, the working part is driven to follow the set target position curve without an accurate control model having to be established for each working state. This reduces the engineer's debugging workload, shortens debugging time, lowers debugging cost, and improves the level of intelligent construction of the working machine.

Based on the above embodiment, the reward value is determined through the following steps:

selecting a plurality of position points on the actual position curve, and determining, on the target position curve, a corresponding position point for each position point;

determining a position weight for each position point; and

determining the reward value based on the distance between each position point and its corresponding position point, and on the position weight of each position point.

Specifically, a plurality of position points may first be selected on the actual position curve. The position points may be the start point, midpoint, end point, inflection points, and so on of the curve; alternatively, the curve may be divided into segments according to its shape and the segment points used as position points. The embodiments of the present application do not specifically limit how the position points are selected.

After the position points are determined, a corresponding position point is determined on the target position curve for each of them. For example, the start point of the actual position curve corresponds to the start point of the target position curve, its end point to the end point of the target position curve, each of its segment points to the corresponding segment point of the target position curve, and so on.

The position weight of each position point can be determined according to where the point lies on the actual position curve; it indicates how strongly the point influences the shape of the curve, with a larger position weight meaning a greater influence. For example, the start point, midpoint, and end point can be given high position weights and the remaining position points low position weights.

The reward value is then determined from the distance between each position point and its corresponding position point and from the position weight of each position point. For example, the products of each distance and the corresponding position weight can be summed, and the reciprocal of this weighted sum used as the reward value.
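The weighted-distance reward described above, R = 1 / (w_1 d_1 + w_2 d_2 + ... + w_n d_n) with d_i the distance between the i-th position point and its corresponding point on the target position curve, can be sketched as follows. The sketch assumes the position points have already been paired by index (start with start, end with end, segment point with segment point) and adds a small constant eps, which is not in the source, to avoid division by zero when the two curves coincide exactly.

```python
import math
from typing import Sequence


def weighted_reward(actual_points: Sequence[Sequence[float]],
                    target_points: Sequence[Sequence[float]],
                    weights: Sequence[float],
                    eps: float = 1e-9) -> float:
    """Reward = 1 / sum_i(w_i * d_i), points paired by index.

    d_i is the Euclidean distance between the i-th actual position point and
    its corresponding target position point; eps is an added assumption that
    keeps the reciprocal finite when every distance is zero.
    """
    if not (len(actual_points) == len(target_points) == len(weights)):
        raise ValueError("each position point needs a corresponding point and a weight")
    weighted_sum = sum(w * math.dist(p, q)
                       for p, q, w in zip(actual_points, target_points, weights))
    return 1.0 / (weighted_sum + eps)
```

For instance, with distances 0.1, 0.2, and 0 and position weights 2, 1, and 2 (start and end weighted high), the weighted sum is 0.4 and the reward is approximately 2.5.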

Based on any of the above embodiments, determining the reward value based on the distance between each position point and its corresponding position point and on the position weight of each position point comprises:

determining the coincidence degree between the actual position curve and the target position curve based on the distance between each position point and its corresponding position point, and on the position weight of each position point;

determining the moving speed of the working part along the actual position curve; and

determining the reward value based on the coincidence degree and the moving speed.

Specifically, the coincidence degree between the actual position curve and the target position curve may be determined from the distance between each position point and its corresponding position point and from the position weight of each position point. For example, the coincidence degree may be the reciprocal of the sum of the products of each distance and the corresponding position weight.

In addition to the coincidence degree, the moving speed of the working part along the actual position curve can serve as a further indicator for determining the reward value: the faster the working part moves along the actual position curve, the higher the working efficiency and the larger the reward value.

The moving speed of the working part along the actual position curve can be determined from the length of the actual position curve and the time taken by the working part to move along it.

For example, one weight can be assigned to the coincidence degree and another to the moving speed, the weighted sum of the two computed, and the weighted sum used as the reward value.
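The weighted combination of coincidence degree and moving speed can be sketched as below. The moving speed is computed as the polyline length of the actual position curve divided by the moving time, following the text; the two weights are hypothetical tuning values, since the present application does not give them.

```python
import math
from typing import Sequence


def curve_length(points: Sequence[Sequence[float]]) -> float:
    """Polyline length of the sampled actual position curve."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def combined_reward(coincidence: float,
                    actual_points: Sequence[Sequence[float]],
                    move_time: float,
                    w_coincidence: float = 1.0,
                    w_speed: float = 0.1) -> float:
    """Weighted sum of coincidence degree and moving speed.

    Moving speed = curve length / moving time. The default weights are
    hypothetical and would be tuned per construction task.
    """
    if move_time <= 0:
        raise ValueError("move_time must be positive")
    speed = curve_length(actual_points) / move_time
    return w_coincidence * coincidence + w_speed * speed
```

A curve from (0, 0) to (3, 4) covered in 2.5 s has length 5 and speed 2, so a coincidence degree of 0.5 gives a reward of 0.5 + 0.1 * 2 = 0.7 with the default weights.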

Based on any of the above embodiments, the state-behavior decision-making model is trained through the following steps:

acquiring the previous working state of the working machine, the previous decision-making behavior, and the reward value corresponding to the previous decision-making behavior;

using the previous working state, the previous decision-making behavior, and the reward value corresponding to the previous decision-making behavior as the sample working state, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior, respectively; and

training the initial model based on the sample working state, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior, to obtain the state-behavior decision-making model.

Specifically, the initial model of the state-behavior decision-making model may be a policy network, a deep Q-network (DQN), or the like; the embodiments of the present application do not specifically limit the type of the initial model.

The state-behavior decision-making model can be obtained through the following training method:

First, the previous working state, the previous decision-making behavior, and the reward value corresponding to the previous decision-making behavior of the working machine are collected in real time. The previous working state is the working state at the moment immediately before the current moment, and the previous decision-making behavior is the decision-making behavior at that moment. These are used as the sample working state, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior, respectively. Such sample data come from real-time data recorded while the working machine performs the current construction work.

In addition, the sample data may also come from historical data recorded when the working machine performed earlier construction work.

Second, the initial model is trained on the sample working state, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior, improving its ability to predict the optimal decision-making behavior, and the state-behavior decision-making model is thereby obtained.
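Pairing the previous working state and decision-making behavior with the reward observed after it can be sketched as a small sample collector. The class and method names are hypothetical, and the training update itself (for example of a policy network or deep Q-network) is deliberately left out; the sketch only shows how each (state, behavior, reward) sample becomes available one control cycle after the behavior is taken, and how historical samples can be merged in.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class Sample:
    state: Sequence[float]   # sample working state (previous moment)
    action: Sequence[float]  # sample decision-making behavior (previous moment)
    reward: float            # reward value earned by that behavior


@dataclass
class SampleCollector:
    """Hypothetical helper that turns a stream of control cycles into samples.

    At each cycle, record() receives the current state, the behavior chosen
    now, and the reward observed for the behavior chosen at the previous
    cycle. The reward passed on the very first call is ignored because no
    previous behavior exists yet.
    """
    samples: List[Sample] = field(default_factory=list)
    _last_state: Optional[Sequence[float]] = field(default=None, init=False, repr=False)
    _last_action: Optional[Sequence[float]] = field(default=None, init=False, repr=False)

    def record(self, state: Sequence[float], action: Sequence[float],
               reward_for_last: float) -> None:
        if self._last_state is not None:
            # Previous state and behavior plus their reward form one training sample.
            self.samples.append(Sample(self._last_state, self._last_action, reward_for_last))
        self._last_state, self._last_action = state, action

    def extend_from_history(self, history: List[Sample]) -> None:
        """Append samples taken from historical construction data."""
        self.samples.extend(history)
```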

In the working machine control method provided by the embodiments of the present application, the state-behavior decision-making model is obtained by training the initial model on real-time data from the working machine, which allows training to continue during operation. When training uses real-time data from the current construction work, the next action can be adjusted based on that data, which greatly shortens the debugging process.

Based on any of the above embodiments, training the initial model based on the sample working state, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior to obtain the state-behavior decision-making model comprises:

training the initial model based on the sample working state, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior, and determining the actual position curve of the working part of the working machine; and

if the coincidence degree between the actual position curve and the target position curve of the working part of the working machine is less than the preset coincidence threshold, stopping the training and using the trained initial model as the state-behavior decision-making model.

Specifically, the working machine performs construction work according to the control signal corresponding to the current sample decision-making behavior, and the actual position curve of the working part is obtained. If the coincidence degree between this actual position curve and the target position curve is less than the preset coincidence threshold, the training of the current initial model has achieved its purpose and can be stopped.

If the coincidence degree between the actual position curve and the target position curve of the working part is greater than or equal to the preset coincidence threshold, the training of the current initial model has not yet achieved its purpose and should continue. In this case, the current sample decision-making behavior can be updated and training iterated repeatedly until the coincidence degree is less than the preset coincidence threshold.

The preset coincidence threshold can be set according to actual needs.

Based on any of the above embodiments, the target position curve of the working part in the working machine is determined based on the construction tasks performed by the working machine.

Specifically, a construction task is a work item undertaken by the working machine. For example, the construction tasks of an excavator may include leveling, slope trimming, and excavation.

The target position curve of the working part is the curve formed by the expected positions of the working part at successive moments during construction, and it can be determined according to the construction task of the working machine. For example, for leveling, the excavator's target position curve can be a straight line in the horizontal plane; for slope trimming, it can be a straight line inclined to the horizontal plane; and for excavation, it can be a curve.
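Straight target position curves for leveling and slope trimming can be generated as sampled lines in the vertical working plane, as in the sketch below. The start point, length, slope angle, and number of sample points are hypothetical task parameters; the curved target for excavation is not covered.

```python
import math
from typing import List, Tuple


def target_line(start: Tuple[float, float], length: float,
                slope_deg: float = 0.0, n_points: int = 11) -> List[Tuple[float, float]]:
    """Sample a straight target position curve in the (x, z) working plane.

    slope_deg = 0 gives a horizontal line (leveling); a non-zero value gives
    a line inclined to the horizontal plane (slope trimming). All arguments
    are hypothetical task parameters.
    """
    if n_points < 2:
        raise ValueError("need at least two points")
    theta = math.radians(slope_deg)
    step = length / (n_points - 1)  # spacing along the line
    x0, z0 = start
    return [(x0 + i * step * math.cos(theta), z0 + i * step * math.sin(theta))
            for i in range(n_points)]
```

Because these points are evenly spaced, they can be paired by index with position points sampled on the actual position curve when computing the weighted-distance reward.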

Based on any of the above embodiments, the state-behavior decision-making model is stored in the memory of the working machine in the form of a computer program, to be read and executed by the processor of the working machine.

Specifically, the state-behavior decision-making model can be used as a control algorithm and stored in the memory of the working machine in the form of a computer program. The processor of the work machine can read the computer program in the memory and execute the work machine control method.

Based on any of the above embodiments, the working machine is an excavator, and the current working state includes the attitude parameters of the working arms, the attitude parameters of the upper structure, and the slewing angle of the upper structure.

Specifically, the working machine in the embodiments of the present application may be an excavator; correspondingly, the current working state may include the attitude parameters of the working arms, the attitude parameters of the upper structure, and the slewing angle of the upper structure.

The attitude parameters of the robotic arm include the telescopic length and extension angle of each robotic arm. The robotic arm here includes boom, stick and bucket. The telescopic length of each mechanical arm can be obtained through the corresponding cylinder length sensor, and the extension angle of each mechanical arm can be obtained through the corresponding inclination sensor.

The attitude parameter of the upper body can be the three-dimensional attitude angle of the body part of the excavator, which can be obtained through a gyroscope installed on the slewing platform.

The slewing angle of the upper body can be the rotation angle of the upper body of the excavator relative to the undercarriage, which can be determined from the angle between the extension direction of the boom on the slewing platform and the forward direction of the vehicle.
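The tooth tip position that makes up the actual position curve has to be derived from these sensor readings. Below is a minimal forward-kinematics sketch. It assumes a planar boom-stick-bucket chain with relative joint angles in radians, the boom foot at the origin, and hypothetical link lengths that are not the dimensions of any particular excavator. The arm plane is then rotated by the slewing angle. For brevity, it ignores the upper-body attitude from the gyroscope and the boom foot offset.

```python
import math
from typing import Tuple


def tooth_tip_position(boom: float, stick: float, bucket: float, slew: float,
                       boom_len: float = 5.7, stick_len: float = 2.9,
                       bucket_len: float = 1.5) -> Tuple[float, float, float]:
    """Tooth tip (x, y, z) for a planar boom-stick-bucket chain, then slewed.

    boom is measured from the horizontal; stick and bucket are relative to the
    previous link. All angles in radians; link lengths are hypothetical defaults.
    """
    a1 = boom
    a2 = a1 + stick   # absolute stick angle
    a3 = a2 + bucket  # absolute bucket angle
    # Reach (horizontal) and height of the tip within the arm plane.
    r = boom_len * math.cos(a1) + stick_len * math.cos(a2) + bucket_len * math.cos(a3)
    z = boom_len * math.sin(a1) + stick_len * math.sin(a2) + bucket_len * math.sin(a3)
    # Rotate the arm plane about the vertical axis by the slewing angle.
    return (r * math.cos(slew), r * math.sin(slew), z)
```

Sampling this function at each control cycle during a pass yields the actual tooth tip position curve.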

The current working state may also include other parameters that can determine the working state of the excavator, for example, the moving speed and moving direction of the excavator.

Based on any of the above embodiments, the control signal is the handle opening signal of the excavator.

Specifically, for an excavator, each arm section is driven mainly by controlling the opening of the operating handles. For example, an excavator has a left operating handle and a right operating handle: the left handle controls the stick and the slewing platform, and the right handle controls the boom and the bucket. The opening signal of each handle drives the movement of the corresponding arm section.

Based on any of the above-mentioned embodiments, the present application provides an excavator leveling and slope brushing operation control method based on reinforcement learning, the method comprising:

Step 1. Define the state parameter group required by the reinforcement learning model, including the mechanical arm attitude sensor signals (cylinder displacement or inclination sensor), the upper body attitude signal, the upper body rotation angle signal, and so on; that is, a group of parameters whose combination uniquely determines the current state of the excavator.
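One way to hold the Step 1 state parameter group in code is a small immutable record that flattens into the state vector fed to the policy. The field names, the units, and the choice of cylinder displacement rather than inclination signals are assumptions for illustration.

```python
from dataclasses import astuple, dataclass
from typing import List


@dataclass(frozen=True)
class ExcavatorState:
    """Hypothetical state parameter group (field names and units assumed)."""
    boom_cyl: float    # boom cylinder displacement, m
    stick_cyl: float   # stick cylinder displacement, m
    bucket_cyl: float  # bucket cylinder displacement, m
    roll: float        # upper body attitude from the gyroscope, rad
    pitch: float
    yaw: float
    slew: float        # upper body rotation relative to the undercarriage, rad

    def to_vector(self) -> List[float]:
        """Flatten in field order so every sample has the same layout."""
        return list(astuple(self))


s = ExcavatorState(1.2, 0.8, 0.5, 0.0, 0.01, 0.0, 0.3)
```

Keeping the field order fixed matters because the columns of the policy's coefficient matrix are tied to positions in this vector.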

Step 2. Define the policy function. Its input is the current state parameter group (in part or in full), and its output is the corresponding control signal (the handle opening signal). The coefficient matrix connecting the input and output parameters forms part of the trainable reinforcement learning model.
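Here is a minimal sketch of the Step 2 policy function. It assumes an affine form u = clip(W·s + b), where each output channel is one handle opening normalized to [-1, 1] and the sign gives the direction. The bias term `b`, the normalization, and the name `linear_policy` are assumptions. The text above states only that a trainable coefficient matrix connects inputs to outputs.

```python
from typing import List, Sequence


def linear_policy(W: Sequence[Sequence[float]], b: Sequence[float],
                  state: Sequence[float]) -> List[float]:
    """Return clip(W @ state + b, -1, 1), one value per handle channel."""
    if len(b) != len(W):
        raise ValueError("b needs one entry per output channel")
    if any(len(row) != len(state) for row in W):
        raise ValueError("each row of W must match the state dimension")
    openings = []
    for row, bias in zip(W, b):
        u = sum(w * s for w, s in zip(row, state)) + bias
        openings.append(max(-1.0, min(1.0, u)))  # saturate at full handle travel
    return openings


# Usage with a 2-element state and 4 channels (boom, stick, bucket, slew).
W = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5], [0.0, 0.0]]
u = linear_policy(W, b=[0.0, 0.0, 0.0, 0.1], state=[0.3, 0.8])
```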

Step 3. Define the reward function. The smaller the distance between each point of the actual tooth tip position curve and the corresponding point of the expected curve, that is, the higher the degree of overlap between the two curves, the greater the reward value.
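The Step 3 reward can be sketched by comparing the two curves point by point. This sketch assumes both curves are sampled with the same number of corresponding points. It uses the hypothetical form reward = 1 / (1 + mean distance), which equals 1 when the curves coincide and decreases as they separate. The exact reward formula is not specified above.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def curve_reward(actual: Sequence[Point], target: Sequence[Point]) -> float:
    """Higher when the actual tooth tip curve overlaps the expected curve more."""
    if not actual or len(actual) != len(target):
        raise ValueError("curves must be non-empty and sampled at the same points")
    mean_dist = sum(math.dist(p, q) for p, q in zip(actual, target)) / len(actual)
    return 1.0 / (1.0 + mean_dist)


target = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
print(curve_reward(target, target))                                # 1.0
print(curve_reward([(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)], target))  # 0.5
```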

Step 4. Develop a corresponding automatic training and debugging program. Fig. 2 is a schematic diagram of training the excavator slope brushing control model provided by the present application. As shown in Fig. 2, the training and debugging process of the control model is as follows: sensor signals such as the handle, digital oil cylinder, and IMU (inertial measurement unit) signals are collected in real time and stored in the current state array; a control signal is output through the policy function; the tooth tip position curve is calculated from the sensor signals returned in real time; the reward value is calculated by comparing the obtained curve with the expected tooth tip curve; and a judgment is made based on the reward value: a) if the goal is reached, training stops; b) if the goal is not reached, the policy function is updated, and the process iterates until the goal is achieved.
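The Step 4 loop can be illustrated on a toy problem. This sketch is not the training program of Fig. 2. It replaces the excavator with a hypothetical one-dimensional tooth tip model: the tip moves forward at a constant rate, and a single handle opening sets its vertical speed. The target is a leveling line at z = 0. The sketch updates a two-coefficient linear policy by simple random search (hill climbing) rather than by any specific reinforcement learning algorithm. All constants are assumptions.

```python
import random
from typing import List, Tuple

# Hypothetical toy constants: time step (s), max vertical speed (m/s),
# forward advance per step (m), and steps per pass.
DT, V_MAX, STEP_X, N_STEPS = 0.1, 1.0, 0.1, 20


def rollout(w: List[float], z0: float = 0.5) -> List[Tuple[float, float]]:
    """Toy tooth tip: constant forward motion, policy sets vertical speed."""
    x, z = 0.0, z0
    curve = [(x, z)]
    for _ in range(N_STEPS):
        u = max(-1.0, min(1.0, w[0] * z + w[1]))  # handle opening in [-1, 1]
        x += STEP_X
        z += DT * V_MAX * u
        curve.append((x, z))
    return curve


def reward(curve: List[Tuple[float, float]]) -> float:
    """Leveling target z = 0 at the same x samples, so each distance is |z|."""
    mean_dist = sum(abs(z) for _, z in curve) / len(curve)
    return 1.0 / (1.0 + mean_dist)


def train(goal: float = 0.9, max_iters: int = 500, sigma: float = 0.5,
          seed: int = 0) -> Tuple[List[float], float]:
    """Random-search (hill-climbing) stand-in for the Step 4 training loop."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    best = reward(rollout(w))
    for _ in range(max_iters):
        if best >= goal:  # a) goal reached: stop training
            break
        # b) goal not reached: propose an updated policy and keep it if better
        candidate = [wi + rng.gauss(0.0, sigma) for wi in w]
        r = reward(rollout(candidate))
        if r > best:
            w, best = candidate, r
    return w, best


weights, best_reward = train()
```

The loop structure (act, measure the curve, score it, then stop or update) is what corresponds to Step 4. A real setup would presumably replace `rollout` with sensor feedback from the machine and use a proper reinforcement learning update.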

Step 5. Fig. 3 is a schematic diagram of the deployment of the excavator leveling and slope brushing control model provided by the present application. As shown in Fig. 3, once the reinforcement learning model has been trained, it can be embedded directly in the controller and function as a control algorithm: the state parameters collected in real time are taken as input, and real-time control signals are output.
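The deployment described in Step 5 amounts to a fixed-period control cycle. In this sketch, the sensor read, the trained policy, and the handle output are passed in as callables. `read_state`, `write_handle`, and the cycle period are hypothetical placeholders for controller-specific I/O.

```python
import time
from typing import Callable, List, Sequence


def run_controller(read_state: Callable[[], List[float]],
                   policy: Callable[[Sequence[float]], List[float]],
                   write_handle: Callable[[List[float]], None],
                   cycles: int, period_s: float = 0.0) -> None:
    """Each cycle: read sensors -> trained policy -> handle opening output."""
    for _ in range(cycles):
        start = time.monotonic()
        state = read_state()
        write_handle(policy(state))
        if period_s > 0.0:
            # Sleep for the remainder of the control period, if any is left.
            time.sleep(max(0.0, period_s - (time.monotonic() - start)))


# Usage with stand-ins for the sensors, the trained model, and the handle output.
readings = iter([[1.0], [2.0], [3.0]])
sent: List[List[float]] = []
run_controller(lambda: next(readings), lambda s: [2.0 * s[0]], sent.append, cycles=3)
```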

The reinforcement learning-based excavator leveling and slope brushing operation control method provided by the embodiment of the present application has the following advantages:

1. After setting the automatic reinforcement learning training program, the excavator can automatically debug the control algorithm without human intervention, and traverse all state points for optimization. The workload of debugging the control algorithm is greatly reduced, and the cost of debugging is reduced.

2. Because debugging can run continuously, the accuracy can reach or exceed that of manual debugging; and because the next action is adjusted in real time according to the returned data, the overall debugging time is greatly shortened.

3. The developed control program can accelerate the development of control algorithms for subsequent excavator models. A model trained by artificial intelligence has a useful property: it can be migrated to similar application scenarios and adapted to a new scenario with only a small amount of additional training, which is known as transfer learning. It can therefore greatly speed up the development of leveling and slope brushing control algorithms for new excavator models.

4. A model developed by reinforcement learning is a black-box model rather than a mechanism model built on explicit logic, so it is difficult to copy or reverse engineer.

Based on any of the above-mentioned embodiments, Fig. 4 is a schematic structural view of the operating machine control device provided by the present application. As shown in Fig. 4, the device includes:

An acquisition unit 410, configured to acquire the current working state of the working machine;

A decision-making unit 420, configured to determine the current decision-making behavior of the working machine based on the current working state and the state-behavior decision-making model;

The control unit 430 is configured to control the working machine to perform construction work based on the control signal corresponding to the current decision-making behavior;

Wherein, the state-behavior decision-making model is obtained by reinforcement learning training based on the sample working state of the working machine, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior; the reward value is determined based on the coincidence degree between the actual position curve and the target position curve of the working part in the working machine; and the actual position curve is determined based on the sample decision-making behavior.

The working machine control device provided in the embodiment of the present application performs reinforcement learning on the sample working state of the working machine, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior. The resulting state-behavior decision-making model determines the current decision-making behavior of the working machine from its current working state, and the working machine is controlled to carry out construction work according to the control signal corresponding to that behavior. Because the reward value is determined by the coincidence degree between the actual position curve and the target position curve of the working part, the working part is driven to follow the set target position curve without an accurate control model having to be established for every working state of the machine. This reduces the engineer's debugging workload, shortens the debugging time, lowers the debugging cost, and raises the level of intelligent construction of the working machine.

Based on any of the above embodiments, the device further includes:

A reward determination unit, configured to select a plurality of position points on the actual position curve and determine the corresponding point of each position point on the target position curve; determine the position weight of each position point; and determine the reward value based on the distance between each position point and its corresponding point and on the position weight of each position point.
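Here is a sketch of the weighted reward computed by the reward determination unit. It assumes that the corresponding point on the target curve is the nearest sampled target point. It normalizes by the sum of the position weights and maps the weighted mean distance to a reward with the hypothetical form 1 / (1 + d). The text does not fix any of these choices.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def weighted_reward(actual: Sequence[Point], target: Sequence[Point],
                    weights: Sequence[float]) -> float:
    """Reward from position-weighted distances between the two curves."""
    if not actual or not target or len(weights) != len(actual):
        raise ValueError("need non-empty curves and one weight per actual point")
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must sum to a positive value")
    weighted = 0.0
    for p, w in zip(actual, weights):
        nearest = min(math.dist(p, q) for q in target)  # corresponding point (assumed)
        weighted += w * nearest
    return 1.0 / (1.0 + weighted / total)


target = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
actual = [(0.0, 0.0), (1.0, 0.5), (2.0, 0.0)]
uniform = weighted_reward(actual, target, [1.0, 1.0, 1.0])
middle_heavy = weighted_reward(actual, target, [1.0, 2.0, 1.0])  # middle point matters more
```

Raising a point's weight makes a deviation at that point cost more reward, which is one way position weights could emphasize critical sections of a pass.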

Based on any of the above embodiments, the reward determination unit is specifically configured to:

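Determine the coincidence degree between the actual position curve and the target position curve based on the distance between each position point and its corresponding point and on the position weight of each position point; determine the moving speed of the working part on the actual position curve; and determine the reward value based on the coincidence degree and the moving speed.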
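Claim 3 combines the coincidence degree with the moving speed of the working part. One hypothetical way to do this is to estimate the mean speed from the path length of the sampled actual curve, cap it against a reference speed `v_ref`, and blend it with the coincidence degree using a weight `alpha`. Both parameters and the linear blend are assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def moving_speed(curve: Sequence[Point], dt: float) -> float:
    """Mean speed along the sampled actual curve (samples dt seconds apart)."""
    if len(curve) < 2 or dt <= 0.0:
        raise ValueError("need at least two samples and a positive dt")
    path = sum(math.dist(p, q) for p, q in zip(curve, curve[1:]))
    return path / (dt * (len(curve) - 1))


def combined_reward(coincidence: float, speed: float, v_ref: float,
                    alpha: float = 0.8) -> float:
    """Blend coincidence degree (0..1) with speed capped at v_ref (hypothetical)."""
    if v_ref <= 0.0:
        raise ValueError("v_ref must be positive")
    speed_term = min(speed / v_ref, 1.0)
    return alpha * coincidence + (1.0 - alpha) * speed_term


v = moving_speed([(0.0, 0.0), (3.0, 4.0)], dt=1.0)
```

Capping the speed term keeps the model from being rewarded for moving ever faster at the expense of tracking accuracy.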

Based on any of the above embodiments, the device further includes:

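A training unit, configured to obtain the last working state of the working machine, the last decision-making behavior, and the reward value corresponding to the last decision-making behavior; use them as the sample working state, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior, respectively; and train an initial model based on these samples to obtain the state-behavior decision-making model.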
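The training unit turns the previous step into a training sample. Below is a minimal sketch that stores these (state, decision behavior, reward) triples in a bounded buffer and draws a random minibatch. The class names, the capacity, and the random sampling are assumptions. Many reinforcement learning algorithms would also store the next state, which the triple described here does not include.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Sample:
    state: Tuple[float, ...]   # last working state
    action: Tuple[float, ...]  # last decision-making behavior (handle openings)
    reward: float              # reward value for that behavior


class SampleBuffer:
    """Bounded store of training samples; the oldest are dropped when full."""

    def __init__(self, capacity: int = 10_000) -> None:
        self._buf: Deque[Sample] = deque(maxlen=capacity)

    def add_last_step(self, state: Sequence[float], action: Sequence[float],
                      reward: float) -> None:
        self._buf.append(Sample(tuple(state), tuple(action), float(reward)))

    def __len__(self) -> int:
        return len(self._buf)

    def minibatch(self, k: int, rng: Optional[random.Random] = None) -> List[Sample]:
        """Draw up to k distinct samples uniformly at random."""
        rng = rng or random.Random()
        return rng.sample(list(self._buf), min(k, len(self._buf)))


buf = SampleBuffer(capacity=2)
for i in (1.0, 2.0, 3.0):
    buf.add_last_step([i], [0.1 * i], reward=i)
```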

Based on any of the above embodiments, the training unit is also used for:

Based on any of the above embodiments, an embodiment of the present application further provides a work machine, the work machine includes the above work machine control device.

Specifically, the work machine may include the above work machine control device. The above-mentioned control device is used to control the working machine, so that it replaces manual control, and can adjust the next construction action according to the real-time feedback data, shortening the debugging process.

Based on any of the above embodiments, FIG. 5 is a schematic structural diagram of an electronic device provided by the present application. As shown in FIG. 5, the electronic device may include a processor (Processor) 510, a communication interface (Communications Interface) 520, a memory (Memory) 530, and a communication bus (Communications Bus) 540, wherein the processor 510, the communication interface 520, and the memory 530 communicate with one another through the communication bus 540. The processor 510 can invoke logic instructions in the memory 530 to perform the following method:

Obtaining the current working state of the working machine; determining the current decision-making behavior of the working machine based on the current working state and the state-behavior decision-making model; and controlling the working machine to perform construction work based on the control signal corresponding to the current decision-making behavior; wherein the state-behavior decision-making model is obtained by training based on the sample working state of the working machine, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior; the reward value is determined based on the actual position curve and the target position curve of the working part in the working machine; and the actual position curve is determined based on the sample decision-making behavior.

In addition, the logic instructions in the memory 530 may be implemented as software functional units and, when sold or used as an independent product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The processor in the electronic device provided by the embodiment of the present application can call the logic instruction in the memory to implement the above method, and its specific implementation mode is consistent with the above method implementation mode, and can achieve the same beneficial effect, and will not be repeated here.

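An embodiment of the present application also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, performs the methods provided by the above embodiments, for example: obtaining the current working state of the working machine; determining the current decision-making behavior of the working machine based on the current working state and the state-behavior decision-making model; and controlling the working machine to perform construction work based on the control signal corresponding to the current decision-making behavior.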

When the computer program stored on the non-transitory computer-readable storage medium provided by the embodiment of the present application is executed, the above method is realized. Its specific implementation is consistent with the method embodiments described above and achieves the same beneficial effects, which will not be repeated here.

The device embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those skilled in the art can understand and implement it without creative effort.

From the above description of the implementations, those skilled in the art can clearly understand that each implementation can be realized by software plus a necessary general-purpose hardware platform, or of course by hardware. Based on this understanding, the above technical solution in essence, or the part of it that contributes to the prior art, can be embodied as a software product. The computer software product can be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in some parts of the embodiments.

Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

A method for controlling a work machine, comprising:

Obtain the current operating status of the operating machine;

determining the current decision-making behavior of the working machine based on the current working state and the state-behavior decision-making model;

Based on the control signal corresponding to the current decision-making behavior, control the working machine to perform construction work;

Wherein, the state-behavior decision-making model is obtained by training based on the sample working state of the working machine, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior; the reward value is determined based on the actual position curve and the target position curve of the work part in the working machine; and the actual position curve is determined based on the sample decision-making behavior.
The working machine control method according to claim 1, wherein the reward value is determined based on the following steps:

selecting a plurality of position points on the actual position curve, and determining a corresponding position point of each position point on the target position curve;

Determine the position weight of each position point;

The reward value is determined based on the distance between each location point and the corresponding location point, and the location weight of each location point.
The working machine control method according to claim 2, wherein the determining the reward value based on the distance between each location point and the corresponding location point and the location weight of each location point includes:

determining the coincidence degree between the actual position curve and the target position curve based on the distance between each position point and the corresponding position point, and the position weight of each position point;

determining the moving speed of the working part on the actual position curve;

The reward value is determined based on the coincidence degree and the moving speed.
The operation machine control method according to claim 3, wherein the state behavior decision-making model is trained based on the following steps:

Obtaining the last working state, the last decision-making behavior of the working machine, and the reward value corresponding to the last decision-making behavior;

Using the last working state, the last decision-making behavior, and the reward value corresponding to the last decision-making behavior as the sample working state, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior, respectively;

Based on the sample working state, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior, training an initial model to obtain the state-behavior decision-making model.
The operation machine control method according to claim 4, wherein training the initial model based on the sample working state, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior to obtain the state-behavior decision-making model comprises:

If the coincidence degree between the actual position curve and the target position curve of the working part in the working machine is less than a preset coincidence threshold, the training is stopped, and the trained initial model is used as the state behavior decision model.
The method for controlling an operating machine according to any one of claims 1 to 5, wherein the operating machine is an excavator, and the current operating state includes the attitude parameters of the mechanical arm, the attitude parameters of the upper body, and the slewing angle of the upper body.
A work machine control device comprising:

an acquisition unit, configured to acquire the current working state of the working machine;

A decision-making unit, configured to determine the current decision-making behavior of the working machine based on the current working state and the state-behavior decision-making model;

A control unit, configured to control the working machine to perform construction work based on the control signal corresponding to the current decision-making behavior;

Wherein, the state-behavior decision-making model is obtained by reinforcement learning training based on the sample working state of the working machine, the sample decision-making behavior, and the reward value corresponding to the sample decision-making behavior; the reward value is determined based on the actual position curve and the target position curve of the work part in the working machine; and the actual position curve is determined based on the sample decision-making behavior.
A working machine comprising the working machine control device as claimed in claim 7.
An electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the work machine control method according to any one of claims 1 to 6.
A non-transitory computer-readable storage medium, on which a computer program is stored, wherein when the computer program is executed by a processor, the steps of the working machine control method according to any one of claims 1 to 6 are implemented.