CN113848880A - Agricultural machinery path optimization method based on improved Q-learning

Info

Publication number: CN113848880A
Application number: CN202111006894.1A
Authority: CN
Inventors: 董笑辰; 陶斯友; 纪铁生
Original assignee: CRRC Dalian R&D Co Ltd
Current assignee: CRRC Dalian R&D Co Ltd
Priority date: 2021-08-30
Filing date: 2021-08-30
Publication date: 2021-12-28
Anticipated expiration: 2041-08-30
Also published as: CN113848880B

Abstract

The invention discloses an agricultural machinery path optimization method based on improved Q-learning, comprising the following steps: S1: determining the initial parameters of path planning; S2: translating the original field boundary inward by a distance L; S3: calculating the minimum span of the agricultural machinery working area; S4: generating the parallel paths of the agricultural machinery working area; S5: calculating the length of the turning paths; S6: optimizing the global path based on the improved Q-learning algorithm. The invention calculates the optimal rotation angle of the original field and then generates parallel paths that are parallel to the working-area boundary at that angle, so that the parallel paths coincide with one edge of the field, which greatly simplifies the calculation. The global path is then optimized with the improved Q-learning algorithm to determine the minimum total working length of the agricultural machine. The planned global path is the shortest, which improves working efficiency.

Description

Agricultural machinery path optimization method based on improved Q-learning

Technical Field

The invention relates to the technical field of agricultural machinery path optimization, in particular to an agricultural machinery path optimization method based on improved Q-learning.

Background

According to the working characteristics of the agricultural machine, the travel path of the agricultural machine in the working area is generally a straight path and is required to cover the whole working area with minimum repetition. The turning path connecting two adjacent straight paths is usually determined by the relationship between the distance between the adjacent straight paths and the turning radius of the agricultural machinery, and the distances of different turning paths are different. Because the turning can seriously affect the working efficiency compared with the straight running, and the turning process of the agricultural machine can be approximately regarded as uniform speed, the working efficiency of the agricultural machine can be improved by reducing the turning times and optimizing the turning path.

For a simple convex polygonal field, all straight paths can be connected by simple turning paths, and the main factor affecting the path length of the agricultural machinery is the order in which the straight paths are connected. The global path can therefore be optimized by adjusting the order of the straight paths, which converts the agricultural path optimization problem into a combinatorial optimization problem. This combinatorial optimization problem is NP-hard; traditional methods such as dynamic programming and backtracking rely on trial and error, require a large amount of calculation, and do not easily find a globally optimal solution.

For a field with a complex shape or an obstacle in the middle, planning all straight paths in the same direction tends to increase the repeated and missed areas and reduces working efficiency. In addition, the turning paths may become complex, which makes both path planning and path tracking difficult.

Disclosure of Invention

The invention provides an agricultural machinery path optimization method based on improved Q-learning, and aims to solve the technical problem that traditional methods such as dynamic programming and backtracking rely on trial and error, require a large amount of calculation, and do not easily find a globally optimal solution.

In order to achieve the purpose, the technical scheme of the invention is as follows:

an agricultural machinery path optimization method based on improved Q-learning comprises the following steps:

S1: determining the initial parameters of path planning, the initial parameters comprising the original field boundary point set P, the scanning width w of each row of the agricultural machine, and the minimum turning radius R of the agricultural machine;

S2: translating the original field boundary point set P toward the interior of the field boundary by a distance L to determine the boundary of the agricultural machinery working area;

S3: establishing an x-y rectangular coordinate system and calculating the minimum span of the agricultural machinery working area to determine the optimal rotation angle of the original field boundary relative to the x axis;

S4: generating parallel paths that are parallel to the boundary of the agricultural machinery working area at the optimal rotation angle to determine the straight paths of the agricultural machine;

S5: determining the type of each turning path and calculating its length;

S6: optimizing the path of the agricultural machinery working area based on the improved Q-learning algorithm to determine the optimal total working length of the agricultural machine.

Further, in S3, the method for calculating the minimum span of the working area of the agricultural machine is as follows:

when the boundary of the agricultural machinery working area is a convex polygon, the number of turns n of the agricultural machine is:

where D is the distance from one edge of the boundary of the agricultural machinery working area to the vertex of the working area farthest from that edge, and the scan line is a straight line parallel to the x axis in the x-y rectangular coordinate system:

y＝y_min,x∈[x_min,x_max] (2)

where y_min is the minimum value of the boundary of the agricultural machinery working area on the y axis; y_max is the maximum value of the boundary on the y axis; x_min is the minimum value of the boundary on the x axis; x_max is the maximum value of the boundary on the x axis;

the rotation angle for each rotation of the original field is:

where [x₁, y₁] are the coordinates of the starting point of the edge of the working-area boundary that is parallel to the x axis; [x₂, y₂] are the coordinates of the end point of that edge; and θ_t is the rotation angle. After multiple rotations, the rotation angle at which the span D is smallest is the optimal rotation angle θ^* of the agricultural machinery working area.

Further, in S4, the method for determining the straight path of the agricultural machine includes:

S41: a straight line at the optimal rotation angle θ^* is used as the scan line and is translated toward the interior of the agricultural machinery working area by w each time; after each translation, the number of intersection points between the scan line and the boundary of the working area is calculated;

S42: if the number of intersection points is 2, the scan line is considered to be still within the boundary of the working area, and scanning continues; if the number of intersection points is 1 or 0, the scan line is judged to have gone beyond the boundary of the working area, scanning stops, and the generation of the parallel paths is complete.
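Steps S41 and S42 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the working area has already been rotated by the optimal angle so that scan lines are horizontal, that the polygon is convex, and that the first scan line lies on the edge at y_min. The names `scan_line_hits` and `parallel_paths` are hypothetical.

```python
def scan_line_hits(polygon, y, eps=1e-9):
    """Distinct points where the horizontal line at height y meets the boundary."""
    hits = []
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if abs(y1 - y2) < eps:
            # Edge parallel to the scan line: use its endpoints only if collinear.
            candidates = [(x1, y1), (x2, y2)] if abs(y - y1) < eps else []
        elif min(y1, y2) - eps <= y <= max(y1, y2) + eps:
            t = (y - y1) / (y2 - y1)
            candidates = [(x1 + t * (x2 - x1), y)]
        else:
            candidates = []
        for c in candidates:
            # Skip points already recorded (a vertex is shared by two edges).
            if all(abs(c[0] - h[0]) > eps or abs(c[1] - h[1]) > eps for h in hits):
                hits.append(c)
    return sorted(hits)


def parallel_paths(polygon, w):
    """Translate the scan line by w until it no longer yields two intersections."""
    y = min(p[1] for p in polygon)  # assumption: start on the lowest edge
    paths = []
    while True:
        hits = scan_line_hits(polygon, y)
        if len(hits) != 2:          # S42: 1 or 0 intersections ends the scan
            break
        paths.append((hits[0], hits[1]))
        y += w                      # S41: translate by the scanning width
    return paths
```

For a 10 × 5 rectangle and w = 2, this sketch produces three straight paths at y = 0, 2 and 4.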

Further, the turning path in S5 includes a semicircular shape, a fishtail shape, and a pi shape;

if the spacing between adjacent straight paths is equal to twice the turning radius, i.e. w = 2R, the turning path is semicircular, and the length of the turning path is π × R;

if the spacing between adjacent straight paths is less than twice the turning radius, i.e. w < 2R, the turning path is fishtail-shaped; the straight-line distance the agricultural machine needs to travel is:

l_r＝2R-w (4)

where l_r is the straight-line distance the agricultural machine needs to travel during a fishtail turn and R is the turning radius; the length of the turning path is (2 + π) × R - w;

if the spacing between adjacent straight paths is greater than twice the turning radius, i.e. w > 2R, the turning path is pi-shaped; the straight-line distance the agricultural machine needs to travel is:

l_f＝w-2R (5)

where l_f is the straight-line distance the agricultural machine needs to travel during a pi-shaped turn; the length of the turning path is (π - 2) × R + w.
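The three lengths follow from adding the straight segment to two quarter-circle arcs of radius R. The derivation below is a short check under that assumption; the labels L_semicircle, L_fishtail and L_pi are introduced here for illustration only.

```latex
\begin{aligned}
L_{\text{semicircle}} &= \pi R, && w = 2R,\\
L_{\text{fishtail}} &= 2\cdot\frac{\pi R}{2} + l_r = \pi R + (2R - w) = (2+\pi)R - w, && w < 2R,\\
L_{\text{pi}} &= 2\cdot\frac{\pi R}{2} + l_f = \pi R + (w - 2R) = (\pi-2)R + w, && w > 2R.
\end{aligned}
```

Both the fishtail and the pi-shaped expressions reduce to πR at w = 2R, so the three cases join continuously.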

Further, the method for optimizing the global path based on the Q-learning algorithm in step S6 is as follows:

S61: defining a Q-value table and initializing it to start the calculation;

S62: defining a state flag table Flag to store the state quantity indicating whether each straight path has been connected;

S63: randomly selecting an initial path of the agricultural machine to determine the straight path on which the agricultural machine starts working;

S64: judging whether the straight path has been connected;

S65: if the straight path has been connected, directly judging the convergence of the Q-value table;

if the straight path has not been connected, selecting the next action set based on the current state of the agricultural machine, calculating the reward values of the next action set to obtain the next action with the maximum reward value, updating the Q-value table and the state flag table Flag, and then judging the convergence of the Q-value table;

S66: if the Q-value table has converged, the calculation ends;

if the Q-value table has not converged, judging whether the current Q-value table contains a converged element;

S67: if the current Q-value table contains no converged element, repeating S65-S66;

if the current Q-value table contains a converged element, locking the corresponding state, updating the Q-value table, and repeating S66.

Further, the calculation method for determining whether the straight path is connected in S64 is as follows:

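F(s_n) = \begin{cases} 0, & \text{if } s_n \text{ has been connected} \\ 1, & \text{otherwise} \end{cases} (6)

where F(s_n) is the state quantity indicating whether each path has been connected.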

Further, the Q value function established by determining the convergence status of the Q value table in step S65 is:

Q(s_c,a′_c)＝Q(s_l,a′_l)+γ*max(r(s_c,f(s_c,δ))) (7)

where Q(s_c, a′_c) is the Q value of the action a′_c that corresponds to the maximum reward in the current state s_c of the agricultural machine; s_l is the state preceding the current state of the agricultural machine; a′_l is the action that corresponds to the maximum reward in that preceding state; γ is the discount factor; r(s_c, s_n) is the reward function from the current state s_c of the agricultural machine to its next state s_n; and f(s_c, δ) denotes the set of next states reachable from the current state s_c under the optional action set δ available in s_c, namely:

where

is the first selectable action, and m is the number of selectable actions.

Further, the reward function of calculating the reward value in step S65 is set up as:

where D(s_c, s_n) is the length of the turning path from the current state s_c of the agricultural machine to its next state s_n, and L is a weighting coefficient.

Further, the method further includes, before the steps S1 to S6:

if there is an obstacle in the field to be optimized or the field to be optimized is not a convex polygon field, the complex field is firstly divided into a plurality of convex polygon sub-regions, and path planning is performed on the convex polygon sub-regions by the method of the steps S1-S6.

Further, after the steps S1 to S6, the method further includes: connecting the planned paths of the convex polygonal sub-areas to obtain the optimal path of the whole field.

Beneficial effects: the agricultural machinery path optimization method based on improved Q-learning calculates the optimal rotation angle of the original field and then generates parallel paths that are parallel to the working-area boundary at that angle, so that the parallel paths coincide with one edge of the field, which greatly simplifies the calculation. The turning paths are then planned, and the global path is optimized based on the improved Q-learning algorithm to determine the minimum total working length of the agricultural machine. The planned global path is the shortest, which improves working efficiency.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a flowchart of an overall agricultural machine path optimization method of the present invention;

FIG. 2 is a schematic diagram of planning parallel paths according to the present invention;

FIG. 3 is a schematic diagram of a complex field being divided into a plurality of convex polygons by a cell-decomposition method;

FIG. 4a is a schematic view of a fishtail turning path and its parameters in accordance with the present invention;

FIG. 4b is a schematic view of a semicircular turn path and its parameters in accordance with the present invention;

FIG. 4c is a schematic diagram of a pi turn path and its parameters according to the present invention;

FIG. 5 is a flow chart of an agricultural machinery path optimization method based on improved Q-learning according to the present invention.

Wherein: 1. a straight path; 2. agricultural machinery work area boundary.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The embodiment provides an agricultural machinery path optimization method based on improved Q-learning, which comprises the following steps, as shown in the attached figure 1:

S1: determining the initial parameters of path planning, the initial parameters comprising the original field boundary point set P, the scanning width w of each row of the agricultural machine, and the minimum turning radius of the agricultural machine;

S2: translating the original field boundary point set P toward the interior of the field boundary by a distance L to determine the boundary of the agricultural machinery working area, within which the straight paths are then planned;

In S3, the minimum span of the agricultural machinery working area is calculated as follows. Straight paths are simple to work along and give high coverage, so they are used to cover the main working area of the agricultural machine. Planning the straight paths focuses on finding a good heading so as to reduce the number of turns. For a convex polygonal field with no internal obstacles, straight paths in every direction are continuous, and planning is simpler. Because the working content of the agricultural machine is generally fixed and covers the whole field, the spacing between adjacent straight paths is equal; minimizing the number of turns is therefore equivalent to finding the minimum span of the convex polygon in the direction perpendicular to the straight paths. As shown in fig. 2, the minimum span of a convex polygon always occurs between a vertex and an edge, so the optimal direction is parallel to one edge of the convex polygon. Specifically, for each edge, the span between that edge and the vertex farthest from it is calculated, and the direction of the edge with the smallest span is chosen as the optimal straight-path direction of the convex polygonal field.

Specifically, when the boundary of the agricultural machinery working area is a convex polygon, the number n of times of turning of the agricultural machinery is as follows:

where D is the distance from one edge of the boundary of the agricultural machinery working area to the vertex of the working area farthest from that edge, and the scan line is a straight line parallel to the x axis in the x-y rectangular coordinate system;

y＝y_min,x∈[x_min,x_max] (2)

where y_min is the minimum value of the boundary of the agricultural machinery working area on the y axis; y_max is the maximum value of the boundary on the y axis; x_min is the minimum value of the boundary on the x axis; x_max is the maximum value of the boundary on the x axis; D = y_max - y_min, so reducing the number of turns amounts to reducing D;

the rotation angle for each rotation of the original field is:

where [x₁, y₁] are the coordinates of the starting point of the edge of the working-area boundary that is parallel to the x axis; [x₂, y₂] are the coordinates of the end point of that edge; and θ_t is the rotation angle. After multiple rotations, the rotation angle at which the span D is smallest is the optimal rotation angle θ^* of the agricultural machinery working area;
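The edge-by-edge search for the minimum span can be sketched as below. Instead of rotating the field and measuring y_max - y_min, this sketch measures, for each edge, the largest perpendicular distance from any vertex to that edge, which gives the same span D for a convex polygon. Taking the edge heading with `math.atan2` is an assumption made for illustration, since the rotation-angle formula is not reproduced in this text; the function name `min_span_direction` is hypothetical.

```python
import math


def min_span_direction(polygon):
    """Return (span D, edge heading in radians, edge index) for the edge
    whose perpendicular span across the convex polygon is smallest."""
    best = None
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        length = math.hypot(x2 - x1, y2 - y1)
        if length == 0:
            continue  # skip degenerate (repeated) vertices
        # Span: distance from this edge's line to the farthest vertex.
        span = max(
            abs((x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)) / length
            for px, py in polygon
        )
        angle = math.atan2(y2 - y1, x2 - x1)  # assumed heading of the edge
        if best is None or span < best[0]:
            best = (span, angle, i)
    return best
```

For a 10 × 5 rectangle whose long side lies on the x axis, the sketch selects the long side, with span 5 and heading 0.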

S4: generating parallel paths that are parallel to the boundary of the agricultural machinery working area at the optimal rotation angle to determine the straight paths of the agricultural machine; specifically:

S41: a straight line at the optimal rotation angle θ^* is used as the scan line and is translated toward the interior of the agricultural machinery working area by w each time; after each translation, the number of intersection points between the scan line and the boundary of the working area is calculated;

specifically, the turning path in S5 includes a semicircular shape, a fishtail shape and a pi shape, as shown in fig. 4;

specifically, since the running speed of the agricultural machine is slow, the turning path of the agricultural machine can be regarded as an arc with a fixed radius, and the turning radius of the planned path is assumed to be R. In addition, the distance between two adjacent straight paths, namely the width w scanned by each row of the agricultural machinery, is equal, so that the turning path is related to the distance between the adjacent straight paths and the turning radius.

if the spacing between adjacent straight paths is less than twice the turning radius, i.e. w < 2R, the turning path is fishtail-shaped. In this case the agricultural machine cannot complete a 180° turn in one pass, so it first turns a quarter circle, then reverses in a straight line over a distance l_r, and then turns another quarter circle to complete the turn, where:

l_r＝2R-w (4)

if the spacing between adjacent straight paths is greater than twice the turning radius, i.e. w > 2R, the turning path is pi-shaped. In this case the agricultural machine first turns a quarter circle, then moves forward in a straight line over a distance l_f, and then turns another quarter circle to complete the turn, where:

l_f＝w-2R (5)

where l_f is the straight-line distance the agricultural machine needs to travel during a pi-shaped turn; the length of the turning path is (π - 2) × R + w.
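The turn classification and the corresponding lengths can be collected into one small function. This is an illustrative sketch only: the function name `turn_path`, the argument name `spacing` (equal to w for adjacent straight paths) and the tolerance `eps` are assumptions, and the semicircular case is taken from the summary of the invention.

```python
import math


def turn_path(spacing, R, eps=1e-9):
    """Classify the turn between two straight paths and return (type, length)."""
    if spacing <= 0 or R <= 0:
        raise ValueError("spacing and R must be positive")
    if abs(spacing - 2 * R) <= eps:
        return "semicircle", math.pi * R
    if spacing < 2 * R:
        # Two quarter circles plus a reverse straight segment l_r = 2R - spacing.
        return "fishtail", (2 + math.pi) * R - spacing
    # Two quarter circles plus a forward straight segment l_f = spacing - 2R.
    return "pi", (math.pi - 2) * R + spacing
```

For example, with R = 2 a spacing of 3 gives a fishtail turn and a spacing of 6 gives a pi-shaped turn.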

S6: optimizing the path of the agricultural machinery working area based on a Q-learning algorithm to determine the optimal total working length of the agricultural machinery;

Q-learning is a classic reinforcement learning algorithm that optimizes the decisions of an agent based on the results of the agent's interaction with its environment. Specifically, r(s, a) is the immediate reward given by the environment (i.e. the straight path of the agricultural machine when action a is performed) when the agent (i.e. the agricultural machine) performs action a in state s (s ∈ S) at a given time. The agent evaluates each action by performing a series of actions from an initial state to a target state, and the optimal action sequence is selected by maximizing the reward.

The method for optimizing the global path based on the improved Q-learning algorithm comprises the following steps:

S61: defining a Q-value table and initializing it to start the iterative calculation; the initial values of the Q-value table are all 0;

S62: defining a state flag table Flag to store the state quantity indicating whether each path has been connected; the initial values of the state flag table Flag are all 1;

S63: randomly selecting an initial straight path of the agricultural machine to determine the straight path on which the agricultural machine starts working;

S64: judging whether the straight path has been connected;

preferably, whether the straight path has been connected is determined in S64 as follows:

F(s_n) is the state quantity that records whether each straight path has been connected; if s_n has been connected, it is set to 0, otherwise it is set to 1, namely:

F(s_n) = \begin{cases} 0, & \text{if } s_n \text{ has been connected} \\ 1, & \text{otherwise} \end{cases} (6)

Specifically, agricultural path optimization aims to reduce the total path length. Provided no straight path is traversed more than once, the global path length depends on the length of the turning paths, which in turn depends on their type, so the global path can be optimized by adjusting the types of the turning paths. The type of a turning path is determined by the relationship between the turning radius and the spacing of the two straight paths it connects. The current straight path of the agricultural machine (i.e. the agent) can therefore be regarded as the current state s_c, the selection of the next straight path as an action a_c, and the selected next straight path as the state s_n at the next time. The Q-value function to be optimized is thus established as:

Q(s_c,a′_c)＝Q(s_l,a′_l)+γ*max(r(s_c,f(s_c,δ))) (7)

where Q(s_c, a′_c) is the Q value of the action a′_c that corresponds to the maximum reward in the current state s_c of the agricultural machine; s_l is the state preceding the current state of the agricultural machine; a′_l is the action that corresponds to the maximum reward in that preceding state; γ is the discount factor (0 < γ < 1); r(s_c, s_n) is the reward function from the current state s_c of the agricultural machine to its next state s_n; and f(s_c, δ) denotes the set of possible next states reachable from the current state s_c under the optional action set δ available in s_c, namely:

where

is the first selectable action, and m is the number of selectable actions;

for the problem of agricultural path optimization, the reward function of calculating the reward value in step S65 is set up as:

where D(s_c, s_n) is the length of the turning path from the current state (i.e. straight path) s_c of the agricultural machine to its next state s_n; L is a weighting coefficient, for which the length of the longest turning path or straight path can be chosen so that the two terms are of the same order of magnitude;

if the current Q-value table contains a converged element, the corresponding state is locked, the Q-value table is updated according to the transfer relationship, and step S66 is repeated.

Preferably, the method for optimizing the global path based on the improved Q-learning algorithm of the present invention is shown in fig. 5. Assume that a problem has i states and j inputs. For the agricultural machinery path optimization problem, a Q-value table whose elements are all initialized to 0 and a state flag table Flag whose elements are all initialized to 1 are defined first; the state flag table stores the state quantity indicating whether each path has been connected. In this embodiment, the dimension of the Q-value table is (k × 4), where k = [D/w] is the number of straight paths and [ ] denotes rounding; the dimension of the state flag table Flag is (k × 1). Next, the initial state of the calculation (i.e. the straight path on which the agricultural machine starts) is determined, and the optimal path is then found by the method for optimizing the global path based on the Q-learning algorithm.

In a single iteration, the set of possible next states is determined from the current state, the reward value of each state in the set is calculated, and the state with the maximum reward is selected to update the Q-value table and the state flag table Flag. The iteration is repeated until the Q-value table converges, at which point the optimal path has been found. Note that when an element of the Q-value table converges during iteration, it is locked, i.e. the action decision for the state corresponding to that element is no longer optimized. Because each state has only a finite number of actions, locking an element determines the next optimal state, which can be locked at the same time. The calculation method of the invention thus avoids calculating the reward values of all possible next states every time: once an optimal state is determined, the next state can be determined quickly from the transfer relationship, which greatly reduces the amount of calculation.
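A single selection-and-update step, following the form of equation (7), might look like the sketch below. It is deliberately incomplete: it performs one greedy pass with the state flag table and omits the convergence test and the locking of converged elements. The reward function is passed in as a parameter because its formula is not reproduced in this text; the names `q_step` and `greedy_rollout` and the default γ = 0.9 are assumptions for illustration.

```python
def q_step(q_prev, current, candidates, reward, gamma=0.9):
    """One update in the form of equation (7):
    Q(s_c, a'_c) = Q(s_l, a'_l) + gamma * max r(s_c, s_n)."""
    if not candidates:
        raise ValueError("no candidate next states")
    # Pick the next state with the largest immediate reward.
    best_next = max(candidates, key=lambda s_n: reward(current, s_n))
    q_value = q_prev + gamma * reward(current, best_next)
    return best_next, q_value


def greedy_rollout(k, start, reward, gamma=0.9):
    """Connect all k straight paths once, starting from path `start`."""
    flag = [1] * k            # 1 = not yet connected, 0 = connected
    flag[start] = 0
    order, q, current = [start], 0.0, start
    while any(flag):
        candidates = [s for s in range(k) if flag[s] == 1]
        current, q = q_step(q, current, candidates, reward, gamma)
        flag[current] = 0     # update the state flag table
        order.append(current)
    return order, q
```

With a hypothetical reward such as `lambda a, b: -abs(a - b)`, which penalizes jumps to distant paths, the rollout from path 0 simply visits the paths in index order.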

In theory, every straight path could become the next state, which greatly increases the iteration complexity of the Q-learning algorithm. Moreover, repeated paths increase the path length, contrary to the optimization goal, and treating distant paths as candidates increases the amount of calculation. In addition, for a pi-shaped turning path, an excessively large l_f works against minimizing the total path length. The invention therefore restricts the set f(s_c, δ) of possible next states to straight paths close to the current one; specifically, the distance between the current state (i.e. straight path) s_c of the agricultural machine and the next state s_n to be connected is less than or equal to 4w.
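The 4w restriction on the candidate set can be illustrated as follows. The sketch assumes the straight paths are indexed 0 to k-1 in spatial order with uniform spacing w, so that the distance between paths i and j is |i - j| × w; the function name `candidate_next_states` and the parameter `max_steps` are hypothetical.

```python
def candidate_next_states(current, flag, max_steps=4):
    """Unconnected straight paths within max_steps * w of the current path.

    flag[s] is 1 if path s has not been connected yet and 0 otherwise.
    """
    k = len(flag)
    lo = max(0, current - max_steps)
    hi = min(k, current + max_steps + 1)
    return [s for s in range(lo, hi) if s != current and flag[s] == 1]
```

Paths that are already connected are filtered out through the flag table, so no straight path is offered as a candidate twice.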

Preferably, in this embodiment, before the steps S1-S6, the method further comprises the following. If there is an obstacle inside the field to be optimized, or the field to be optimized is not a convex polygon, the path planning process and the resulting paths become complex, and the repetition rate and omission rate of the working path of the agricultural machine increase. In that case, the complex field is first divided into a plurality of convex polygonal sub-areas using the well-established cell-decomposition method, as shown in fig. 3. Specifically, parallel lines are drawn through the vertices on all boundaries of the field (including the outer boundary, obstacles and the like), dividing the field into a number of small areas, such as the areas numbered 1-7 in fig. 3. The areas divided in this way are all convex polygons. The invention then checks whether the area formed by mutually adjacent sub-areas (those sharing a common boundary) is itself a convex polygon; if so, the merged area is taken as a final sub-area (for example, the areas formed by sub-areas 1 and 2 and by sub-areas 5 and 6 in fig. 3). Path planning is then performed on each convex polygonal sub-area by the method of steps S1-S6.

Preferably, after the steps S1-S6, the method further comprises: connecting the planned paths of the convex polygonal sub-areas to obtain the optimal path of the whole field. Specifically, for a complex field, after path planning within each convex polygonal sub-area is complete, the paths of the sub-areas are connected to complete the path planning of the whole field. To minimize the total path length, the path between sub-areas is a straight path. Automated agricultural machinery generally works in large fields, so the number of convex polygons is much lower than the number of straight paths within a convex polygon. In this embodiment, the shortest connection path between the convex polygons is therefore selected by enumeration. Specifically, an initial sub-area is selected first, and then any other convex polygon is selected as the next target to connect, until all convex polygons are connected. Each ordering of the sub-areas is treated as one combination; the path lengths of all combinations are calculated, and the shortest combination is selected as the optimal connection scheme, completing the path planning of the whole field.
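The enumeration of connection orders can be sketched with `itertools.permutations`. The sketch assumes, for illustration only, that each sub-area's planned path has a fixed entry point and exit point and that the connection between two sub-areas is the straight segment from one exit to the next entry; the function name `best_connection_order` and the data layout are hypothetical.

```python
import math
from itertools import permutations


def best_connection_order(regions, start=0):
    """Enumerate all visiting orders that begin at `start` and return the
    order with the shortest total straight-line connection length.

    regions: list of (entry_point, exit_point) pairs, one per sub-area.
    """
    others = [i for i in range(len(regions)) if i != start]
    best_order, best_len = None, math.inf
    for perm in permutations(others):
        order = (start, *perm)
        # Sum the straight connections from each exit to the next entry.
        length = sum(
            math.dist(regions[a][1], regions[b][0])
            for a, b in zip(order, order[1:])
        )
        if length < best_len:
            best_order, best_len = order, length
    return best_order, best_len
```

Enumeration grows factorially with the number of sub-areas, which is acceptable here only because the number of convex polygons is small.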

The invention has the advantages that:

1: the method decomposes the complicated farmland into a plurality of simple sub-regions, decomposes the planning problem of the global path into the problems of path planning in the sub-regions and path planning between the sub-regions, and reduces the complexity of farmland path planning. Meanwhile, an agricultural machinery path optimization method based on improved Q-learning is introduced for path planning, so that the working efficiency of the agricultural machinery is improved.

2: the method solves the problem of large calculation amount of the traditional algorithm by introducing the improved Q-learning algorithm, introduces the state mark table by combining the characteristics of the agricultural machinery path, and designs the appropriate path planning termination condition.

3: the invention locks the state which has reached the optimum in the algorithm iteration process, and then completes the optimization of the adjacent state rapidly through the transfer relationship. The computational load of the iterative process can be greatly reduced.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in those embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the invention.

Claims

1. An agricultural machinery path optimization method based on improved Q-learning, characterized by comprising the following steps:

2. The agricultural machinery path optimization method based on the improved Q-learning of claim 1, wherein in the step S3, the method for calculating the minimum span of the agricultural machinery working area is as follows:

$$y = y_{\min},\quad x \in [x_{\min}, x_{\max}] \tag{2}$$

the rotation angle of the original field block at each rotation is as follows:

wherein $[x_1, y_1]$ is the start point coordinate of the side of the agricultural machinery working area boundary that is parallel to the x-axis; $[x_2, y_2]$ is the end point coordinate of that side; $\theta_t$ is the rotation angle; and, after multiple rotations, the rotation angle at which the span $D$ is smallest is the optimal rotation angle $\theta^*$ of the agricultural machinery working area.

3. The method for optimizing the path of an agricultural machine based on the improved Q-learning of claim 1, wherein in S4, the method for determining the straight path of the agricultural machine is as follows:

4. The method for optimizing the agricultural machinery path based on the improved Q-learning of claim 1, wherein the turning path in S5 comprises a semicircle type, a fishtail type and a pi type;

$$l_r = 2R - w \tag{4}$$

$$l_f = w - 2R \tag{5}$$

5. The method for optimizing the agricultural machinery path based on the improved Q-learning of claim 1, wherein the method for optimizing the global path based on the Q-learning algorithm in the step S6 is as follows:

S64: determining whether the straight path has been connected;

6. The agricultural machinery path optimization method based on the improved Q-learning of claim 5, wherein the calculation used in S64 to determine whether the straight path has been connected is as follows:

wherein $f(s_n)$ is the state quantity indicating whether each path has been connected.

7. The method for optimizing the agricultural machinery path based on the improved Q-learning of claim 5, wherein the Q-value function established in step S65 for determining the convergence state of the Q-value table is:

$$Q(s_c, a'_c) = Q(s_l, a'_l) + \gamma \cdot \max\bigl(r(s_c, f(s_c, \delta))\bigr) \tag{7}$$

wherein

is the first selectable action and m is the number of selectable actions.

8. The method for optimizing the agricultural machinery path based on the improved Q-learning of claim 5, wherein the reward function established in step S65 for calculating the reward value is:

wherein $D(s_c, s_n)$ is the length of the turning path from the current state $s_c$ of the agricultural machine to the next state $s_n$; and $l$ is a weighting coefficient.

9. The method for optimizing the agricultural machinery path based on the improved Q-learning of claim 1, further comprising, before the steps S1-S6:

10. The method for optimizing the agricultural machinery path based on the improved Q-learning of claim 1, further comprising, after steps S1-S6: planning the paths of the convex polygon sub-areas to obtain the optimal path for the whole field.