CN114564521A

CN114564521A - Method and system for determining working time period of agricultural machine based on clustering algorithm

Info

Publication number: CN114564521A
Application number: CN202210214084.3A
Authority: CN
Inventors: 高一平; 高佳杰; 黄登道; 杨光元; 王桢; 陈小秋
Original assignee: Zoomlion Smart Agriculture Co ltd
Current assignee: Zoomlion Smart Agriculture Co ltd
Priority date: 2022-03-03
Filing date: 2022-03-03
Publication date: 2022-05-31

Abstract

The invention discloses a method and a system for determining the working time period of an agricultural machine based on a clustering algorithm. The method comprises: collecting longitude and latitude data of the movement of the agricultural machine and recording the collection time; preprocessing the longitude and latitude data to generate a data set that can be used directly for clustering; clustering the data in the data set to obtain a preliminary clustering result comprising a road data point set and field area data point sets; correcting the field area data point sets to obtain a final clustering result; and screening out the field area data point sets from the final clustering result and calculating the minimum and maximum sampling time in each field area data point set to determine the working time period of the agricultural machine. The method collects few types of data, requiring only the time and the longitude and latitude coordinates of the agricultural equipment, and in practical application it improves the running speed and efficiency of the program so that a large amount of data can be processed within a limited time.

Description

Method and system for determining working time period of agricultural machine based on clustering algorithm

Technical Field

The invention relates to the technical field of agricultural production management, in particular to a method and a system for determining an agricultural machinery working time period based on a clustering algorithm.

Background

Judging the working state of agricultural machinery and compiling statistics on its working time are common tasks in agricultural production. For an agricultural machine, the working time is an important basis for assessing the actual workload and for judging whether maintenance is needed; in addition, the State vigorously promotes the development of smart agriculture, and working time statistics are an important basis for agricultural machinery subsidies.

The existing methods for judging the working state of agricultural machinery and recording the working time are mainly manual recording of the times at which the equipment starts and finishes working, and timing by means of an equipment sensor. However, manual working-hour statistics are time-consuming, labor-intensive and error-prone, resulting in low efficiency and low accuracy; sensor timing requires a sensor to be installed on the equipment, and some equipment manufactured earlier lacks the corresponding hardware.

With the development of technology, methods now exist for calculating the working time of agricultural equipment from GPS tracks. One such method clusters the movement track of the agricultural machine, but it uses only the number of clustered points to determine the working state and working time, and does not consider situations such as the equipment staying in place or turning within a small range after start-up, so it is prone to misjudgment and has low accuracy. Patent CN112905576A discusses and handles various misjudgments of the track at the junction of road and field, which solves the problem of program misjudgment; however, such methods are mostly suited to farmland area measurement (mu counting) or field-road discrimination and have not been applied to working-hour statistics. They also require many types of data to be collected, so that in practical application some data cannot be collected, the program runs slowly, and efficiency is low.

Disclosure of Invention

In view of the above problems, an object of the present invention is to provide a method for determining an agricultural machinery working time period based on a clustering algorithm, the method has few types of collected data, only needs to collect time and longitude and latitude coordinate data of agricultural machinery equipment, and in practical application, can effectively improve the running speed and efficiency of a program while solving problems, so as to realize processing of a large amount of data within a limited time.

The invention also provides a system for determining the working time period of the agricultural machinery based on the clustering algorithm.

The first technical scheme adopted by the invention is as follows: a method for determining an agricultural machinery working time period based on a clustering algorithm comprises the following steps:

S100: collecting longitude and latitude data of the movement of the agricultural machine and recording the collection time;

S200: preprocessing the longitude and latitude data to generate a data set which can be directly used for clustering operation;

S300: performing clustering operation on the data in the data set to obtain a preliminary clustering result, wherein the preliminary clustering result comprises a road data point set and a field area data point set;

S400: correcting the field area data point set to obtain a final clustering result;

S500: screening out the field area data point sets from the final clustering result, and calculating the minimum value and the maximum value of the sampling time in each field area data point set to determine the working time period of the agricultural machinery.

Preferably, the preprocessing of the latitude and longitude data in the step S200 includes the following sub-steps:

S210: resampling the longitude and latitude data based on the acquisition time;

S220: performing coordinate transformation on the resampled data based on a Gaussian projection method to obtain a plane coordinate data set;

S230: performing format conversion on the plane coordinate data set to generate a data set that can be used directly for clustering operations.

Preferably, the step S210 specifically includes: resampling the longitude and latitude data of the agricultural machinery movement every 9-11 s based on the acquisition time.

Preferably, the step S220 includes:

converting the resampled longitude and latitude data into plane coordinates by adopting a Gaussian projection method, taking the minimum values of the horizontal and vertical coordinates among the plane coordinates of all the points as a new coordinate origin, and translating the plane coordinates of all the points based on the new coordinate origin to obtain a plane coordinate data set.

Preferably, the step S230 includes: converting the format of the plane coordinate data by adopting a map function to generate a data set in a list format.

Preferably, the preliminary clustering result in step S300 is obtained by the following sub-steps:

S310: performing clustering operation on the data in the data set based on a clustering algorithm to obtain the distribution of points in a coordinate system;

S320: performing preliminary segmentation of the operation track of the agricultural machine, based on the distribution of points in the coordinate system, the working characteristics of the agricultural machine and its characteristics when driving on the road, to obtain a preliminary clustering result.

Preferably, the step S400 of correcting the field region data point set in the preliminary clustering result includes the following sub-steps:

S410: performing point-count screening on the field area data point sets in the preliminary clustering result to remove data point sets which do not belong to a field area;

S420: calculating, by a polygon area calculation method, the area of each field area data point set that passed the point-count screening, so as to remove data point sets which do not belong to a field area.

Preferably, the step S410 includes: if the number of the points in the field area data point set is less than a first set threshold value, removing the field area data point set.

Preferably, the step S420 includes:

if the number of the points in the data point set of the field area exceeds a first set threshold value, the area of the data point set of the field area is further calculated by adopting a polygon area calculation method, and if the area of the data point set of the field area is smaller than a second set threshold value, the data point set of the field area is removed.

The second technical scheme adopted by the invention is as follows: a system for determining the working time period of an agricultural machine based on a clustering algorithm comprises an acquisition module, a preprocessing module, a clustering operation module, a correction module and a working time period determination module;

the acquisition module is used for acquiring longitude and latitude data of agricultural machinery movement and recording acquisition time;

the preprocessing module is used for preprocessing the longitude and latitude data to generate a data set which can be directly used for clustering operation;

the clustering operation module is used for carrying out clustering operation on the data in the data set to obtain a primary clustering result, and the primary clustering result comprises a road data point set and a field area data point set;

the correction module is used for correcting the field area data point set to obtain a final clustering result;

and the working time period determining module is used for screening out data point sets of the field areas from the final clustering result and calculating the minimum value and the maximum value of the sampling time in each data point set of the field areas so as to determine the working time period of the agricultural machinery.

The beneficial effects of the above technical scheme are that:

(1) the method for determining the working time period of the agricultural machine based on the clustering algorithm is few in data types, only time and longitude and latitude coordinate data of agricultural equipment need to be acquired, and in practical application, the problems can be solved, the running speed and efficiency of a program can be effectively improved, and a large amount of data can be processed within limited time.

(2) According to the method for determining the working time period of the agricultural machinery based on the clustering algorithm, the map function is used to process the data, which effectively improves the running efficiency of the program; with this method, the time needed to process one day's data from a single device is reduced from the original 30 min to about 2 min, a reduction of about 93%.

(3) In the invention, the clustering area is roughly calculated by a polygon area calculation method, so that an error result caused by in-situ stay of agricultural equipment is eliminated; by screening the points of the cluster point set, error results caused by short-time stay of the agricultural machine are screened out, and the accuracy rate of determining the working time period of the agricultural machine is effectively improved.

(4) According to the method, the specific working time periods of the agricultural machinery equipment are obtained through the agricultural machinery track information, the starting and stopping time error of each working time period is not more than 5min, and the accuracy is high.

Drawings

Fig. 1 is a schematic flow chart of a method for determining an agricultural machinery working time period based on a clustering algorithm according to an embodiment of the present invention;

Fig. 2 is a diagram illustrating the clustering effect provided by an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a system for determining an agricultural machinery working time period based on a clustering algorithm according to an embodiment of the present invention.

Detailed Description

The embodiments of the present invention will be described in further detail with reference to the drawings and examples. The following detailed description of the embodiments and the accompanying drawings are provided to illustrate the principles of the invention and are not intended to limit the scope of the invention, which is defined by the claims, i.e., the invention is not limited to the preferred embodiments described.

In the description of the present invention, it is to be noted that, unless otherwise specified, "a plurality" means two or more; the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance; the specific meaning of the above terms in the present invention can be understood as appropriate to those of ordinary skill in the art.

Example one

Fig. 1 is a method for determining an agricultural machinery working time period based on a clustering algorithm according to an embodiment of the present invention, including the following steps:

S100: collecting longitude and latitude data of the movement of the agricultural machine using a GPS data sensor installed on the agricultural machine, and recording the collection time;

A positioning terminal (GPS data sensor) is installed on the agricultural machine and acquires GPS positioning information every 3-5 s, the information comprising longitude, latitude and acquisition time. Because the volume of collected data is too large (1-2 GB per day on average) for a MySQL database to bear the load, the method stores the collected data in an HDFS database, which uses columnar storage, and calls the stored data with Hive SQL when needed.

S200: preprocessing collected longitude and latitude data of agricultural machinery movement to generate a data set which can be directly used for clustering operation;

the preprocessing of the collected longitude and latitude data of the agricultural machinery movement comprises the following substeps:

S210: resampling the longitude and latitude data of the agricultural machinery movement based on the acquisition time;

resampling the collected longitude and latitude data of the agricultural machinery movement every 9-11 s by taking the collection time as a reference standard, wherein the time difference between two adjacent data after resampling is 9-11 s, and preferably 10 s; the collected longitude and latitude data of the agricultural machinery movement are resampled based on the collection time, so that the technical effects of facilitating calculation and reducing the calculation amount are achieved.
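The resampling in S210 can be illustrated with a minimal Python sketch. The patent does not state the exact thinning rule, so the rule used here (keep a fix only when at least `min_gap` seconds have elapsed since the last kept fix) is an assumption, as are the names `resample`, `Fix` and `min_gap`.

```python
from typing import List, Tuple

# (timestamp in seconds, longitude, latitude); this layout is an assumption
Fix = Tuple[float, float, float]

def resample(fixes: List[Fix], min_gap: float = 9.0) -> List[Fix]:
    """Thin a track so that consecutive kept fixes are at least min_gap seconds apart."""
    kept: List[Fix] = []
    for fix in sorted(fixes, key=lambda f: f[0]):
        # Always keep the first fix; afterwards keep only fixes far enough in time.
        if not kept or fix[0] - kept[-1][0] >= min_gap:
            kept.append(fix)
    return kept

# Hypothetical raw track: one fix every 5 s for one minute.
raw = [(float(t), 113.0 + t * 1e-6, 28.0) for t in range(0, 60, 5)]
sampled = resample(raw)
print([f[0] for f in sampled])
```

With raw fixes arriving every 5 s, this rule keeps every second fix, so adjacent kept fixes are 10 s apart, which falls inside the 9-11 s window described above. With irregular raw intervals the gaps can drift above 11 s, so a production version would need a stricter rule.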

S220: carrying out coordinate transformation on the data after resampling based on a Gaussian projection method to obtain a plane coordinate data set;

The resampled longitude and latitude data are converted into plane coordinates using the Gaussian projection method, and the minimum values of the horizontal and vertical coordinates (x_min, y_min) are obtained from the plane coordinates (projection results) of all the points and taken as a new coordinate origin; x_min and y_min do not necessarily belong to the same point. The plane coordinates of all the points are then translated relative to the new coordinate origin to obtain a new data point set, namely the plane coordinate data set, which is stored in the corresponding data table of the hdfs database for subsequent calling.

The Gaussian projection method projects a curved surface onto a plane. Converting the spherical longitude and latitude coordinates into plane coordinates by this method effectively reduces the magnitude of the squared terms in subsequent calculations, reduces the subsequent amount of calculation and improves the precision of the result. Because the SQL language has no function that directly implements the Gaussian projection, a user-defined function has to be written and embedded into the SQL for execution; the function is written in Java, and its final form of use is as follows:

transformstogaussxy(longitude, latitude) as gaussxy

where transformstogaussxy is a user-defined SQL function that converts spherical longitude and latitude coordinates (longitude, latitude) into plane coordinates; gaussxy is the projection result obtained by the Gaussian projection method, a two-dimensional coordinate from which the horizontal and vertical values gaussx and gaussy can be obtained by splitting. The SQL code is run on Spark to improve efficiency.
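The origin shift described in S220 can be sketched as follows, in Python. The projection itself is not reproduced here: the input is assumed to be already-projected plane coordinates, and the function name `shift_to_local_origin` and the sample values are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def shift_to_local_origin(points: List[Point]) -> Tuple[List[Point], Point]:
    """Translate projected points so that (x_min, y_min) becomes the origin."""
    x_min = min(x for x, _ in points)
    y_min = min(y for _, y in points)
    shifted = [(x - x_min, y - y_min) for x, y in points]
    return shifted, (x_min, y_min)

# Hypothetical projected coordinates in metres.
projected = [(3101250.0, 500480.0), (3101200.0, 500520.0), (3101275.0, 500500.0)]
local, origin = shift_to_local_origin(projected)
print(origin)  # (3101200.0, 500480.0)
print(local)   # [(50.0, 0.0), (0.0, 40.0), (75.0, 20.0)]
```

In this example x_min comes from the second point and y_min from the first, which illustrates that the two minima need not belong to the same point. After the shift all coordinates are small non-negative numbers.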

S230: carrying out format conversion on the plane coordinate data set to generate a data set which can be directly used for clustering operation;

The plane coordinate data stored in the HDFS database is columnar, whereas the clustering operation must compute the distance between every pair of records, that is, the distance between the physical positions of the agricultural machine at the sampling instants, and from these results judge the distribution and density, in the coordinate system, of the points representing the machine's actual positions. Logically, therefore, the clustering operation has to read the data row by row. Although the result of a query issued through an SQL statement looks the same in form (a table with column names), internally each column is stored together. Taking the data (x_n, y_n, t_n) used by the invention as an example, columnar storage stores all x values, all y values and all t values separately and, when called, reads them in the order x, y, t. This storage method can hold more data, but because a single row or a single value cannot be modified individually, it is less suitable than row-based storage where data must be processed finely. Consequently, the plane coordinate data stored in the HDFS database, being stored in units of columns, cannot be used directly by a clustering operation that reads data in units of rows.

The method adopts the map function to convert the format of the plane coordinate data to generate the data set in the list format, and is realized by the following steps:

When data are extracted from the database, each record has seven values (vehicleid, workdataprotocol, uniqueid, terminalid, time, longitude_gauss and latitude_gauss); the first four values describe the operating equipment, and the last three are respectively the operation time, the Gaussian abscissa and the Gaussian ordinate of the equipment. When the map function is used for data conversion, the data are grouped using the first four values as the key, which ensures that each group of data is generated by the same device; the specific code is as follows:

groupdf = df.map(lambda x: ((x.vehicleid, x.workdataprotocol, x.uniqueid, x.terminalid), (x.longitude_gauss, x.latitude_gauss, x.time_gauss, x.time))).groupByKey()

the time _ gauss is a converted result of the acquisition time, and the calculation process is similar to coordinate conversion so as to facilitate subsequent calculation; the calculation result groupdf is a required list set, and the operation trajectory data of a single device can be obtained by splitting for subsequent calculation.

According to the invention, the map function is used to convert the format of the plane coordinate data into data that can be used directly for clustering, which effectively improves the efficiency of subsequent calculation. Compared with converting the data format using the built-in toPandas function of the Python interface, the map-based conversion is more efficient and can therefore meet the requirements of daily operation.
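The effect of the map and groupByKey conversion can be illustrated without a Spark cluster. The following is a plain-Python stand-in, not the Spark code of the invention: `Record` and `group_by_device` are hypothetical names, and the sample records are invented for illustration. The field names follow the code line above.

```python
from collections import defaultdict, namedtuple

# Row layout assumed from the field names used in the map call above.
Record = namedtuple(
    "Record",
    "vehicleid workdataprotocol uniqueid terminalid time "
    "longitude_gauss latitude_gauss time_gauss",
)

def group_by_device(records):
    """Emulate map(...).groupByKey(): key = device identity, value = track samples."""
    groups = defaultdict(list)
    for r in records:
        key = (r.vehicleid, r.workdataprotocol, r.uniqueid, r.terminalid)
        groups[key].append((r.longitude_gauss, r.latitude_gauss, r.time_gauss, r.time))
    return dict(groups)

records = [
    Record("v1", "p1", "u1", "t1", 1000, 12.0, 30.0, 0.0),
    Record("v2", "p1", "u2", "t2", 1000, 80.0, 15.0, 0.0),
    Record("v1", "p1", "u1", "t1", 1010, 14.0, 30.5, 1.0),
]
groups = group_by_device(records)
for key, track in groups.items():
    print(key, track)
```

Each value in `groups` is a row-oriented list of samples for one device, which is the shape a distance-based clustering routine needs.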

S300: directly carrying out clustering operation on the data in the data set to obtain a primary clustering result, wherein the primary clustering result comprises a road data point set and a field area data point set;

S310: performing clustering operation on the data in the data set based on a clustering algorithm to obtain the distribution of points in a coordinate system;

The format-converted data meets the requirements of the clustering algorithm, and the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) clustering algorithm is used to cluster the data set in the three dimensions x, y and t; in essence, the clustering algorithm calculates the distance between data points and judges the distribution and density of the points in the coordinate system from the results.

S320: based on the distribution condition of points in a coordinate system, the working characteristics of agricultural machinery and the running characteristics on roads, preliminarily dividing the operation track of the agricultural machinery to obtain a preliminary clustering result, wherein the preliminary clustering result comprises a road data point set (road part) and a field area data point set (farmland part);

(1) the working characteristics of the agricultural machine include: ① the moving speed is slower than when driving on the road, so the track points within the point set are denser; ② the working scene is generally rectangular and has a certain area; ③ the working duration is positively correlated with the size of the working scene;

(2) the characteristics of the agricultural machine when driving on the road include: ① the moving speed is high, so the points are sparsely distributed; ② the point set is generally strip-shaped and contains fewer points.

The agricultural machinery equipment can acquire a series of position information in the operation process to form an operation track, and whether the equipment is in a working state or not is judged according to the distribution condition of the operation track; for a single device, a set of job trajectories sorted by acquisition time may be defined as:

P = {(x_1, y_1, t_1), (x_2, y_2, t_2), …, (x_n, y_n, t_n)}

where (x_i, y_i, t_i) (i = 1, 2, …, n) denotes the abscissa, the ordinate and the data sampling time of the i-th data point, respectively.

In the application scenario of the invention, points of the agricultural machinery working between fields are more dense, points on the road are relatively sparse, according to the differentiation of different densities of the agricultural machinery working tracks on the fields and the road, an algorithm divides all the points into two parts, namely points in the fields and points on the road, wherein the points on the road form a set (road data point set), and the points in the fields are divided into a plurality of sets (field area data point sets) according to different density areas and time areas so as to represent different working periods.

Compared with partitioning and hierarchical clustering methods, the DBSCAN algorithm adopted by the invention defines a cluster as the maximal set of density-connected points, can divide regions of sufficiently high density into clusters, and can find clusters of arbitrary shape in a spatial database containing noise.
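The density-based split into field clusters and road points can be sketched with a small, pure-Python DBSCAN. This is an illustrative O(n²) version, not the implementation used by the invention. The parameter values `eps` and `min_pts`, the example track, and the scaling of the time axis (one unit per 10 s sample) are all assumptions, since the text does not give them.

```python
import math
from typing import List, Sequence

NOISE = -1  # label used here for points outside any cluster (road points)

def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    n = len(points)
    labels = [None] * n

    def neighbours(i: int) -> List[int]:
        # Brute-force region query; the point itself is included.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:     # core point: keep expanding the cluster
                queue.extend(nb)
    return labels

# Hypothetical track: 12 slow field samples (2 m apart), then 5 fast road samples (80 m apart).
field = [(2.0 * k, 0.0, float(k)) for k in range(12)]
road = [(100.0 + 80.0 * k, 0.0, 12.0 + k) for k in range(5)]
labels = dbscan(field + road, eps=5.0, min_pts=4)
print(labels)
```

In this example the slow, closely spaced samples form one cluster and the widely spaced samples are left as noise, which corresponds to the field and road parts of the track. For real data volumes a spatial index or a library implementation would be needed instead of the brute-force neighbour search.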

S400: and correcting the data point set of the field area in the primary clustering result to obtain a final clustering result.

Because the preliminary segmentation is based only on the working characteristics of the agricultural machine and its characteristics when driving on the road, the preliminary clustering result contains some erroneous results. For example, when the agricultural equipment is started for warm-up or turns within a small range, the sensor still uploads position data even though there is little or no displacement; the resulting point set is dense, but the range it covers is small. The preliminary clustering result therefore needs to be further screened based on the working characteristics of the agricultural equipment, so that such misjudgments are identified and corrected.

The step of correcting the field block area data point set in the preliminary clustering result comprises the following substeps:

S410: performing point-count screening on the field area data point sets in the preliminary clustering result to remove data point sets which do not belong to a field area;

In the data preprocessing stage, the program performs a preliminary screening of the equipment's track point set so that the acquisition interval between two adjacent points is kept at about 10 s. After the preliminary clustering, if the number of points in a field area data point set representing a field working period is less than a first set threshold (for example, 60), i.e. the working time of the equipment is less than 10 min, the equipment is considered not to be in a working state during that period. In this case the density of the data points reaches the standard (and is recognized by the clustering algorithm) only because the equipment stays in place or turns at an intersection, which does not meet the requirements of field work, so the result is eliminated (it is an erroneous preliminary clustering result, i.e. the data point set is not a field area data point set);

S420: calculating, by a polygon area calculation method, the area of each field area data point set that passed the point-count screening, so as to remove data point sets which do not belong to a field area;

For a field area data point set whose number of points exceeds the first set threshold (for example, 60), the area covered by the set is calculated using a polygon area calculation method. If the area is smaller than a second set threshold (for example, 10 m²; the second set threshold is set based on the experience that land cultivated with agricultural equipment should have a larger area), the equipment was not working in a field during the corresponding time period but was staying in place or turning at an intersection, which does not meet the requirements of field work, so the result (an erroneous preliminary clustering result) is eliminated.
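The two screening steps S410 and S420 can be sketched together in Python. The text names only "a polygon area calculation method"; the choice made here, a convex hull (Andrew's monotone chain) followed by the shoelace formula, is an assumption, as are the function names. The default thresholds are the example values given above (60 points, 10 m²).

```python
from typing import List, Tuple

Point = Tuple[float, float]

def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(vertices: List[Point]) -> float:
    """Shoelace formula for a simple polygon given by its ordered vertices."""
    n = len(vertices)
    if n < 3:
        return 0.0
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

def is_field_cluster(points: List[Point], min_points: int = 60, min_area: float = 10.0) -> bool:
    """S410: reject clusters with too few points; S420: reject clusters covering too little area."""
    if len(points) < min_points:
        return False
    return polygon_area(convex_hull(points)) >= min_area

# Hypothetical clusters: a 45 m x 45 m worked plot and an idling machine jittering within 1 m.
working = [(5.0 * (i % 10), 5.0 * (i // 10)) for i in range(100)]
idle = [(0.1 * (i % 10), 0.1 * (i // 10)) for i in range(100)]
print(is_field_cluster(working), is_field_cluster(idle), is_field_cluster(working[:20]))
```

The convex hull overestimates the area of concave plots, which is acceptable for a coarse lower-bound check of this kind. A cluster with exactly the threshold number of points is kept in this sketch; the text leaves that boundary case open.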

S500: screening out the field area data point sets from the final clustering result, and calculating the minimum value and the maximum value of the sampling time in each field area data point set to determine the working time period of the agricultural machinery.

As shown in fig. 2, each field area in the final clustering result is clearly distinguished, wherein the black sparse points represent the road part in the track, and the field areas marked by black frames and densely clustered by points are the field parts in the track;

after DBSCAN clustering and subsequent screening work, obtaining all field area data point sets meeting the requirements (each data point set represents one field working time period of the equipment); and calculating the minimum value and the maximum value of the data sampling time in each data point set of the field area to obtain the start time and the stop time of the agricultural equipment in the working time period.
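The final step reduces each remaining field cluster to a start and stop time. A minimal Python sketch follows; it assumes that each sample is a tuple (x, y, t), that cluster labels are integers and that -1 marks road points. The function name `working_periods` and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

def working_periods(samples: Sequence[Tuple[float, float, float]],
                    labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (start, stop) sampling times for every field cluster, sorted by start."""
    times: Dict[int, List[float]] = defaultdict(list)
    for (_, _, t), label in zip(samples, labels):
        if label != -1:                 # skip road points
            times[label].append(t)
    return sorted((min(ts), max(ts)) for ts in times.values())

# Hypothetical samples: two field clusters separated by one road point.
samples = [(0, 0, 100), (1, 0, 110), (2, 0, 120), (300, 0, 130),
           (500, 5, 200), (501, 5, 210), (502, 6, 220)]
labels = [0, 0, 0, -1, 1, 1, 1]
print(working_periods(samples, labels))  # [(100, 120), (200, 220)]
```

Each returned pair corresponds to one working time period of the device, with the first value as the start time and the second as the stop time.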

Example two

Fig. 3 is a system for determining an agricultural machinery working time period based on a clustering algorithm according to an embodiment of the present invention, which includes an acquisition module, a preprocessing module, a clustering operation module, a correction module, and a working time period determination module;

and the working time period determining module is used for screening out the data point sets of the field areas from the final clustering result and calculating the minimum value and the maximum value of the sampling time in each data point set of the field areas so as to determine the working time period of the agricultural machinery.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a U disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.

The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A method for determining an agricultural machinery working time period based on a clustering algorithm is characterized by comprising the following steps:

2. The method for determining the working time period of the agricultural machinery as claimed in claim 1, wherein the preprocessing of the latitude and longitude data in the step S200 comprises the following sub-steps:

S210: resampling the longitude and latitude data based on the acquisition time;

3. The method for determining the working time period of the agricultural machine according to claim 2, wherein the step S210 specifically comprises: and resampling longitude and latitude data of the agricultural machinery movement every 9-11 s based on the acquisition time.

4. The method for determining an agricultural machinery working time period according to claim 2, wherein the step S220 comprises:

5. The method for determining an agricultural machinery working time period according to claim 2, wherein the step S230 comprises: and converting the format of the plane coordinate data by adopting a map function to generate a data set in a list format.

6. The method for determining the agricultural machinery working time period according to claim 1, wherein the preliminary clustering result in the step S300 is obtained by the following sub-steps:

7. The method for determining an agricultural machinery working time period according to claim 1, wherein the step S400 of correcting the field block data point set in the preliminary clustering result comprises the following sub-steps:

8. The method for determining an agricultural machinery working time period according to claim 7, wherein the step S410 comprises: and if the number of the points in the field area data point set is less than a first set threshold value, removing the field area data point set.

9. The method for determining an agricultural machinery working time period according to claim 8, wherein the step S420 comprises:

10. A system for determining the working time period of agricultural machinery based on a clustering algorithm is characterized by comprising an acquisition module, a preprocessing module, a clustering operation module, a correction module and a working time period determination module;