CN115310719B

CN115310719B - Design method of farmland soil sampling plan based on three-stage k-means

Info

Publication number: CN115310719B
Application number: CN202211125514.0A
Authority: CN
Inventors: 齐清文; 王永吉
Original assignee: Institute of Geographic Sciences and Natural Resources of CAS
Current assignee: Institute of Geographic Sciences and Natural Resources of CAS
Priority date: 2022-09-16
Filing date: 2022-09-16
Publication date: 2023-04-18
Anticipated expiration: 2042-09-16
Also published as: CN115310719A

Abstract

The invention discloses a method for designing a farmland soil sampling scheme based on three-stage k-means, comprising: S1, based on DEM data, using the k-means method and the geographical detector method to form spatially differentiated farmland sub-regions, and allocating the number of soil samples to each farmland sub-region in proportion to its area; S2, based on NDVI data, using the local coefficient of variation (CV) and the k-means method to subdivide the farmland sub-regions obtained in the first stage into a series of sub-patches with similar levels of local spatial variation, and obtaining the number of soil samples for each sub-patch from its area and its local spatial variation level; S3, based on remote-sensing yield-estimation data, using the k-means method and variance statistics to determine the spatial locations of representative soil samples. The invention considers both the spatial heterogeneity of soil and its level of local spatial variation, so that farmland soil sampling schemes can be designed more scientifically and rationally to obtain good-quality soil spatial data.

Description

Design method of farmland soil sampling plan based on three-stage k-means

Technical field

The invention belongs to the technical field of geographic information systems and discloses a method for designing a farmland soil sampling scheme based on three-stage k-means.

Background

China feeds nearly 20% of the world's population with about 7% of the world's arable land. In recent years, however, the development of modern agriculture in China has faced many problems (high production costs, improper use of chemical fertilizers and pesticides, a low degree of mechanization, soil compaction, and so on), which have sharpened the tension between population, environmental security, and food security. Precision agriculture is a new trend in modern agriculture, a key means of shifting it from a resource-input model to a technology-driven one, and a hallmark of the combination of agricultural informatization, mechanization, and modernization; it will help resolve this tension. The state has consistently attached great importance to the development of precision agriculture. With accelerating urbanization, many farmers have moved to cities for work, leaving rural land idle; intensive management of rural land is therefore becoming a new trend, which offers a new opportunity for developing and implementing precision agriculture. Precision agriculture means adjusting agricultural inputs according to the nutrients present in the farmland soil, the nutrients the crops need, and the target yield, so that, while crop growth requirements are met, the utilization of agricultural inputs and agricultural productivity are improved, resources are saved, and the agricultural environment is protected.

As an important part of the earth's surface system, soil is spatially differentiated. The spatial mismatch between soil nutrient supply and crop nutrient demand is the fundamental factor affecting crop yield. A thorough understanding of the spatial differentiation of farmland soil nutrients therefore helps achieve the goals of precision agriculture. At present, the main way to determine the spatial distribution of soil nutrients is still to collect soil samples in the field and measure their nutrient content by physical and chemical analysis in the laboratory. However, collecting soil samples densely over large areas consumes a great deal of manpower, material, and money. Soil sampling is the scientific approach of obtaining nutrient information at a small number of key sample points and then estimating the spatial distribution of farmland soil nutrients with a soil prediction model, so as to strike a good balance between sampling cost and the expected prediction accuracy. Different sampling schemes yield different pictures of the spatial distribution of farmland soil nutrients, so studying and selecting a sound soil sampling method is essential for obtaining a reliable spatial distribution.

Current soil sampling methods are mainly divided into design-based and model-based methods. Design-based methods determine the optimal sample size quantitatively from prior knowledge and then lay out the samples by random, systematic, or stratified sampling. For example, Wang Jinfeng et al. proposed the "sandwich" spatial sampling model, which is based on stratified sampling and takes spatial differentiation into account. In addition, some studies exploit the covariation between soil and environmental variables and use environmental variables to help design sampling schemes and improve sampling efficiency. Model-based methods are founded on geostatistical theory and use certain criteria so that the realization accurately fits the population; they design the optimal number and spatial pattern of sample points by minimizing the model estimation variance. For example, Wang Zhenhua et al. combined fuzzy set theory with sampling inspection theory and proposed a two-level sampling model based on spatial data quality inspection. Beyond these methods, recent work also includes supplementary sampling based on the uncertainty of spatial prediction and sampling methods that take accessibility into account.

Compared with design-based methods, model-based soil sampling methods can fully exploit the structural information of the soil in the study area and are therefore better at capturing soil information at key points. However, most model-based methods rest on geostatistical theory, and designing a sampling scheme with them depends on the variogram, which is usually known only after sampling or must be modeled from earlier soil data for the study area. In agricultural areas, strong human intervention can cause the spatial distribution of soil to differ considerably from year to year, in which case earlier soil data for the area cannot be used for variogram modeling. Design-based soil sampling methods may therefore be better suited to precision agriculture. Different crop types take up different soil nutrients, and the complex planting structure and fertilization management of agricultural areas give the soil a degree of local spatial differentiation. There is thus an urgent need for a new design-based soil sampling method that takes the local spatial differentiation of agricultural soil into account in order to identify and capture soil information at key points and thereby improve the accuracy of digital soil mapping for precision agriculture.

Summary of the invention

In view of the above technical problems, the invention provides a method for designing a farmland soil sampling scheme based on three-stage k-means. It mines environmental information closely related to soil and uses a three-stage k-means procedure to infer the local spatial differentiation of the soil, determine the spatial locations of representative samples, and form a farmland soil sampling scheme. The first k-means stage constructs spatially differentiated farmland sub-regions and allocates the soil sample numbers among them; the second stage forms sub-patches with similar levels of local soil spatial variation and obtains the number of samples for each sub-patch; the third stage determines the spatial locations of the representative samples.

The method for designing a farmland soil sampling scheme based on three-stage k-means comprises the following steps:

S1. Based on DEM data, use the k-means method and the geographical detector method to form spatially differentiated farmland sub-regions, and allocate the number of soil samples to each farmland sub-region in proportion to its area, thereby realizing the first stage of k-means processing;

S1 specifically includes the following sub-steps:

S11. Construct spatially differentiated farmland sub-regions with the k-means method:

Based on the DEM data, the farmland is partitioned with the k-means method into farmland sub-regions whose terrain conditions are similar within a sub-region and different between sub-regions, that is, sub-regions that are spatially differentiated;

S12. Determine the optimal number of farmland sub-regions with the geographical detector:

Based on the DEM data, the Q value of the geographical detector is used to measure the spatial differentiation of the farmland sub-regions for different numbers of sub-regions; the curve of Q against the number of sub-regions is obtained, and the number at its inflection point is selected as the optimal number of farmland sub-regions;

S13. Allocate the number of soil samples to each farmland sub-region:

A soil sampling scheme is always designed by first determining the sample size and then the sample locations; from the sub-region partition, the area of each farmland sub-region is calculated and used as the weight for allocating the number of soil samples to that sub-region.

S2. Based on NDVI data, use the local coefficient of variation (CV) and the k-means method to subdivide the farmland sub-regions obtained in the first stage into a series of sub-patches with similar levels of local spatial variation, and obtain the number of soil samples for each sub-patch from its area and its local spatial variation level, thereby realizing the second stage of k-means processing;

S21. Infer the local spatial variation level of the farmland soil:

NDVI data, which are closely related to crop growth, are selected, and the local coefficient of variation CV is used to infer the local spatial variation level of the farmland soil, providing the data basis for the further allocation of soil sample points;

S22. Generate sub-patches with similar levels of local soil spatial variation:

Based on the calculated local spatial variation level of the farmland soil, the k-means method is applied within each farmland sub-region to form sub-patches with similar levels of local soil spatial variation; the number of clusters within each farmland sub-region is set to half of the number of soil samples allocated to that sub-region;

S23. Obtain the number of samples for each sub-patch:

From the sub-patch partition, the area and the local coefficient of variation CV of each sub-patch are calculated and used as weights to further allocate the soil sample points, giving the number of samples for each sub-patch and providing quantitative support for determining the spatial locations of the soil samples.

S3. Based on remote-sensing yield-estimation data, use the k-means method and variance statistics to determine the spatial locations of representative soil samples, thereby realizing the third stage of k-means processing and completing the design of the farmland soil sampling scheme.

S3 specifically includes the following sub-steps:

S31. Form subsets with similar crop yield levels within each sub-patch:

Based on the remote-sensing yield estimates of the crop, the k-means method is applied within each sub-patch to form subsets with similar crop yield levels; the number of clusters in each sub-patch equals the number of samples allocated to that sub-patch;

S32. Determine the spatial locations of the representative samples:

After the above steps, each subset has similar terrain conditions, local soil variation level, and crop yield level. To determine the spatial locations of the representative samples, the expected variance is calculated; then one sample point is randomly selected within each subset to form a sample point set, and the set whose variance is closest to the expected variance is searched for and taken as the sampling points, which fixes the spatial locations of the representative samples and forms the farmland soil sampling scheme.

The invention considers both the spatial heterogeneity of soil and its level of local spatial variation, so that farmland soil sampling schemes can be designed more scientifically and rationally to obtain good-quality soil spatial data and, from them, accurate maps of the spatial distribution of farmland soil, providing technical and data support for precision-agriculture decisions such as variable-rate fertilization and adjustment of the planting structure.

Description of the drawings

Fig. 1 is the overall technical flow chart of the invention;

Fig. 2 shows the study area of the embodiment;

Fig. 3 shows the first-stage k-means result of the embodiment;

Fig. 4 shows the local coefficient of variation (CV) calculation and the second-stage k-means result of the embodiment;

Fig. 5 shows the third-stage k-means result and the resulting soil sampling scheme of the embodiment;

Fig. 6 compares different sampling methods: (a) the three-stage k-means sampling of the invention, (b) stratified random sampling, (c) k-means sampling, and (d) regular grid sampling;

Fig. 7 shows the soil SOM attribute maps and error distributions for different sampling methods: (a) the three-stage k-means sampling of the invention, (b) stratified random sampling, (c) k-means sampling, and (d) regular grid sampling.

Detailed description of the embodiments

The specific technical solution of the invention is described with reference to an embodiment.

Fig. 1 shows the method for designing a farmland soil sampling scheme based on three-stage k-means; the study area of this embodiment is shown in Fig. 2. The method comprises the following steps:

S1. Construct spatially differentiated farmland sub-regions and allocate the soil sample numbers (first-stage k-means)

S11. Construct spatially differentiated farmland sub-regions with the k-means method

The ultimate purpose of soil sampling is to capture and express the spatial distribution of soil in the study area from a small number of soil samples. As a type of geographic data, soil is spatially differentiated. Spatial differentiation, in full spatial stratified heterogeneity, means that the value of an attribute differs between types or regions, as in land-use maps, climate zones, ecological zones, and geographical regionalizations. Some researchers use environmental variables closely related to soil to divide the study area into spatially differentiated sub-regions as an aid to designing soil sampling schemes, and terrain is one of the most commonly used of these variables. Accordingly, in this embodiment the farmland is partitioned with the k-means method on the basis of DEM data into farmland sub-regions whose terrain conditions are similar within a sub-region and different between sub-regions (that is, sub-regions that are spatially differentiated).

The k-means method is an iterative clustering algorithm. To divide the data into K groups, K objects are first selected at random as the initial cluster centers; the distance between each object and each cluster center is then calculated, and each object is assigned to the nearest cluster center. A cluster center together with the objects assigned to it represents one cluster. Each time a sample is assigned, the center of its cluster is recalculated from the objects currently in the cluster. This process is repeated until a termination condition is met.
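The iteration described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patent's implementation: each farmland cell is represented by a single elevation value, the centers are updated once per pass (the batch, or Lloyd, variant) rather than after every single assignment, and the name `kmeans_1d`, the seed, and the sample elevations are hypothetical.

```python
import random

def kmeans_1d(values, k, iters=100, seed=0):
    """Batch k-means on scalar values; returns (labels, centers)."""
    rng = random.Random(seed)
    # Random initial centers drawn from the distinct data values.
    centers = rng.sample(sorted(set(values)), k)
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: abs(v - centers[c]))
                      for v in values]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [v for v, l in zip(values, new_labels) if l == c]
            if members:
                centers[c] = sum(members) / len(members)
        if new_labels == labels:  # termination: assignments stopped changing
            break
        labels = new_labels
    return labels, centers

# Hypothetical elevations (metres) of eight farmland cells.
elev = [101.0, 102.5, 100.8, 150.2, 151.0, 149.5, 200.1, 199.7]
labels, centers = kmeans_1d(elev, k=3)
```

Because the starting centers are random, the result can be a local optimum; in practice the clustering is usually repeated from several starting points and the best partition kept.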

S12. Determine the optimal number of farmland sub-regions with the geographical detector

Before the k-means method can be used to obtain spatially differentiated farmland sub-regions, the number of clusters must be specified. Different numbers of clusters give the farmland sub-regions different degrees of spatial differentiation. Spatial differentiation can be identified, tested, located, and attributed with the Q statistic of the geographical detector.

The geographical detector is a method for detecting spatial stratified heterogeneity and revealing the driving forces behind it. Its core idea is that if an independent variable has an important influence on a dependent variable, the spatial distributions of the two should be similar; it is a tool for detecting and exploiting the spatial differentiation of geographical phenomena. The geographical detector comprises the differentiation and factor detector, the interaction detector, the ecological detector, and the risk-area detector. The differentiation and factor detector measures the spatial differentiation of the dependent variable and the extent to which a given independent variable explains it. The interaction detector identifies interactions between different independent variables (factors) and assesses whether their joint action strengthens or weakens the explanation of the dependent variable.

This method uses the differentiation and factor detector (the Q statistic) of the geographical detector to analyze the spatial differentiation of the farmland sub-regions. The model is as follows:

$$Q = 1 - \frac{\sum_{h=1}^{L} N_h \sigma_h^2}{N \sigma^2} \qquad (1)$$

where $h = 1, \dots, L$ indexes the classes or zones of the farmland (the farmland sub-regions); $N_h$ and $N$ are the areas of class $h$ and of the whole region, respectively; and $\sigma_h^2$ and $\sigma^2$ are the variances of the DEM variable within class $h$ and over the whole region, respectively. $Q \in [0, 1]$; the larger its value, the more pronounced the spatial differentiation.

In this embodiment, based on the DEM data, the Q value of the geographical detector is used to measure the spatial differentiation of the farmland sub-regions for different numbers of sub-regions; the curve of Q against the number of sub-regions is plotted, and the number at its inflection point is selected as the optimal number of farmland sub-regions.
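The selection of the number of sub-regions can be illustrated as follows. The sketch assumes equal-area raster cells, so that cell counts stand in for areas, and it uses the standard geographical-detector form of the statistic (one minus the ratio of within-stratum to total variance). The source does not say how the inflection point is located; `choose_k` uses the largest drop in marginal gain, which is only one possible heuristic. All names and the sample Q curve are hypothetical.

```python
from statistics import pvariance

def q_statistic(values, labels):
    """Q = 1 - sum_h(N_h * var_h) / (N * var), with cells as area units."""
    total_var = pvariance(values)
    if total_var == 0:
        return 0.0  # no variation at all, hence nothing to explain
    within = 0.0
    for h in set(labels):
        stratum = [v for v, l in zip(values, labels) if l == h]
        within += len(stratum) * pvariance(stratum)
    return 1.0 - within / (len(values) * total_var)

def choose_k(q_by_k):
    """Pick the k where the gain in Q falls off most sharply (assumed rule)."""
    ks = sorted(q_by_k)
    drops = {
        ks[i]: (q_by_k[ks[i]] - q_by_k[ks[i - 1]])
               - (q_by_k[ks[i + 1]] - q_by_k[ks[i]])
        for i in range(1, len(ks) - 1)
    }
    return max(drops, key=drops.get)

# Hypothetical Q values obtained by clustering the DEM with k = 2..5.
q_curve = {2: 0.60, 3: 0.85, 4: 0.90, 5: 0.92}
best_k = choose_k(q_curve)
```

A partition that separates the elevation values perfectly gives Q = 1, while one whose strata are as variable as the whole field gives Q = 0.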

S13. Allocate the number of soil samples to each farmland sub-region

The spatially differentiated farmland sub-regions are the fundamental basis for allocating the soil sample numbers. First, the overall sample size of the farmland soil sampling scheme is determined from the required soil mapping accuracy or the budget constraint; then the area of each farmland sub-region is calculated and used as the weight for allocating the number of soil samples to that sub-region. The formula is as follows:

$$SN_h = SN \times \frac{A_h}{\sum_{h} A_h} \qquad (2)$$

where $SN$ is the overall sample size of the farmland soil sampling scheme, $A_h$ is the area of farmland sub-region $h$, and $SN_h$ is the number of soil samples allocated to farmland sub-region $h$.
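A small worked example of area-proportional allocation may help. The source does not say how fractional sample counts are rounded; the sketch below assumes largest-remainder rounding so that the integer counts still sum to the overall sample size. The function name `allocate` and the areas are hypothetical.

```python
def allocate(total, weights):
    """Split `total` samples across units in proportion to `weights`."""
    s = sum(weights)
    exact = [total * w / s for w in weights]  # fractional shares
    counts = [int(e) for e in exact]          # round every share down
    # Assumed rounding rule: hand the leftover samples to the units
    # with the largest fractional remainders.
    order = sorted(range(len(weights)),
                   key=lambda i: exact[i] - counts[i], reverse=True)
    for i in order[: total - sum(counts)]:
        counts[i] += 1
    return counts

# Hypothetical sub-region areas (hectares) and an overall sample size of 60.
areas = [50.0, 30.0, 20.0]
per_region = allocate(60, areas)
```

Here the three sub-regions hold 50%, 30%, and 20% of the area and therefore receive 30, 18, and 12 of the 60 samples.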

S2. Form sub-patches with similar levels of local soil spatial variation and obtain the number of samples for each sub-patch (second-stage k-means)

S21. Infer the local spatial variation level of the farmland soil

Soil areas with strong local spatial variation need more sample points if the spatial distribution of soil is to be mapped accurately, and farmland soils in different areas differ in their level of local spatial variation. Local soil conditions affect crop growth, so the local condition of the soil can be inferred in reverse from crop growth. Here, NDVI data, which are closely related to crop growth, are selected, and the local coefficient of variation CV is used to infer the local spatial variation level of the farmland soil, providing the data basis for the further allocation of soil sample points.

The coefficient of variation (CV) is a relative measure of variation. It is obtained by dividing a measure of dispersion (the range, the mean deviation, or the standard deviation) by the mean; the standard-deviation form is the most common. It is used when the two series being compared differ in level, in which case the range, mean deviation, or standard deviation cannot be compared directly because they are all absolute measures. The CV is calculated as follows:

$$CV = \frac{1}{\bar{x}} \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2} \qquad (3)$$

where $x_i$ is one of the NDVI values within the calculation window, $\bar{x}$ is the mean NDVI within the window, and $n$ is the number of values in the window. The size of the calculation window must be chosen according to the size of the study area.
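The moving-window calculation can be sketched as follows. This is an illustration only: the NDVI raster is a nested list, the window is a square of side 2 × `radius` + 1 that is clipped at the field edge, and the population standard deviation is used, which is an assumption because the source does not state the divisor. The name `local_cv` and the sample grid are hypothetical.

```python
from statistics import mean, pstdev

def local_cv(grid, radius=1):
    """CV (std / mean) of the square neighbourhood around every cell."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Collect the window values, clipping at the raster edges.
            window = [grid[i][j]
                      for i in range(max(0, r - radius), min(rows, r + radius + 1))
                      for j in range(max(0, c - radius), min(cols, c + radius + 1))]
            m = mean(window)
            out[r][c] = pstdev(window) / m if m else 0.0
    return out

# Hypothetical 2 x 2 NDVI raster with a sharp east-west contrast.
ndvi = [[0.2, 0.8],
        [0.2, 0.8]]
cv = local_cv(ndvi)
```

A uniform window yields a CV of zero, so high CV values flag the cells where crop vigour, and by inference the soil, changes most over short distances.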

S22. Generate sub-patches with similar levels of local soil spatial variation

Within each farmland sub-region, the level of local soil spatial variation differs from place to place. Soil areas with a high level of local spatial variation need more sampling points if the spatial distribution of soil is to be mapped accurately. To this end, the farmland sub-regions are divided further into sub-patches with similar levels of local soil spatial variation.

In this embodiment, based on the calculated local spatial variation level of the farmland soil, the k-means method is applied within each farmland sub-region to form sub-patches with similar levels of local soil spatial variation. To ensure that every sub-patch contains sampling points, the number of clusters within each farmland sub-region is set to half of the number of soil samples allocated to that sub-region.

S23. Obtain the number of samples for each sub-patch

The larger a sub-patch and the higher its level of local soil spatial variation, the more sampling points it needs. In this embodiment, from the sub-patch partition, the area and the local coefficient of variation CV of each sub-patch are calculated and used as weights to further allocate the soil sample points, giving the number of samples for each sub-patch and providing quantitative support for determining the spatial locations of the soil samples. The number of samples for a sub-patch is determined as follows:

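$$SN_{hl} = SN_h \times \frac{A_{hl}\, CV_{hl}}{\sum_{l} A_{hl}\, CV_{hl}} \qquad (4)$$

where $A_{hl}$ and $CV_{hl}$ are the area and the CV of sub-patch $l$ inside farmland sub-region $h$, and $SN_{hl}$ is the number of samples allocated to that sub-patch.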
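The second-stage allocation can be illustrated with a short sketch. It rests on assumptions that the source does not confirm: the weight of a sub-patch is taken as the product of its area and its CV, every sub-patch is first given one guaranteed sample (in keeping with the stated aim that each sub-patch contains sampling points), and the remaining samples are shared out by largest-remainder rounding. The name `allocate_subpatches` and the numbers are hypothetical.

```python
def allocate_subpatches(sn_h, areas, cvs):
    """Distribute the sn_h samples of one sub-region over its sub-patches."""
    weights = [a * c for a, c in zip(areas, cvs)]  # assumed weight: area x CV
    spare = sn_h - len(weights)                    # left after one sample each
    total_w = sum(weights)
    exact = [spare * w / total_w for w in weights]
    extra = [int(e) for e in exact]
    # Hand the leftovers to the largest fractional remainders.
    order = sorted(range(len(weights)),
                   key=lambda i: exact[i] - extra[i], reverse=True)
    for i in order[: spare - sum(extra)]:
        extra[i] += 1
    return [1 + e for e in extra]

# Hypothetical sub-region with 8 samples and three sub-patches.
patch_areas = [5.0, 3.0, 2.0]     # hectares
patch_cvs = [0.30, 0.10, 0.20]    # mean local CV of NDVI
per_patch = allocate_subpatches(8, patch_areas, patch_cvs)
```

The first sub-patch is both the largest and the most variable, so it receives the most samples; the third is smaller than the second but more variable, and the two end up with the same count.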

Fig. 4 shows the local coefficient of variation (CV) calculation and the second-stage k-means result.

S3. Determine the spatial locations of the representative samples (third-stage k-means)

S31. Form subsets with similar crop yield levels within each sub-patch

Within each sub-patch the level of local soil spatial variation is similar, but the local spatial distribution of the soil is not uniform. Climate, temperature, and hazard conditions are similar across a single field, so crop yield can explain soil conditions to a large extent.

With the development of remote sensing, rapid yield estimation for crops over large areas has become possible. The WOFOST model is the most commonly used method for remote-sensing crop yield estimation. WOFOST (WOrld FOod STudies) was developed jointly by Wageningen Agricultural University in the Netherlands and the Centre for World Food Studies (CWFS); it is a dynamic, explanatory model that simulates the growth of annual crops under specific soil and climate conditions. The model emphasizes quantitative land evaluation, regional yield forecasting, risk analysis, and the quantification of interannual yield variability and climate change impacts. It is built on the simulation of crop physiological and ecological processes such as assimilation, respiration, transpiration, and dry matter partitioning, and mainly covers crop growth under potential, water-limited, and nutrient-limited conditions.

In this embodiment, crop yield is first estimated rapidly over the whole area with the WOFOST model from GF-1 remote-sensing satellite data; then, based on these yield estimates, the k-means method is applied within each sub-patch to form subsets with similar crop yield levels. The number of clusters in each sub-patch equals the number of samples allocated to that sub-patch.

S32. Determine the spatial locations of the representative samples

After the above steps, each subset has similar terrain conditions, local soil variation level, and crop yield level. To allocate the soil sample resources sensibly and determine the spatial locations of the representative samples, the expected variance is calculated first. The formula is as follows:

（5） (5)

In the formula, the variance term is the variance of the remote-sensing yield-estimation data of farmland sub-region h. Then one sample point is randomly selected within each subset to form a sample point set, and the set whose variance is closest to the expected variance is searched for and taken as the sampling points, which fixes the spatial locations of the representative samples and forms the farmland soil sampling scheme. Fig. 5 shows the third-stage k-means result and the resulting soil sampling scheme of the embodiment.
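The search for a representative sample set can be sketched as a simple random search. Several things here are assumptions: the expected variance is passed in as a ready-made number `target_var` because its formula is not recoverable from this text, the search strategy is plain repeated random drawing, and the candidate points are (x, y, yield) tuples. The name `pick_samples` and the data are hypothetical.

```python
import random
from statistics import pvariance

def pick_samples(subsets, target_var, trials=1000, seed=0):
    """Draw one point per subset; keep the draw whose yield variance
    is closest to target_var."""
    rng = random.Random(seed)
    best, best_gap = None, float("inf")
    for _ in range(trials):
        draw = [rng.choice(cands) for cands in subsets]  # one point per subset
        gap = abs(pvariance([p[2] for p in draw]) - target_var)
        if gap < best_gap:
            best, best_gap = draw, gap
    return best

# Two hypothetical subsets of candidate cells: (x, y, estimated yield).
subsets = [
    [(0, 0, 4.0)],
    [(1, 0, 4.0), (1, 1, 8.0)],
]
picked = pick_samples(subsets, target_var=4.0)
```

With these candidates, only the pairing of yields 4.0 and 8.0 reproduces the target variance of 4.0, so the second subset contributes the cell at (1, 1).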

Different sampling methods were compared. The quantitative evaluation results of soil organic matter (SOM) mapping under the different sampling methods are shown in Table 1.

Table 1. Quantitative evaluation results of SOM mapping for different sampling methods

As shown in Fig. 6, the panels correspond to (a) three-stage k-means sampling, (b) stratified random sampling, (c) k-means sampling and (d) regular grid sampling.

Fig. 7 shows the SOM attribute maps and error distributions for the different sampling methods: (a) three-stage k-means sampling, (b) stratified random sampling, (c) k-means sampling and (d) regular grid sampling.

Claims

1. A method for designing a farmland soil sampling scheme based on three-stage k-means, characterized in that it comprises the following steps:

S1. Based on the DEM data, use the k-means method and the geographic detector method to form farmland sub-regions with spatial differentiation, and allocate the number of soil samples to the farmland sub-regions according to the area ratio, thereby realizing the first stage of k-means processing;

S1 specifically includes the following sub-steps:

S11. Using the k-means method to construct farmland sub-regions with spatial differentiation:

Based on the DEM data, the k-means method is used to divide the farmland to form farmland sub-regions with similar topographic conditions within the region and different topographic conditions between regions, that is, spatial differentiation;

S12. Using geographic detectors to determine the optimal number of farmland sub-regions:

Based on the DEM data, use the Q value of the geographic detector to measure the spatial differentiation of the farmland sub-regions for different numbers of sub-regions, obtain the curve of the Q value against the number of sub-regions, and select the number corresponding to the inflection point as the optimal number of farmland sub-regions;

S13. Assign the number of soil samples for each farmland sub-region:

The design of a soil sampling scheme first determines the sample size and then the sample locations; based on the division into farmland sub-regions, the area of each farmland sub-region is calculated and used as the weight to allocate the number of soil samples to each farmland sub-region; the formula is as follows:

In the formula, SN is the overall sample size of the farmland soil sampling scheme, A_h is the area of farmland sub-region h, and SN1_h is the number of soil samples in farmland sub-region h;

S2. Based on the NDVI data, use the local coefficient of variation CV and the k-means method to subdivide the farmland sub-regions obtained by the first-stage k-means into a series of sub-patches with similar levels of local spatial variation, and obtain the number of soil samples for each sub-patch according to its area and level of local spatial variation, thereby realizing the second stage of k-means processing;

S2 specifically includes the following sub-steps:

S21. Infer the local spatial variation level of farmland soil:

Select the NDVI data that is closely related to crop growth, use the local variation coefficient CV to infer the local spatial variation level of farmland soil, and provide data support for the further allocation of soil samples;

S22. Generate sub-patches with similar levels of local soil spatial variation:

Based on the calculated local spatial variation level of the farmland soil, use the k-means method to form, within each farmland sub-region, sub-patches with similar levels of local soil spatial variation; the number of clusters in each farmland sub-region is set to half the number of soil samples allocated to that sub-region;

S23. Obtain the sampling quantity of the corresponding sub-patch:

Based on the sub-patch division results, calculate the area and local coefficient of variation CV of each sub-patch, and use these as weights to further allocate the soil samples, so as to obtain the sampling number of the corresponding sub-patch and provide quantitative support for determining the spatial locations of soil sampling; the sampling number of each sub-patch is determined as follows:

In the formula, A_hl and CV_hl are the area and the CV of sub-patch l inside farmland sub-region h;

S3. Based on the remote sensing yield estimation data, use the k-means method and variance statistics to determine the spatial location of representative soil samples, realize the third stage of k-means processing, and complete the design of the farmland soil sampling plan;

S3 specifically includes the following sub-steps:

S31. Form subsets with similar crop yield levels inside each sub-patch:

Based on crop remote sensing yield estimation data, using the k-means method, a subset with similar crop yield levels is formed within each sub-patch; the clustering number of each sub-patch is equal to the sampling number of the corresponding sub-patch;

S32. Determine the spatial position of the representative sample:

After the above process, each subset has similar topographic conditions, local soil variation level and crop yield level; to determine the spatial locations of the representative samples, the expected variance is calculated first; then, one sample point is randomly selected within each subset to form a sample point set, and the sample point set closest to the expected variance is searched for and taken as the set of sampling points, thereby determining the spatial locations of the representative samples and forming the farmland soil sampling scheme.
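The sample-number allocation in sub-steps S13 and S23 can be illustrated with a short sketch. It is not taken from the patent: the allocation formulas themselves are not reproduced in the text, so the code assumes a plain proportional split by weight, with area as the weight in S13 and, as a further assumption, the product of area and CV as the weight in S23. The largest-remainder rounding and the names `allocate_samples` and `sub_patch_weights` are also hypothetical.

```python
def allocate_samples(total, weights):
    """Split `total` samples proportionally to `weights`.

    Uses largest-remainder rounding so the integer counts sum to `total`.
    The rounding rule is an assumption made for this example.
    """
    if total < 0 or not weights:
        raise ValueError("total must be non-negative and weights non-empty")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    weight_sum = sum(weights)
    quotas = [total * w / weight_sum for w in weights]
    counts = [int(q) for q in quotas]  # floor of each exact quota
    leftover = total - sum(counts)
    # Hand the remaining samples to the largest fractional remainders.
    by_remainder = sorted(
        range(len(weights)), key=lambda i: quotas[i] - counts[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


def sub_patch_weights(areas, cvs):
    """Assumed S23 weight: area multiplied by local coefficient of variation."""
    return [a * cv for a, cv in zip(areas, cvs)]


# Stage one (S13): hypothetical sub-region areas, 20 samples overall.
stage_one = allocate_samples(20, [50, 30, 20])
# Stage two (S23): split the first sub-region's samples over two sub-patches.
stage_two = allocate_samples(
    stage_one[0], sub_patch_weights([10, 5], [0.2, 0.4])
)
```

Under these assumptions a small sub-patch with high local variation can receive as many samples as a larger, more uniform one, which matches the intent of weighting by both area and CV.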