CN115223660B - Training method and device of biological population evaluation model and electronic equipment - Google Patents

Info

Publication number
CN115223660B
Authority
CN
China
Prior art keywords
information
data
sampling
sample set
biological
Prior art date
Legal status
Active
Application number
CN202211140439.5A
Other languages
Chinese (zh)
Other versions
CN115223660A (en)
Inventor
Yu Le (俞乐)
Zhao Jianqiao (赵剑桥)
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202211140439.5A
Publication of CN115223660A
Application granted
Publication of CN115223660B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Algebra (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Bioethics (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Epidemiology (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Public Health (AREA)
  • Biotechnology (AREA)
  • General Health & Medical Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Embodiments of the invention provide a training method and device for a biological population evaluation model, and electronic equipment. The method comprises the following steps: acquiring grid data and field statistical data of a sampling region; merging the grid data and the field statistical data according to the coordinate information of the sampling points in the sampling region to obtain a training sample set for the biological population evaluation model, the training sample set including first features of a random-effect type and second features of a fixed-effect type; and training on the first features and the second features in the training sample set to obtain the biological population evaluation model. Because the evaluation model is trained on both grid data and field statistical data, the spatial limitation of prior-art biological evaluation is overcome. Appropriately applying random and fixed effects improves the accuracy of biological population evaluation at larger spatial scales, which makes it much easier for people to evaluate biological populations in nature and improves the accuracy and efficiency of human use of natural resources.

Description

Training method and device for biological population evaluation model and electronic equipment
Technical Field
The invention relates to the technical field of ecology, and in particular to a training method and training device for a biological population evaluation model, and electronic equipment.
Background
Biodiversity is the foundation on which human society depends for its survival and development, and in nature any assessment of biodiversity depends on species richness and community richness.
Field surveys of a specific area yield information such as species abundance, community abundance and the land-use type of the habitat. Such surveys provide relatively accurate field statistics and allow detailed biodiversity and land-use data to be collected.
However, this manual survey-and-evaluate approach usually focuses on the specific details of small areas and cannot accurately evaluate biological population information at large spatial scales. As a result, the error rate in human use of natural resources increases greatly, leading to a serious imbalance in the allocation of natural resources.
Disclosure of Invention
Embodiments of the invention provide a training method and training device for a biological population evaluation model, and electronic equipment, which overcome the spatial limitation of prior-art biological evaluation, improve the accuracy of biological population evaluation at larger spatial scales, make it much easier for people to evaluate natural biological populations, and improve the accuracy and efficiency of human use of natural resources.
In a first aspect, an embodiment of the present invention provides a method for training a biological population evaluation model, the method comprising:
acquiring grid data and field statistical data of a sampling region;
combining the grid data and the field statistical data according to the coordinate information of the sampling points in the sampling area to obtain a training sample set of the biological population evaluation model; the training sample set comprises: a first characteristic of a random effect type and a second characteristic of a fixed effect type;
training based on the first feature and the second feature in the training sample set to obtain a biological population evaluation model, wherein the biological population evaluation model is used for evaluating biological population information.
In a second aspect, embodiments of the present invention provide a method of biological population assessment, the method comprising:
acquiring raster data and field statistical data of a target area;
obtaining an evaluation result of the number of species in the target region according to a species number evaluation model based on the raster data and the field statistical data;
obtaining an evaluation result of the number of the biological individuals in the target area according to a biological individual number evaluation model based on the raster data and the field statistical data;
the species quantity evaluation model and the biological individual quantity evaluation model are obtained by training based on first features and second features in a training sample set.
In a third aspect, an embodiment of the present invention provides a training apparatus for a biological population evaluation model, the apparatus comprising:
the sampling area data acquisition module is used for acquiring grid data and field statistical data of a sampling area;
the sampling area data synthesis module is used for combining the grid data and the field statistical data according to the coordinate information of the sampling points in the sampling area to obtain a training sample set of the biological population evaluation model; the training sample set includes: a first characteristic of a random effect type and a second characteristic of a fixed effect type;
and the evaluation model training module is used for training based on the first characteristic and the second characteristic in the training sample set to obtain a biological population evaluation model, and the biological population evaluation model is used for evaluating biological population information.
In a fourth aspect, embodiments of the present invention provide a biological population evaluation device, the device comprising:
the target area data acquisition module is used for acquiring raster data and field statistical data of a target area;
the species quantity evaluation module is used for acquiring an evaluation result of the species quantity in the target area according to a species quantity evaluation model based on the grid data and the field statistical data;
the biological individual number evaluation module is used for acquiring an evaluation result of the biological individual number in the target area according to a biological individual number evaluation model based on the raster data and the field statistical data;
the species quantity evaluation model and the biological individual quantity evaluation model are obtained by training based on first features and second features in a training sample set.
In a fifth aspect, an embodiment of the present invention provides an electronic device comprising a processor and a memory, wherein the processor, when executing a computer program stored in the memory, implements the method of training a biological population evaluation model according to the first aspect.
In a sixth aspect, embodiments of the present invention provide a readable storage medium having stored therein computer instructions, which when executed by a processor, implement a method of training a biological population assessment model according to the first aspect.
The embodiment of the invention has the following advantages:
In the embodiments of the present application, the biological population evaluation model is trained on the field statistical data and grid data of the sampling region while considering both the random effects and the fixed effects in biological population evaluation, which improves the accuracy of the evaluation. The biological population evaluation model may comprise a species number evaluation model and a biological individual number evaluation model. The farmland characteristic information in the acquired raster data of the target area is then input into the trained species number evaluation model and biological individual number evaluation model to obtain evaluation results for the number of species and the number of biological individuals in the target area, so that the biological population of the target area is evaluated from multiple angles. Furthermore, because the evaluation model is trained on both grid data and field statistical data, the spatial limitation of prior-art biological evaluation is overcome. Appropriately applying random and fixed effects improves the accuracy of biological population evaluation at larger spatial scales, makes it much easier for people to evaluate biological populations in nature, and improves the accuracy and efficiency of human use of natural resources.
Drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 shows a flow chart of an embodiment of the method of training a biological population evaluation model of the present invention;
FIG. 2 shows a flow chart of an embodiment of the biological population evaluation method of the present invention;
FIG. 3 shows a block diagram of an embodiment of the training apparatus for a biological population evaluation model of the present invention;
FIG. 4 shows a block diagram of an embodiment of the biological population evaluation device of the present invention;
FIG. 5 shows a schematic structural diagram of an electronic device provided in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the protection scope of the present invention. In addition, in the embodiments of the present application, all data acquisition is carried out in compliance with the data protection regulations of the country concerned and with the authorization of the owner of the corresponding device.
Field surveys of a specific area yield information such as species abundance, community abundance and the land-use type of the habitat. Such surveys provide relatively accurate field statistics and allow detailed biodiversity and land-use data to be collected, but this survey-then-evaluate approach generally focuses on the specific details of small areas, is inefficient, and cannot accurately evaluate biological population information at large spatial scales. Existing dynamic land-use raster datasets produced from remote sensing data cover large spatial scales, but on their own they cannot provide an effective evaluation of biological populations, such as the number of species and the number of biological individuals. As a result, the error rate in human use of natural resources increases greatly, leading to a serious imbalance in the allocation of natural resources.
Referring to FIG. 1, which shows a flow chart of an embodiment of the method for training a biological population evaluation model of the present invention, where evaluating a biological population may include evaluating the number of species and the number of biological individuals, the method may include the following steps:
Step 101, acquiring grid data and field statistical data of a sampling region.
The sampling area covers the target area.
Raster data divides space into a regular grid; each grid element is called a cell, and each cell is assigned an attribute value representing the real-world area it covers. In the present invention, the grid data of the target area refers to data reflecting the dynamic characteristics of the farmland where the organisms are located, including but not limited to the land-use type of the target area, the category of the proportion of farmland area surrounding each sampling point, planting intensity, yield per unit area and fertilizer application rate, all of which are important factors affecting species abundance and community abundance.
Optionally, taking planting intensity as an example, the planting intensity grid data may encode fallow, single-season, double-season and triple-season planting with the values 0, 1, 2 and 3 respectively, so that each cell of the target area records its planting intensity and the overall crop planting intensity around each sampling point of the target area can be derived.
Raster data can be obtained directly from remote sensing data or from public remote sensing datasets. Grid data not only removes the spatial-scale limitation on evaluating biological populations but also adds, from multiple angles, to the factors that influence species and communities.
Field statistical data refers to data obtained by setting up sampling points in the target area and sampling them in the field. The data collected in the field includes, but is not limited to, the position information of the sampling points in the sampling area, the number of species and the number of biological individuals. The distribution of land-use types may also be available as raster data obtained by remote sensing, in which case the two sources can be used to verify each other. Field statistical data can be obtained from public datasets, published papers and studies, or collected directly through field surveys.
Compared with raster data, field statistical data focuses on the specific details of small areas. Combining the two makes it possible to evaluate biological populations at large scales, adds influencing factors for species and communities from multiple angles, and improves the comprehensiveness and accuracy of biological population evaluation.
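The cell concept above can be illustrated with a minimal sketch of looking up the raster value at a sampling point. It assumes a north-up grid whose top-left corner and square cell size (in degrees) are known; GIS libraries such as GDAL carry this information as a geotransform, but the `Raster` class and its field names here are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Raster:
    """Hypothetical north-up raster: values[row][col], row 0 at the northern edge."""
    origin_lon: float   # longitude of the top-left corner (assumed known)
    origin_lat: float   # latitude of the top-left corner (assumed known)
    cell_size: float    # width and height of a square cell, in degrees
    values: List[List[float]]

    def cell_index(self, lon: float, lat: float) -> Optional[Tuple[int, int]]:
        """Return (row, col) of the cell containing the point, or None if outside."""
        col = int((lon - self.origin_lon) // self.cell_size)
        row = int((self.origin_lat - lat) // self.cell_size)  # latitude decreases downwards
        if self.values and 0 <= row < len(self.values) and 0 <= col < len(self.values[0]):
            return row, col
        return None

    def value_at(self, lon: float, lat: float) -> Optional[float]:
        """Attribute value of the cell containing (lon, lat), or None if outside."""
        idx = self.cell_index(lon, lat)
        return None if idx is None else self.values[idx[0]][idx[1]]
```

For a 2 x 2 grid with its corner at (100.0 E, 40.0 N) and 0.5-degree cells, the point (100.7, 39.8) falls in row 0, column 1.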
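The text does not state how the "overall" planting intensity near a sampling point is aggregated. The sketch below assumes, purely for illustration, a mean of the 0 to 3 codes over a square window of cells centred on the sampling point's cell; the window radius and the function name are hypothetical.

```python
from typing import List

# Planting-intensity codes as described in the text.
INTENSITY_LABELS = {0: "fallow", 1: "single-season", 2: "double-season", 3: "triple-season"}


def neighborhood_intensity(grid: List[List[int]], row: int, col: int, radius: int = 1) -> float:
    """Mean planting-intensity code in a (2*radius+1)^2 window centred on (row, col).

    Cells outside the grid are skipped, so windows at the edge contain fewer cells.
    """
    total, count = 0, 0
    for r in range(row - radius, row + radius + 1):
        for c in range(col - radius, col + radius + 1):
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]):
                code = grid[r][c]
                if code not in INTENSITY_LABELS:
                    raise ValueError(f"unexpected intensity code {code}")
                total += code
                count += 1
    return total / count
```

Averaging the codes treats them as an ordinal scale of cropping frequency; a share of multi-season cells would be an equally plausible summary.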
Step 102, combining the grid data and the field statistical data according to the coordinate information of the sampling points in the sampling area to obtain a training sample set of the biological population evaluation model.
The training sample set includes: a first characteristic of a random effect type and a second characteristic of a fixed effect type.
Because the grid data and the field statistical data differ in spatial scale, they must be merged before the evaluation model is trained, so that comprehensive information is available for each sampling point in the sampling area.
Specifically, merging means using the complete longitude and latitude coordinates of each sampling point in the field statistical data to extract the sampling point's surrounding farmland area proportion category, planting intensity, yield per unit area, global fertilizer application rate and similar information from the grid datasets. The merged field statistical data and grid data are then used as training samples for the evaluation model; the sample information may include the position information of the sampling point, its species number information or biological individual number information, and the farmland dynamic characteristics.
The position information may include the longitude and latitude of the sampling point, the original source of the sampling-point data, the superior partition of the sampling point, and the sampling point itself; the last three are collectively called sampling data source information. For example, the original source may be a public forum, the superior partition a section of that forum, and the sampling point's data a particular article; or the original source may be a country, the superior partition a province or state, and the sampling point a county. This information records where each sample came from, which may affect the subsequent biological population evaluation results.
The farmland dynamic characteristics may include the land-use type of the sampling point, the surrounding farmland area proportion category of the sampling point, planting intensity, yield per unit area and global fertilizer application rate.
In short, the feature information in the training sample set of the biological population evaluation model consists mainly of farmland dynamic characteristics and sampling data source information.
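The coordinate-based merging of step 102 can be sketched as follows. This is a minimal illustration, assuming each raster layer is available as a function that returns the value of the cell containing a given (longitude, latitude), or None outside the layer; the layer names and record keys are hypothetical, and dropping incomplete records is a design choice, not something the text specifies.

```python
from typing import Callable, Dict, List, Optional

# A raster layer is modelled as any function mapping (lon, lat) to the value of the
# cell containing that point, or None when the point lies outside the layer.
Layer = Callable[[float, float], Optional[float]]


def merge_field_with_rasters(records: List[Dict], layers: Dict[str, Layer]) -> List[Dict]:
    """Attach raster attributes to each field record using its lon/lat.

    Records for which any layer has no value are dropped, since they cannot be
    used as complete training samples.
    """
    merged = []
    for rec in records:
        sample = dict(rec)  # copy so the original field record is not modified
        for name, layer in layers.items():
            sample[name] = layer(rec["lon"], rec["lat"])
        if all(sample[name] is not None for name in layers):
            merged.append(sample)
    return merged
```

Each output dictionary is one training sample combining the field statistics of a sampling point with the farmland dynamic characteristics read from the rasters at its coordinates.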
With fixed effects, the unobserved characteristics of an individual that do not change over time are correlated with the explanatory (independent) variables. Fixed effects are better suited to studying differences between samples. In the embodiment of the present invention, for example, the farmland dynamic characteristics are the main factors influencing biological populations and have an intrinsic relationship with them, so the two are correlated; the farmland dynamic characteristics can therefore be used as fixed-effect features for biological population evaluation.
With random effects, the unobserved characteristics of an individual that do not change over time are uncorrelated with the explanatory (independent) variables. Random effects are suited to inferring the characteristics of a population from a sample. For example, to find out whether graduates of prestigious universities have a higher employment rate than graduates of ordinary universities, one might randomly select four schools, A, B, C and D, where A and B are prestigious universities and C and D are ordinary ones. The conclusion is not limited to these four universities but is generalized to prestigious and ordinary universities as a whole. "Random" here means that the four schools were drawn at random from the prestigious and ordinary universities, so the unobservable characteristics of these four schools are also random and independent of the explanatory variable. Likewise, in embodiments of the present invention, the source information of the sampling data in the field statistical data is random and has no intrinsic relationship with the biological population, so it is uncorrelated with it and can be used as a random-effect feature for biological population evaluation.
Dividing the feature information in the training sample set of the evaluation model into random-effect features and fixed-effect features allows the evaluation model to be specified more accurately.
Step 103, training based on the first features and the second features in the training sample set to obtain a biological population evaluation model, wherein the biological population evaluation model is used for evaluating biological population information.
Because biological population evaluation involves both random effects and fixed effects, and a mixed model can include both, in the embodiment of the present invention the biological population evaluation model is preferably constructed from the training sample set using a mixed model or an extension of a mixed model, with the first features and the second features in the training sample set entering the model as random effects and fixed effects respectively.
Evaluation of a biological population includes but is not limited to evaluating the number of species and the number of biological individuals; accordingly, the evaluation model includes but is not limited to a species number evaluation model and a biological individual number evaluation model. Population size can likewise be evaluated with a mixed model composed of random and fixed effects.
Considering both the fixed and random effects in biological population evaluation improves the accuracy of the evaluation at larger spatial scales and brings it closer to conditions in nature, which makes it much easier for people to evaluate biological populations in nature and improves the accuracy and efficiency of human use of natural resources.
After the variables are set, the model is fitted to the data for the corresponding variables in the training sample set, yielding the biological population evaluation model.
The trained biological population evaluation model is applied to evaluate the biological population of the target area, including the number of species and the number of biological individuals: the farmland characteristic information of the target area contained in the acquired raster data is input into the species number evaluation model and the biological individual number evaluation model to obtain evaluation results for the number of species and the number of biological individuals in the target area.
The specific process is described in the embodiment shown in FIG. 2.
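For concreteness, one common way to write such a mixed model for count responses is shown below. The Poisson distribution, the log link and the nested random intercepts are illustrative assumptions; the text only specifies "a mixed model or an extension of a mixed model".

```latex
\begin{aligned}
y_{ijk} \mid u_j, v_{jk} &\sim \operatorname{Poisson}(\mu_{ijk}), \\
\log \mu_{ijk} &= \beta_0 + \mathbf{x}_{ijk}^{\top}\boldsymbol{\beta} + u_j + v_{jk}, \\
u_j &\sim \mathcal{N}(0, \sigma_u^2), \qquad v_{jk} \sim \mathcal{N}(0, \sigma_v^2)
\end{aligned}
```

Here y_ijk is the species number (or biological individual number) at sampling point i in superior partition k of original source j, x_ijk holds the farmland dynamic characteristics (the second, fixed-effect features) with coefficients β, and u_j and v_jk are random intercepts for the original source and the superior partition (the first, random-effect features).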
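To show how fixed and random effects are estimated jointly, the following minimal sketch fits a random-intercept model. It assumes a Gaussian response, a single fixed-effect covariate, one grouping factor (for example the data source) and a known variance ratio `lam`. It is not the patent's model: a practical implementation would also estimate the variance components, for example with lme4 in R or MixedLM in statsmodels. The pure-Python version only makes the mechanics visible.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def fit_random_intercept(x: List[float], y: List[float], groups: List[str],
                         lam: float = 1.0, iters: int = 200) -> Tuple[float, float, Dict[str, float]]:
    """Fit y = b0 + b1*x + u[group] + noise by penalised least squares.

    Minimises sum((y - b0 - b1*x - u[g])**2) + lam * sum(u[g]**2), where
    lam = sigma_e^2 / sigma_u^2 is treated as known. This is the solution of
    Henderson's mixed-model equations for a random-intercept model, reached here
    by alternating between the fixed effects and the random intercepts.
    Requires at least two distinct x values.
    """
    u: Dict[str, float] = defaultdict(float)
    n = len(y)
    b0 = b1 = 0.0
    for _ in range(iters):
        # Fixed-effect step: ordinary least squares of (y - u) on x.
        r = [yi - u[g] for yi, g in zip(y, groups)]
        mx, mr = sum(x) / n, sum(r) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        b1 = sum((xi - mx) * (ri - mr) for xi, ri in zip(x, r)) / sxx
        b0 = mr - b1 * mx
        # Random-effect step: group mean of fixed-effect residuals, shrunk towards 0.
        sums: Dict[str, float] = defaultdict(float)
        counts: Dict[str, int] = defaultdict(int)
        for xi, yi, g in zip(x, y, groups):
            sums[g] += yi - b0 - b1 * xi
            counts[g] += 1
        u = defaultdict(float, {g: sums[g] / (counts[g] + lam) for g in sums})
    return b0, b1, dict(u)
```

With x = [0, 1, 2, 0, 1, 2], groups A, A, A, B, B, B and y = 1 + 2x plus 1 for group A and minus 1 for group B, the fit returns b0 = 1, b1 = 2 and random intercepts of 0.75 and -0.75: each group effect is shrunk by the factor n_j / (n_j + lam) = 3/4.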
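Applying a trained count model to a target area can be sketched as follows. The sketch assumes a log-link model for counts and uses hypothetical coefficient and feature names. When the target area's data source was not part of training, its random intercept is unknown and is taken as zero, which gives a population-level (fixed-effects-only) prediction.

```python
import math
from typing import Dict, Optional


def predict_count(intercept: float, coefs: Dict[str, float], features: Dict[str, float],
                  random_intercepts: Optional[Dict[str, float]] = None,
                  group: Optional[str] = None) -> float:
    """Expected count exp(b0 + x.beta + u) under a log-link mixed model.

    If the group is unknown or was not seen in training, its random intercept
    defaults to 0, i.e. a fixed-effects-only prediction.
    """
    eta = intercept + sum(coefs[name] * features[name] for name in coefs)
    if random_intercepts is not None and group is not None:
        eta += random_intercepts.get(group, 0.0)
    return math.exp(eta)
```

The same function would serve both the species number model and the biological individual number model, each with its own fitted coefficients.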
Optionally, the training sample set comprises a first training sample set and a second training sample set.
Step 102, combining the grid data and the field statistical data according to the coordinate information of the sampling points in the sampling region to obtain a training sample set of the biological population evaluation model, comprising:
Step S1021, dividing the field statistical data into first field statistical data, which includes species number information of the sampling region, and second field statistical data, which includes biological individual number information of the sampling region.
The field statistical data includes at least the species number information, biological individual number information and position information of the sampling region. Species number and biological individual number serve as the response (dependent) variables of biological population evaluation, from which a species number evaluation model and a biological individual number evaluation model are built respectively. Accordingly, the field statistical data is divided: the first field statistical data is used to train the species number evaluation model, and the second field statistical data is used to train the biological individual number evaluation model.
The first and second field statistical data each include ground feature information and position information of the sampling region. Because they are used to train different models, they differ in that the first field statistical data includes the species number information of the sampling region, whereas the second field statistical data includes its biological individual number information.
Step S1022, merging the grid data and the first field statistical data according to the coordinate information of the sampling points in the sampling region to obtain a first set of the evaluation model.
When the species number evaluation model is built, after the grid data and the first field statistical data of the sampling region are acquired, they are merged based on the coordinate information of the sampling points recorded in the field statistical data to obtain sample information for each sampling point; the sample information of all sampling points forms the first set for the species number evaluation model. The sample information of each sampling point includes at least the position information, the species number of the sampling point and the farmland dynamic characteristics.
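The division in step S1021 can be sketched as below, assuming each field record is a dictionary whose response fields are named `species_number` and `individual_number` (hypothetical keys). Both subsets keep the shared location and ground feature fields, and a record missing a response is simply left out of the corresponding subset.

```python
from typing import Dict, List, Tuple

RESPONSES = ("species_number", "individual_number")


def split_field_data(records: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Split field records into a species-number set and an individual-number set."""
    first, second = [], []
    for rec in records:
        # Fields shared by both sets: position, data source, ground features, etc.
        base = {k: v for k, v in rec.items() if k not in RESPONSES}
        if rec.get("species_number") is not None:
            first.append({**base, "species_number": rec["species_number"]})
        if rec.get("individual_number") is not None:
            second.append({**base, "individual_number": rec["individual_number"]})
    return first, second
```
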
Step S1023, merging the grid data and the second field statistical data according to the coordinate information of the sampling points in the sampling area to obtain a second set of the evaluation model.
When the biological individual number evaluation model is built, after the grid data and the second field statistical data of the sampling region are acquired, they are merged based on the coordinate information of the sampling points recorded in the field statistical data to obtain sample information for each sampling point; the sample information of all sampling points forms the second set for the biological individual number evaluation model. The sample information of each sampling point includes at least the position information, the biological individual number of the sampling point and the farmland dynamic characteristics.
Step S1024, dividing the feature information of the biological populations in the first set and the second set into a fixed-effect type and a random-effect type, respectively, to obtain a first training sample set and a second training sample set of the evaluation model.
To account accurately for the random effects and fixed effects in biological population evaluation, the feature information of the biological populations in the first set and the second set, that is, the position information and the farmland dynamic characteristics, must be divided. For example, the sampling data source information within the position information can be assigned to first features of the random-effect type, and the farmland dynamic characteristics to second features of the fixed-effect type. This makes it easier to use both the fixed-effect and the random-effect feature information to build an accurate biological population model.
Thus, the first training sample set includes at least the species number information, first feature information and second feature information of the sampling points; the second training sample set includes at least the biological individual number information, first feature information and second feature information of the sampling points.
Optionally, before step S1021 of dividing the field statistical data into first field statistical data including species number information and second field statistical data including biological individual number information, step 102 of merging the grid data and the field statistical data according to the coordinate information of the sampling points in the sampling region to obtain the training sample set of the biological population evaluation model further includes:
Step S1020, correcting the field statistical data.
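Step S1024 can be sketched for a single merged sample as follows. The column names are hypothetical, since the text does not prescribe a schema. Qualifying the partition label with its source is an illustrative choice so that, for example, two provinces with the same name in different countries are treated as distinct random-effect levels.

```python
from typing import Dict, Tuple

# Hypothetical column names for the farmland dynamic characteristics.
FIXED_EFFECT_FEATURES = ["land_use", "farmland_share_class", "planting_intensity",
                         "yield_per_unit_area", "fertilizer_rate"]


def split_features(sample: Dict) -> Tuple[Dict, Dict]:
    """Return (first_features, second_features) for one merged sample.

    First features (random effects): sampling data source information, with the
    partition label made unique within its source so nested levels stay distinct.
    Second features (fixed effects): the farmland dynamic characteristics.
    """
    first = {
        "source": sample["source"],
        "partition": f'{sample["source"]}/{sample["partition"]}',
    }
    second = {name: sample[name] for name in FIXED_EFFECT_FEATURES}
    return first, second
```
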
First, the original number of biological individuals at each sampling point is corrected using the sampling effort recorded in the original field statistical data. The unit of sampling effort depends on the sampling method used at the site. For example, when traps are used, the sampling effort is the number of traps multiplied by the number of sampling days; when transects are used, the sampling effort is the transect length in meters. The sampling effort and the original number of individuals recorded in field observation can be regarded as linearly related. Therefore, for sampling point data from the same original source, the highest sampling effort is taken as the standard, and the species quantity information and the biological individual information of the other sampling points are scaled up proportionally, giving the corrected species quantity and biological individual quantity.
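The effort correction can be sketched as follows, assuming each record carries hypothetical keys `source`, `effort`, `individuals` and `species`. Following the text, both counts are scaled by the ratio of the source's maximum effort to the site's own effort; the linear effort-count relationship is the document's assumption, not a general ecological law.

```python
from collections import defaultdict


def correct_for_effort(records):
    """Scale counts so every site matches the highest effort in its data source.

    records: list of dicts with hypothetical keys 'source', 'effort'
    (e.g. traps x days, or transect metres), 'individuals', 'species'.
    Returns new dicts; the input records are not modified.
    """
    # Highest sampling effort recorded within each original data source.
    max_effort = defaultdict(float)
    for r in records:
        max_effort[r["source"]] = max(max_effort[r["source"]], r["effort"])

    corrected = []
    for r in records:
        # Linear effort-count assumption: counts grow in proportion to effort.
        factor = max_effort[r["source"]] / r["effort"]
        corrected.append({
            **r,
            "individuals": r["individuals"] * factor,
            "species": r["species"] * factor,
        })
    return corrected
```

Scaling is done per source because effort units (trap-days versus metres) are only comparable within one sampling method.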
Second, some sampling points are recorded more than once in field observations. To avoid duplicated data, the statistical data of the same sampling event are fused so that each sampling point keeps only one valid record. Third, for sites where the number of biological individuals is missing, it is calculated from the corrected and fused data: the original statistical data record whether a given species occurs and how many of its individuals occur, and these records can be used to supplement the biological individual number information and the species number information of sampling points with missing data.
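A possible sketch of the de-duplication and gap-filling steps follows. The record layout (keys `site`, `species`, `individuals`, and an `occurrences` mapping from species name to recorded count) and the merge rule (keep the first non-missing value, take the union of the occurrence lists) are illustrative assumptions, not the exact procedure of the patent.

```python
def fuse_and_fill(records):
    """Keep one record per site, then fill missing counts from occurrence lists.

    Hypothetical record keys: 'site'; 'species' and 'individuals' (int or None);
    'occurrences' (dict: species name -> recorded count, or None if unknown).
    """
    fused = {}
    for r in records:
        site = r["site"]
        if site not in fused:
            fused[site] = {**r, "occurrences": dict(r["occurrences"])}
            continue
        kept = fused[site]  # duplicate record of an already-seen site
        for field in ("species", "individuals"):
            if kept[field] is None:
                kept[field] = r[field]  # keep the first non-missing value
        for name, count in r["occurrences"].items():
            kept["occurrences"].setdefault(name, count)

    for rec in fused.values():
        occ = rec["occurrences"]
        if rec["species"] is None and occ:
            rec["species"] = len(occ)  # distinct species recorded as present
        if rec["individuals"] is None and occ and None not in occ.values():
            rec["individuals"] = sum(occ.values())  # only if every count is known
    return list(fused.values())
```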
Accordingly, step S1021 (dividing the field statistical data into first field statistical data including species number information and second field statistical data including biological individual number information) comprises:
step S10211: dividing the corrected field statistical data into the first field statistical data and the second field statistical data.
Optionally, the evaluation of the biological population comprises an evaluation of the number of species and an evaluation of the number of biological individuals; the first feature comprises the source information of the sampling data; the second feature comprises the farmland dynamic characteristics.
Step 103, training the evaluation model with respect to the random effects and fixed effects in biological population evaluation based on the training sample set to obtain the biological population evaluation model, includes:
step S1031, taking the species quantity information in the first training sample set as a response variable, training the species quantity evaluation model based on the first training sample set according to the following formula, and obtaining the species quantity evaluation model based on the generalized linear mixed model:
$$ g(Y) = X\beta + Z\mu + \varepsilon $$
where g(Y) is the link function relating the dependent variable Y to the linear part Xβ + Zμ + ε; Y is the dependent variable, the number of species; X is the design matrix of the independent variables describing the farmland dynamic characteristics, and β is their parameter matrix; Z is the design matrix of the independent variables describing the sampling data source information, and μ represents the relationship between the sampling data source information in the first training sample set and the number of species; ε is the random error matrix;
The number of species, as the name implies, is the number of biological species. Because species counts statistically follow a Poisson distribution rather than a normal distribution, a generalized linear mixed model (GLMM) is selected to evaluate the number of species in the target area, and a species number evaluation model based on the GLMM is constructed.
A GLMM can be regarded as an extension of the linear model: the dependent variable is no longer required to follow a normal distribution, and the model includes both fixed effects and random effects.
To build the GLMM-based species number evaluation model, the number of species in the training sample set is used as the response variable. In the embodiment of the present invention, the random-effect independent variables may include the original source of the sampling point data, the higher-level partition of the sampling point, and the sampling point itself; which of these are used can be chosen according to the actual situation or the training performance of the model. Alongside these random-effect variables, the fixed-effect independent variables may include the land use type, the surrounding farmland area proportion category and the planting intensity category as categorical variables, together with the per-unit yield and the fertilizer application rate.
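One way to turn these variables into the design matrices X and Z is sketched below. Treatment (dummy) coding for the categorical fixed effects and one indicator column per level of each grouping factor (a random intercept) are common conventions assumed here; the column names are hypothetical.

```python
def dummy_code(values):
    """Treatment coding; the alphabetically first level is the reference."""
    levels = sorted(set(values))
    kept = levels[1:]
    return kept, [[1.0 if v == lv else 0.0 for lv in kept] for v in values]


def design_matrices(rows, categorical, numeric, group_keys):
    """Build the fixed-effect matrix X and the random-effect matrix Z.

    X: intercept, dummy-coded categorical columns, then numeric columns.
    Z: one indicator column per level of each grouping factor
       (e.g. data source, higher-level partition, sampling point).
    """
    X = [[1.0] for _ in rows]
    for name in categorical:
        _, cols = dummy_code([r[name] for r in rows])
        for i, c in enumerate(cols):
            X[i] += c
    for name in numeric:
        for i, r in enumerate(rows):
            X[i].append(float(r[name]))

    Z = [[] for _ in rows]
    for key in group_keys:
        levels = sorted(set(r[key] for r in rows))
        for i, r in enumerate(rows):
            Z[i] += [1.0 if r[key] == lv else 0.0 for lv in levels]
    return X, Z
```

Stacking several grouping factors side by side in Z is what allows source, partition and site to each receive their own random intercepts.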
The GLMM adds a link function to the linear mixed model and allows the dependent variable to follow a distribution from the exponential family. In the construction of the species number evaluation model in the embodiment of the invention, the formula is:
$$ g(Y) = X\beta + Z\mu + \varepsilon \qquad (1) $$
where g is the link function, for example the natural logarithm ln(·). The link function relates the dependent variable Y to the linear part Xβ + Zμ + ε. Optionally, g can be preset according to the fact that the number of species follows a Poisson distribution (for which the natural logarithm is the canonical link), which reduces the training effort of the model. Y is the dependent variable, the number of species; X is the design matrix of the fixed-effect independent variables and β is their parameter matrix, where the fixed-effect independent variables are the farmland dynamic characteristic information required to construct the species number model; Z is the design matrix of the random-effect independent variables and μ is their parameter matrix, where the random-effect independent variables are the sampling data source information required to construct the species number model; ε is the random error matrix.
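With the log link, the linear predictor of formula (1) maps to an expected species count by exponentiation. The sketch below shows that mapping for one sampling point together with the Poisson log-likelihood that a fitting routine would maximize. In a Poisson GLMM the randomness beyond Zμ usually enters through the Poisson distribution of Y around its mean rather than as a separate additive ε; the parameter values in the test are hypothetical.

```python
import math


def linear_predictor(x_row, beta, z_row, mu):
    """eta = x'beta + z'mu for one sampling point."""
    return (sum(x * b for x, b in zip(x_row, beta))
            + sum(z * u for z, u in zip(z_row, mu)))


def expected_species(x_row, beta, z_row, mu):
    """Invert the log link: ln(E[Y]) = eta, so E[Y] = exp(eta)."""
    return math.exp(linear_predictor(x_row, beta, z_row, mu))


def poisson_log_likelihood(y, lam):
    """log P(Y = y) for a Poisson count with mean lam."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)
```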
Optionally, in formula (1), the design matrix of the random-effect independent variables may be a three-dimensional feature matrix containing first feature information such as the original source of the sampling point data, the higher-level partition of the sampling point, and the sampling point itself; accordingly, μ contains at least parameters for these three factors. The design matrix of the fixed effects may be a multi-dimensional feature matrix containing several of the farmland dynamic characteristics described above, i.e., the second feature information.
After the variables are set, the parameter matrices of the random-effect and fixed-effect independent variables in formula (1) are trained using, respectively, the first feature information and the second feature information in the first training sample set, and the species number evaluation model is finally obtained in combination with the other information in the first training sample set.
Optionally, in the embodiment of the present invention, the fixed-effect independent variables described above (land use type, surrounding farmland area proportion category, planting intensity category, per-unit yield and fertilizer application rate) can be combined in different ways, so that besides a comprehensive evaluation of the number of species in the target area based on all farmland dynamic characteristics, individual farmland dynamic characteristics can also be evaluated specifically. For example, the following five GLMMs can be constructed to evaluate the response of species richness: the fixed effects of model 1 are the land use type and the surrounding farmland area proportion category; model 2 is restricted to sampling points whose land use type is farmland, and its fixed effects are the planting intensity category and the surrounding farmland area proportion category; the fixed effects of model 3 are the land use type and the per-unit yield; the fixed effects of model 4 are the land use type, the fertilizer application rate, and their interaction; the fixed effects of model 5 are the land use type, the surrounding farmland area proportion category and the planting intensity category as categorical variables, plus the per-unit yield and the fertilizer application rate.
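The five candidate models can be written compactly as model formulas. The sketch uses lme4-style formula strings, where `a * b` expands to both main effects plus their interaction (the same convention as Python's patsy); the variable names and the single random intercept for the data source are hypothetical choices. The `response` parameter lets the same table serve the biological individual number models.

```python
# lme4-style formula strings; all variable names are hypothetical.
RANDOM_PART = "(1 | source)"  # random intercept for the data source

FIXED_PARTS = {
    1: "land_use + farmland_share",
    2: "planting_intensity + farmland_share",  # farmland sites only
    3: "land_use + unit_yield",
    4: "land_use * fert_rate",  # main effects plus their interaction
    5: "land_use + farmland_share + planting_intensity + unit_yield + fert_rate",
}


def model_spec(k, rows, response="richness"):
    """Return (formula, data subset) for candidate model k."""
    formula = f"{response} ~ {FIXED_PARTS[k]} + {RANDOM_PART}"
    if k == 2:
        rows = [r for r in rows if r["land_use"] == "farmland"]
    return formula, rows
```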
Step S1032: taking the biological individual number information in the second training sample set as the response variable, train the biological individual number evaluation model on the second training sample set according to the following formula, obtaining a biological individual number evaluation model based on a linear mixed model:
$$ A = Bm + Cn + q $$
where A is the dependent variable, the number of biological individuals; B is the design matrix of the independent variables describing the farmland dynamic characteristics, and m is their parameter matrix; C is the design matrix of the independent variables describing the sampling data source information, and n represents the relationship between the sampling data source information in the second training sample set and the number of biological individuals; q is the random error matrix.
The linear mixed model is an extension of the linear model that contains both fixed effects and random effects and is well suited to dependent variables that follow a normal distribution; therefore, in the embodiment of the invention, a linear mixed model is used to evaluate the number of biological individuals in the target area.
To build the biological individual number evaluation model based on a linear mixed model, the number of biological individuals in the training sample set is used as the response variable. When the model is trained with the feature information of each sampling point, the random-effect independent variables may include the original source of the sampling point data, the higher-level partition of the sampling point, and the sampling point itself; which of these are used can be chosen according to the actual situation or the training performance of the model. Alongside these random-effect variables, the fixed-effect independent variables may include the land use type, the surrounding farmland area proportion category and the planting intensity category as categorical variables, together with the per-unit yield and the fertilizer application rate. After the variables are set, the linear mixed model is trained on the second training sample set to obtain the biological individual number evaluation model.
Besides fixed effects, the linear mixed model includes random effects and does not require the observations of the dependent variable to be independent or to have equal variances, but the dependent variable must satisfy the normality assumption. In the construction of the biological individual number evaluation model, the formula is:
$$ A = Bm + Cn + q \qquad (2) $$
where A is the dependent variable, the number of biological individuals; B is the design matrix of the fixed-effect independent variables and m is their parameter matrix, where the fixed-effect independent variables are the farmland dynamic characteristic information required to construct the biological individual number model; C is the design matrix of the random-effect independent variables and n is their parameter matrix, where the random-effect independent variables are the sampling data source information required to construct the biological individual number evaluation model; q is the random error matrix.
Optionally, in formula (2), the design matrix of the random-effect independent variables may be a three-dimensional feature matrix containing the original source of the sampling point data, the higher-level partition of the sampling point, and the sampling point itself; accordingly, n contains parameters for these three factors. The design matrix of the fixed effects may be a multi-dimensional feature matrix containing several of the farmland dynamic characteristics described above.
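Once variance components are available, the fixed-effect estimates m and the random-effect predictions n of formula (2) can be obtained from Henderson's mixed model equations. The sketch below assumes the variance ratio λ = σ²_q / σ²_n is already known and shared by all random-effect columns (in practice it would come from a variance-component estimator such as REML), and it uses plain lists with a small Gaussian-elimination solver rather than a statistics library.

```python
def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def henderson(B, C, A, lam):
    """Solve [B'B  B'C; C'B  C'C + lam*I] [m; n] = [B'A; C'A].

    lam = residual variance / random-effect variance, assumed known.
    Returns (m, n): fixed-effect estimates and random-effect predictions (BLUPs).
    """
    p, k = len(B[0]), len(C[0])
    W = [b + c for b, c in zip(B, C)]  # rows of the combined matrix [B | C]
    size = p + k
    lhs = [[sum(w[r] * w[c] for w in W) for c in range(size)] for r in range(size)]
    for j in range(p, size):
        lhs[j][j] += lam  # shrinkage applied to the random effects only
    rhs = [sum(w[r] * a for w, a in zip(W, A)) for r in range(size)]
    sol = solve(lhs, rhs)
    return sol[:p], sol[p:]
```

In the test data the two group means differ from the grand mean by ±2, and with λ = 1 the predicted random effects are shrunk to ±1.5, which is the characteristic behavior of BLUPs.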
After the variables are set, the parameter matrices of the random-effect and fixed-effect independent variables in formula (2) are trained using, respectively, the first feature information and the second feature information in the second training sample set, and the biological individual number evaluation model is finally obtained in combination with the other information in the second training sample set.
Optionally, μ in formula (1) represents the relationship between the sampling data source information in the first training sample set and the number of species, where that source information includes the original source of the sampling point data, the higher-level partition of the sampling point, and the sampling point itself; n in formula (2) represents the relationship between the sampling data source information in the second training sample set and the number of biological individuals, where that source information likewise includes the original source of the sampling point data, the higher-level partition of the sampling point, and the sampling point itself.
Optionally, in the embodiment of the present invention, the fixed-effect independent variables described above (land use type, surrounding farmland area proportion category, planting intensity category, per-unit yield and fertilizer application rate) can be combined in different ways, so that besides a comprehensive evaluation of the number of biological individuals in the target area based on all farmland dynamic characteristics, individual farmland dynamic characteristics can also be evaluated specifically. For example, the following five linear mixed models can be constructed to evaluate the response of the number of biological individuals: the fixed effects of model 1 are the land use type and the surrounding farmland area proportion category; model 2 is restricted to sampling points whose land use type is farmland, and its fixed effects are the planting intensity category and the surrounding farmland area proportion category; the fixed effects of model 3 are the land use type and the per-unit yield; the fixed effects of model 4 are the land use type, the fertilizer application rate, and their interaction; the fixed effects of model 5 are the land use type, the surrounding farmland area proportion category and the planting intensity category as categorical variables, plus the per-unit yield and the fertilizer application rate.
Because the species number evaluation model uses a GLMM chosen to match the Poisson distribution of species counts, and the biological individual number evaluation model uses a linear mixed model chosen to match the distribution of individual counts, both the number of species and the number of biological individuals in the target area can be evaluated accurately. Integrating the grid data with the field statistical data further allows the number of biological individuals to be evaluated more accurately over larger spatial scales.
Optionally, the training method of the species number evaluation model in step S1031 includes:
taking the species number information in the first training sample set as the response variable and training the GLMM on the first training sample set with a Bayesian parameter estimation method to obtain the species number evaluation model.
Bayesian statistical inference is based on the posterior distribution. Given a prior distribution for the unknown parameters, which may be informative or non-informative, the posterior distribution can be derived, which completes the Bayesian parameter estimation.
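The prior-to-posterior step can be illustrated in its simplest form: a single Poisson rate with a Gamma prior, where conjugacy gives the posterior in closed form. This is only an illustration of Bayesian updating; fitting the full GLMM of formula (1) generally requires MCMC sampling or an approximation, which is not shown.

```python
def gamma_poisson_update(alpha, beta, counts):
    """Conjugate update for a Poisson rate with a Gamma(alpha, beta) prior.

    beta is the rate parameter. Observing counts y_1..y_n gives the
    posterior Gamma(alpha + sum(y), beta + n).
    """
    return alpha + sum(counts), beta + len(counts)


def posterior_mean(alpha, beta):
    """Mean of a Gamma(alpha, beta) distribution (rate parameterization)."""
    return alpha / beta
```

With an informative prior Gamma(2, 1) and species counts 3, 5, 4, the posterior mean is 14 / 4 = 3.5; with a nearly flat prior it approaches the sample mean of 4.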
Alternatively, penalized quasi-likelihood (PQL) can be used for parameter estimation in the species number evaluation model. The quasi-likelihood method does not require the response variable to follow a specific known distribution; only its mean and variance need to be known, and with large samples the estimates are approximately normally distributed. Adding a penalty term reduces the error of the variance estimates, which improves estimation precision and the accuracy of species number evaluation.
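PQL fits a GLMM by repeated linearization: from the current linear predictor η it forms a working response z and working weights w, fits a weighted linear mixed model to z, updates η, and repeats until convergence. For a Poisson response with log link, z = η + (y − e^η) / e^η and w = e^η. The sketch computes only this linearization step; the weighted mixed-model fit inside each iteration is not shown.

```python
import math


def pql_working_data(y, eta):
    """One PQL linearization step for a Poisson GLMM with log link.

    y: observed species counts; eta: current linear predictor values.
    Returns the working response z and working weights w.
    """
    z, w = [], []
    for yi, ei in zip(y, eta):
        mu = math.exp(ei)            # inverse link
        z.append(ei + (yi - mu) / mu)  # d(eta)/d(mu) = 1/mu for the log link
        w.append(mu)                 # 1 / (Var(mu) * (d eta/d mu)^2) = mu
    return z, w
```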
Both PQL and Bayesian parameter estimation yield accurate parameters when training the GLMM; the Bayesian method runs faster and is more efficient.
Optionally, the training method of the biological individual number evaluation model in step S1032 includes at least one of the following:
step S10321: taking the biological individual number information in the second training sample set as the response variable, train the linear mixed model on the second training sample set using restricted maximum likelihood (REML) estimation to obtain the biological individual number evaluation model.
Restricted maximum likelihood estimation transforms the model so that the terms involving the fixed-effect parameter matrix (m in formula (2)) are eliminated from the new model. This accounts for the degrees of freedom used up in estimating m and reduces the error in estimating the random-effect parameters.
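The degrees-of-freedom effect is easiest to see in a model with fixed effects only: the ML estimate of the residual variance divides the residual sum of squares by n, while the REML estimate divides it by n − p, where p is the number of fixed-effect parameters. The sketch fits a straight line (p = 2) by least squares and returns both estimates; it illustrates the correction only and is not a REML fitter for formula (2).

```python
def ml_and_reml_variance(x, y):
    """Residual variance of a simple linear regression: (ML, REML) estimates."""
    n, p = len(y), 2  # p fixed-effect parameters: intercept and slope
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return rss / n, rss / (n - p)  # ML ignores the p lost degrees of freedom
```

For x = 0, 1, 2, 3 and y = 1, 3, 2, 5 the residual sum of squares is 2.7, giving 0.675 (ML) versus 1.35 (REML).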
Step S10322: taking the biological individual number information in the second training sample set as the response variable, train the linear mixed model on the second training sample set using minimum norm quadratic unbiased estimation (MINQUE) to obtain the biological individual number evaluation model.
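For the special case of a balanced one-way random-intercept layout (the same number of sampling points in every group), MINQUE is known to coincide with the classical ANOVA (method-of-moments) variance-component estimators, so the sketch below computes those. It is an illustration under that balanced-data assumption, not a general MINQUE implementation. The random-effect estimate can come out negative, in which case it is often truncated at zero.

```python
def anova_variance_components(groups):
    """Variance components for a balanced one-way random-effects layout.

    groups: list of equal-length lists of responses, one list per group.
    Returns (random-effect variance, residual variance).
    """
    a, k = len(groups), len(groups[0])
    means = [sum(g) / k for g in groups]
    grand = sum(means) / a
    ssb = k * sum((m - grand) ** 2 for m in means)  # between-group sum of squares
    ssw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    msb = ssb / (a - 1)
    msw = ssw / (a * (k - 1))
    return (msb - msw) / k, msw
```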
Parameter estimation is not limited to restricted maximum likelihood and MINQUE; when training the biological individual number evaluation model based on the linear mixed model, other methods such as maximum likelihood estimation and spectral decomposition estimation may also be used.
Optionally, step 102 (combining the grid data and the field statistical data according to the coordinate information of the sampling points in the sampling region to obtain the training sample set of the biological population evaluation model) further includes, before step S1021 (dividing the field statistical data into first field statistical data including species quantity information and second field statistical data including biological individual quantity information):
preprocessing the grid data as follows.
For the farmland area proportion grid data, the following classes can be set: when the area proportion is at most 10%, the surrounding farmland area proportion category is "low"; when it is greater than 10% and at most 60%, the category is "medium"; when it is greater than 60%, the category is "high".
For the crop planting intensity grid data, the original values 0, 1, 2 and 3 represent fallow, single-season, double-season and triple-season planting, respectively. Because few grid cells have triple-season planting, the embodiment of the invention merges double-season and triple-season planting into multi-season planting, so the planting intensity input to the model has three categories: fallow, single-season planting and multi-season planting.
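The two reclassification rules above can be sketched as simple lookups. The label "medium" for the 10-60 % class is an assumption (three distinct classes are implied), and the farmland proportion is taken here in percent.

```python
def farmland_share_category(percent):
    """Classify the surrounding farmland area proportion (given in %)."""
    if percent <= 10:
        return "low"
    if percent <= 60:
        return "medium"  # label assumed for the 10-60 % class
    return "high"


# Raw planting-intensity codes -> model categories (double and triple merged).
PLANTING_INTENSITY = {
    0: "fallow",
    1: "single-season",
    2: "multi-season",
    3: "multi-season",
}
```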
For the per-unit yield data, for example, per-unit yields of 42 crops are provided, each in kilograms per hectare. The embodiment of the invention sums the per-unit yields of all crops pixel by pixel to obtain the total per-unit yield of multiple crops.
For the fertilizer application rate data, for example, global fertilizer application rates for 17 crops are provided, each in kilograms per hectare. The embodiment of the invention sums the application rates of all crops pixel by pixel to obtain the total fertilizer application rate of multiple crops.
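The pixel-by-pixel fusion of per-crop layers can be sketched as below, with grids stored as nested lists. Treating `None` as no-data, skipping it in the sum, and leaving a cell missing when every layer lacks data are assumptions; the source does not say how gaps are handled.

```python
def sum_layers(layers):
    """Pixel-wise sum of equally sized grids (lists of rows).

    Cells holding None (no data) are skipped; a cell that is None in every
    layer stays None in the output instead of becoming 0.
    """
    n_rows, n_cols = len(layers[0]), len(layers[0][0])
    total = [[None] * n_cols for _ in range(n_rows)]
    for layer in layers:
        for i in range(n_rows):
            for j in range(n_cols):
                v = layer[i][j]
                if v is not None:
                    total[i][j] = v if total[i][j] is None else total[i][j] + v
    return total
```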
Optionally, before step 103 (training the evaluation model with respect to the random and fixed effects in biological population evaluation based on the training sample set to obtain the biological population evaluation model), the method further includes:
standardizing the numerical variables in the training sample set, such as the per-unit yield and the fertilizer application rate, and using the standardized variables to construct the models, which improves the convergence of the model-fitting algorithm. The standardization is given by formula (3):
$$ x_{\mathrm{std}} = \frac{x - x_{\mathrm{mean}}}{x_{\mathrm{st}}} \qquad (3) $$
where x_std is the standardized numerical variable, x is the numerical variable before standardization, x_mean is the mean of the numerical variable before standardization, and x_st is its standard deviation.
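Formula (3) is the usual z-score. The sketch below uses the sample standard deviation (`statistics.stdev`), which is an assumption since the text does not say whether the sample or population version is intended. It also returns the mean and standard deviation so that target-area data can be standardized with the training-set parameters.

```python
import statistics


def standardize(values):
    """Apply formula (3): (x - mean) / standard deviation.

    Returns the standardized values plus (mean, sd) for reuse on new data.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD; pstdev would give the population SD
    return [(x - mean) / sd for x in values], (mean, sd)
```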
Accordingly, step 103 includes:
training the evaluation model with respect to the random and fixed effects in biological population evaluation based on the standardized training sample set to obtain the biological population evaluation model.
Referring to Fig. 2, a biological population evaluation method is shown, comprising:
step 201: obtaining the grid data and the field statistical data of the target area.
Here, the sampling region used for training the evaluation models in Fig. 1 covers the target area.
In the embodiment of the present invention, the grid data are the same as those described for Fig. 1: data reflecting the farmland dynamic characteristics at the locations of the organisms, including but not limited to the land use type of the target area, the surrounding farmland area proportion category of the sampling points, the planting intensity, the per-unit yield and the fertilizer application rate, which are important factors affecting species richness and community abundance. Because the evaluation result for the biological population in the target area is unknown, the field statistical data contain only the location information of the sampling points in the target area, consistent with the embodiment shown in Fig. 1, namely coordinate information and sampling data source information.
In the embodiment of the present invention, the grid data and field statistical data of the target area may be obtained from publicly released datasets, such as public remote sensing imagery and land information.
Step 202: based on the grid data and the field statistical data, obtaining an evaluation result of the number of species in the target area using the species number evaluation model.
The farmland feature information in the grid data, together with the sampling data source information and the coordinate information in the field statistical data, is input into the species number evaluation model, which evaluates the number of species in the target area and yields the evaluation result.
Step 203: based on the grid data and the field statistical data, obtaining an evaluation result of the number of biological individuals in the target area using the biological individual number evaluation model.
The farmland feature information in the grid data, together with the sampling data source information and the coordinate information in the field statistical data, is input into the biological individual number evaluation model, which evaluates the number of biological individuals in the target area and yields the evaluation result.
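Prediction for a target-area point can be sketched as follows, for both models. Setting the random effect to zero for a data source or partition that never appeared in training (a population-level prediction) is a common convention assumed here, not something the document specifies. The dictionary layout of the fitted random effects is also hypothetical.

```python
import math


def predict(x_row, beta, groups, ranef, link="log"):
    """Predict the species number (log link) or individual number (identity).

    x_row: fixed-effect covariates of the point, matching beta.
    groups: the point's grouping levels, e.g. {"source": "S9"}.
    ranef: fitted random effects keyed by (factor, level); unseen levels add 0.
    """
    eta = sum(x * b for x, b in zip(x_row, beta))
    eta += sum(ranef.get((k, v), 0.0) for k, v in groups.items())
    return math.exp(eta) if link == "log" else eta
```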
Both the species number evaluation model and the biological individual number evaluation model are trained on the first feature and the second feature in the training sample set, as in the embodiment shown in Fig. 1.
For the specific training procedures of the two models, refer to the embodiment shown in Fig. 1 and the optional model training steps described above; they are not repeated here.
In summary, in the embodiments of the present application, the biological population evaluation model is trained on the field statistical data and grid data of the sampling region while accounting for both the random effects and the fixed effects in biological population evaluation, which helps improve the accuracy of the evaluation; the biological population evaluation model may further comprise a species number evaluation model and a biological individual number evaluation model. The farmland feature information in the acquired grid data of the target area is then input into the trained species number evaluation model and biological individual number evaluation model to obtain evaluation results for the number of species and the number of biological individuals in the target area, giving a multi-angle evaluation of its biological population. Furthermore, using grid data and field statistical data together to train the evaluation models overcomes the spatial limitation of prior-art biological evaluation, and the appropriate use of random and fixed effects improves the accuracy of biological population evaluation at larger spatial scales. This greatly helps people evaluate biological populations in nature and use natural resources more accurately and efficiently.
It should be noted that for simplicity of description, the method embodiments are shown as a series of combinations of acts, but those skilled in the art will recognize that the embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Referring to Fig. 3, a structural block diagram of an embodiment of the training apparatus for a biological population evaluation model according to the present invention is shown; the apparatus 300 may include:
a sampling region data acquiring module 301, configured to acquire grid data and field statistical data of a sampling region;
a sampling region data synthesis module 302, configured to combine the grid data and the field statistical data according to coordinate information of sampling points in a sampling region, and obtain a training sample set of the biological population evaluation model; the training sample set comprises: a first characteristic of a random effect type and a second characteristic of a fixed effect type;
an evaluation model training module 303, configured to perform training based on the first feature and the second feature in the training sample set, and obtain a biological population evaluation model, where the biological population evaluation model is used to evaluate biological population information.
Optionally, the training sample set comprises a first training sample set and a second training sample set;
the sampling region data synthesis module may include:
a field statistical data dividing module, configured to divide the field statistical data into first field statistical data including species number information and second field statistical data including biological individual number information; the first field statistical data comprise the species number information of the sampling region, and the second field statistical data comprise the number of biological individuals in the sampling region;
the first training sample set acquisition module is used for combining the grid data and the first field statistical data according to the coordinate information of the sampling points in the sampling area to acquire a first training sample set of the evaluation model;
the second training sample set acquisition module is used for combining the grid data and the second field statistical data according to the coordinate information of the sampling points in the sampling area to acquire a second training sample set of the evaluation model;
the characteristic information dividing module is used for dividing the characteristic information of the biological populations in the first set and the second set into a fixed effect type and a random effect type respectively to obtain a first training sample set and a second training sample set of the evaluation model; the first training sample set comprises species number information, first feature information and second feature information of a sampling region; the second training sample set includes biological individual quantity information, first feature information, and second feature information of the sampling region.
Optionally, the data synthesis module may further include:
a field statistical data correction module, configured to correct the field statistical data before it is divided into the first field statistical data comprising species number information and the second field statistical data comprising biological individual number information;
correspondingly, the field statistical data dividing module may include:
a corrected field statistical data dividing module, configured to divide the corrected field statistical data into the first field statistical data and the second field statistical data.
Optionally, the biological population evaluation comprises species number evaluation and biological individual number evaluation; the first feature comprises sampling data source information; the second feature comprises farmland dynamic characteristics;
the evaluation model training module may include:
the species quantity evaluation model training module is used for training the species quantity evaluation model, based on the first training sample set and with the species quantity information in the first training sample set as the response variable, according to the following formula, to obtain the species quantity evaluation model based on the generalized linear mixed model:
$$ g(Y) = X\beta + Z\mu + \varepsilon $$
where g(·) is the link function relating the dependent variable Y to the linear part Xβ + Zμ + ε; Y is the dependent variable, the number of species; X is the design matrix of the independent variables (farmland dynamic characteristics) and β is the corresponding parameter matrix; Z is the design matrix of the independent variables describing sampling data source information, and μ represents the relationship between the sampling data source information in the first training sample set and the number of species; ε is the random error matrix;
the biological individual number evaluation model training module is used for training a biological individual number evaluation model based on the second training sample set by taking the biological individual number information in the second training sample set as a response variable according to the following formula, and acquiring a biological individual number evaluation model based on a linear mixed model:
$$ A = Bm + Cn + q $$
where A is the dependent variable, the number of biological individuals; B is the design matrix of the independent variables (farmland dynamic characteristics) and m is the corresponding parameter matrix; C is the design matrix of the independent variables describing sampling data source information, and n represents the relationship between the sampling data source information in the second training sample set and the number of biological individuals; q is the random error matrix.
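The two model structures above can be made concrete with a short sketch that evaluates them for given parameter values. Several assumptions are not stated in the patent: the species number model is given a log link, so its conditional mean is exp(Xβ + Zμ), with ε left out as the part of the response the mean does not explain; the individual number model uses the identity link, so its fitted values are Bm + Cn and its residuals give q; and the random-effect design matrices Z and C are built with one indicator column per sampling data source label. Matrices are plain lists of rows, and all function names are hypothetical.

```python
import math


def matvec(M, v):
    """Multiply a matrix given as a list of rows by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def indicator_matrix(labels):
    """Random-intercept design matrix: one 0/1 column per distinct label (first-seen order)."""
    levels = list(dict.fromkeys(labels))
    index = {level: j for j, level in enumerate(levels)}
    D = [[0] * len(levels) for _ in labels]
    for i, label in enumerate(labels):
        D[i][index[label]] = 1
    return levels, D


def glmm_mean(X, beta, Z, mu, inverse_link=math.exp):
    """Species number model: E[Y | mu] = g^-1(X beta + Z mu); log link assumed by default."""
    eta = [xb + zu for xb, zu in zip(matvec(X, beta), matvec(Z, mu))]
    return [inverse_link(e) for e in eta]


def lmm_fitted_and_residuals(A, B, m, C, n):
    """Individual number model: fitted values B m + C n and residuals q = A - (B m + C n)."""
    fitted = [bm + cn for bm, cn in zip(matvec(B, m), matvec(C, n))]
    q = [a - f for a, f in zip(A, fitted)]
    return fitted, q
```

Passing `inverse_link=lambda e: e` to `glmm_mean` reduces it to the linear mixed model case, which shows that the two formulas share one linear structure and differ mainly in the link.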
Optionally, μ1 (the μ of the species number evaluation model training module) represents the relationship between the sampling data source information in the first training sample set and the number of species, where that sampling data source information comprises the original source of the sampling point data, the higher-level partition containing the sampling point, and information on the sampling point itself;
μ2 (the term playing the role of n in the biological individual number evaluation model training module) represents the relationship between the sampling data source information in the second training sample set and the number of biological individuals, where that sampling data source information likewise comprises the original source of the sampling point data, the higher-level partition containing the sampling point, and information on the sampling point itself.
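Because the sampling data source information has three levels (original source, higher-level partition, sampling point), one plausible reading is a set of nested random intercepts. The sketch below assumes that reading: it builds one indicator block per level and concatenates the blocks into a single random-effect design matrix. Partition and point labels are combined with their parent labels so that, for example, a partition named `z1` under two different sources counts as two distinct groups. The record keys and function names are hypothetical.

```python
def nested_group_labels(records):
    """Three nested grouping factors: source, partition within source, point within partition."""
    out = {"source": [], "partition": [], "point": []}
    for r in records:
        out["source"].append(r["source"])
        out["partition"].append((r["source"], r["partition"]))
        out["point"].append((r["source"], r["partition"], r["point_id"]))
    return out


def stacked_random_design(records):
    """Concatenate one 0/1 indicator block per nesting level into a single design matrix."""
    groups = nested_group_labels(records)
    Z = [[] for _ in records]
    names = []
    for level in ("source", "partition", "point"):
        levels = list(dict.fromkeys(groups[level]))  # distinct groups, first-seen order
        names.extend((level, lev) for lev in levels)
        for i, lab in enumerate(groups[level]):
            Z[i].extend(1 if lab == lev else 0 for lev in levels)
    return names, Z
```

Each row of the result has exactly one 1 per level, so a sampling point receives the sum of its source, partition and point random effects.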
Optionally, the species number evaluation model training module may include:
a Bayesian parameter estimation module, configured to train the generalized linear mixed model with a Bayesian parameter estimation method, based on the first training sample set and with the species number information in that set as the response variable, to obtain the species number evaluation model.
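A Bayesian fit of the species number model can be illustrated with a random-walk Metropolis sampler. This is a deliberately small sketch that rests on assumptions the patent does not state: a Poisson likelihood with a log link, a single grouping factor of random intercepts u_j ~ N(0, σ²), priors β_k ~ N(0, 10²) and log σ ~ N(0, 1), and a fixed proposal step. Log-densities omit additive constants, which cancel in the acceptance ratio. A real analysis would more typically use a dedicated MCMC library with convergence diagnostics.

```python
import math
import random


def log_posterior(theta, X, y, groups, n_groups):
    """Unnormalised log posterior; theta = [beta..., u_1..u_J, log_sigma]."""
    p = len(X[0])
    beta = theta[:p]
    u = theta[p:p + n_groups]
    log_sigma = theta[-1]
    sigma = math.exp(log_sigma)
    lp = 0.0
    for xi, yi, gi in zip(X, y, groups):
        eta = sum(a * b for a, b in zip(xi, beta)) + u[gi]
        lp += yi * eta - math.exp(eta)  # Poisson log-likelihood without -log(y!)
    lp += sum(-0.5 * (b / 10.0) ** 2 for b in beta)              # beta_k ~ N(0, 10^2)
    lp += sum(-0.5 * (uj / sigma) ** 2 - log_sigma for uj in u)  # u_j ~ N(0, sigma^2)
    lp += -0.5 * log_sigma ** 2                                  # log sigma ~ N(0, 1)
    return lp


def metropolis(X, y, groups, n_groups, n_iter=5000, step=0.05, seed=0):
    """Random-walk Metropolis over all parameters jointly; returns the whole chain."""
    rng = random.Random(seed)
    theta = [0.0] * (len(X[0]) + n_groups + 1)
    current = log_posterior(theta, X, y, groups, n_groups)
    chain = []
    for _ in range(n_iter):
        proposal = [t + rng.gauss(0.0, step) for t in theta]
        candidate = log_posterior(proposal, X, y, groups, n_groups)
        # Accept with probability min(1, exp(candidate - current)).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        chain.append(list(theta))
    return chain
```

In practice one would discard an initial burn-in portion of the chain and average the remaining draws to obtain posterior means of β, the random effects and σ.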
Optionally, the biological individual number evaluation model training module includes at least one of the following modules:
a likelihood estimation module, configured to train the linear mixed model by restricted maximum likelihood (REML) estimation, based on the second training sample set and with the biological individual number information in that set as the response variable, to obtain the biological individual number evaluation model;
a minimum norm quadratic unbiased estimation (MINQUE) module, configured to train the linear mixed model by minimum norm quadratic unbiased estimation, based on the second training sample set and with the biological individual number information in that set as the response variable, to obtain the biological individual number evaluation model.
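General REML and MINQUE fitting requires iterative matrix algebra, but the idea of estimating variance components can be shown on the simplest case. For a balanced one-way random-intercept model y_ij = μ + a_i + e_ij, the unconstrained REML estimates coincide with the classical ANOVA moment estimates whenever the between-group estimate is non-negative. The sketch below computes those closed-form estimates as a stand-in; it is not a general REML or MINQUE implementation, and the function name is hypothetical.

```python
def balanced_one_way_variance_components(groups):
    """ANOVA (moment) estimates of the variance components of
    y_ij = mu + a_i + e_ij for a balanced design.

    Returns (sigma_a2, sigma_e2). For balanced data these equal the REML
    estimates whenever sigma_a2 >= 0; a negative value signals that REML
    would place the between-group variance on the boundary at zero.
    """
    a = len(groups)
    n = len(groups[0]) if groups else 0
    if a < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need at least 2 groups, all of the same size >= 2")
    means = [sum(g) / n for g in groups]
    grand_mean = sum(means) / a
    # Between-group and within-group mean squares.
    msa = n * sum((m - grand_mean) ** 2 for m in means) / (a - 1)
    mse = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g) / (a * (n - 1))
    sigma_a2 = (msa - mse) / n
    return sigma_a2, mse
```

For example, groups `[[1, 2, 3], [4, 5, 6]]` give a within-group variance of 1.0 and a between-group variance of 12.5 / 3; unbalanced data or several random factors need a full REML or MINQUE routine.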
Referring to FIG. 4, which shows a block diagram of an embodiment of the biological population evaluation device of the present invention, the device 400 may comprise:
a target area data obtaining module 401, configured to obtain raster data and field statistical data of a target area;
a species number evaluation module 402, configured to obtain an evaluation result of the number of species in the target region according to a species number evaluation model based on the grid data and the field statistical data;
a biological individual number evaluation module 403, configured to obtain an evaluation result of the number of biological individuals in the target area according to a biological individual number evaluation model based on the grid data and the field statistical data;
the species quantity evaluation model and the biological individual quantity evaluation model are obtained by training based on first features and second features in a training sample set.
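Applying the trained models to a target area then reduces to evaluating the fitted linear predictors at each target point. A minimal sketch, assuming the fitted fixed-effect coefficients and the estimated random effects (keyed by sampling data source label) are already available; the function names are hypothetical, the log link for the species number model is an assumption, and giving source labels unseen in training a random effect of zero (a population-level prediction) is a design choice the patent does not specify.

```python
import math


def linear_predictor(features, coefficients, random_effects, source_label):
    """Fixed part x'beta plus the random effect of the point's data source.

    Sources not seen during training contribute 0 (population-level prediction).
    """
    fixed = sum(x * b for x, b in zip(features, coefficients))
    return fixed + random_effects.get(source_label, 0.0)


def predict_species_number(features, beta, mu, source_label):
    # Assumed log link: invert it with exp().
    return math.exp(linear_predictor(features, beta, mu, source_label))


def predict_individual_number(features, m, n, source_label):
    # Linear mixed model: identity link, so the predictor is the prediction.
    return linear_predictor(features, m, n, source_label)
```

Here `features` holds the farmland dynamic characteristics read from the target area's raster data, with a leading 1.0 for the intercept.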
For the specific training of the species quantity evaluation model and the biological individual quantity evaluation model, refer to the embodiment shown in FIG. 1 and its optional training steps; they are not repeated here.
Because the apparatus embodiment is basically similar to the method embodiment, it is described relatively briefly; for relevant details, refer to the corresponding parts of the method embodiment.
The embodiments in the present specification are all described in a progressive manner, and each embodiment focuses on differences from other embodiments, and portions that are the same and similar between the embodiments may be referred to each other.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Referring to fig. 5, an electronic device 500 provided in an embodiment of the present application is shown, including: a processor 501, a memory 502, and a computer program stored on the memory 502 and executable on the processor 501, the computer program when executed by the processor 501 implementing the steps of a method of training a biological population assessment model as described in method embodiments.
The embodiment of the present application further provides a readable storage medium storing a program or instructions which, when executed by a processor, implement the steps of the method for training a biological population evaluation model and achieve the same technical effects; to avoid repetition, they are not described again here.
The processor is the processor of the electronic device in the above electronic device embodiment. Readable storage media include computer-readable storage media such as Read-Only Memory (ROM), Random Access Memory (RAM), magnetic disks, or optical disks.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. In addition, embodiments of the present disclosure are not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the embodiments of the present disclosure as described herein, and any descriptions of specific languages are provided above to disclose the best mode of use of the embodiments of the present disclosure.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the disclosure may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the disclosure, various features of the embodiments of the disclosure are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed embodiments of the disclosure require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this disclosure.
Those skilled in the art will appreciate that the modules in the devices in an embodiment may be adaptively changed and arranged in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
The various component embodiments of the disclosure may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. It will be appreciated by those skilled in the art that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in a device according to embodiments of the present disclosure. Embodiments of the present disclosure may also be implemented as an apparatus or device program for performing a portion or all of the methods described herein. Such programs implementing embodiments of the present disclosure may be stored on a computer readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit embodiments of the disclosure, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. Embodiments of the disclosure may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The above description is only for the purpose of illustrating the preferred embodiments of the present disclosure and is not to be construed as limiting the embodiments of the present disclosure, and any modifications, equivalents, improvements and the like that are made within the spirit and principle of the embodiments of the present disclosure are intended to be included within the scope of the embodiments of the present disclosure.
The above description is only a specific implementation of the embodiments of the present disclosure, but the scope of the embodiments of the present disclosure is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the embodiments of the present disclosure, and all the changes or substitutions should be covered by the scope of the embodiments of the present disclosure. Therefore, the protection scope of the embodiments of the present disclosure shall be subject to the protection scope of the claims.

Claims (10)

1. A method for training a biological population evaluation model, comprising:
acquiring grid data and field statistical data of a sampling region;
dividing the field statistical data into first field statistical data comprising species number information and second field statistical data comprising biological individual number information;
combining the grid data and the first field statistical data according to coordinate information of sampling points in the sampling region to obtain a first set, wherein the grid data reflects the dynamic characteristics of the farmland where the organisms are located, and the farmland dynamic characteristics comprise the land use type of a target area, the category of the proportion of farmland area surrounding the sampling points, the planting intensity, the yield per unit area and the fertilizer application rate;
combining the grid data and the second field statistical data according to the coordinate information of the sampling points in the sampling area to obtain a second set;
respectively dividing the feature information of the biological populations in the first set and the second set into a fixed effect type and a random effect type to obtain a first training sample set and a second training sample set of the evaluation model; the first training sample set comprises species number information of a sampling region, first characteristic information of a random effect type and second characteristic information of a fixed effect type; the second training sample set comprises biological individual quantity information of a sampling region, first characteristic information of a random effect type and second characteristic information of a fixed effect type;
training a species quantity evaluation model, based on the first training sample set and with the species quantity information in the first training sample set as the response variable, according to the following formula, to obtain the species quantity evaluation model based on the generalized linear mixed model:
$$ g(Y) = X\beta + Z\mu + \varepsilon $$
wherein g(·) is the link function relating the dependent variable Y to the linear part Xβ + Zμ + ε; Y is the dependent variable, the number of species; X is the design matrix of the independent variables (farmland dynamic characteristics) and β is the corresponding parameter matrix; Z is the design matrix of the independent variables describing sampling data source information, and μ represents the relationship between the sampling data source information in the first training sample set and the number of species; ε is the random error matrix;
training a biological individual quantity evaluation model, based on the second training sample set and with the biological individual quantity information in the second training sample set as the response variable, according to the following formula, to obtain the biological individual quantity evaluation model based on a linear mixed model:
$$ A = Bm + Cn + q $$
wherein A is the dependent variable, the number of biological individuals; B is the design matrix of the independent variables (farmland dynamic characteristics) and m is the corresponding parameter matrix; C is the design matrix of the independent variables describing sampling data source information, and n represents the relationship between the sampling data source information in the second training sample set and the number of biological individuals; q is the random error matrix.
2. The method of claim 1, wherein prior to said partitioning, from said field statistical data, first field statistical data comprising species quantity information and second field statistical data comprising biological individual quantity information, the method further comprises:
correcting the field statistical data;
said partitioning, based on said field statistics, first field statistics comprising species quantitative information and second field statistics comprising biological individual quantitative information, comprising:
dividing the corrected field statistical data into the first field statistical data and the second field statistical data.
3. The method of claim 1, wherein the sampling data source information in the first training sample set comprises the original source of the sampling point data, the higher-level partition containing the sampling point, and information on the sampling point itself;
the sampling data source information in the second training sample set comprises the original source of the independent variable sampling point data, the superior partition of the sampling point and the information of the sampling point.
4. The method of claim 1, wherein the training method of the species number evaluation model comprises:
training the generalized linear mixed model with a Bayesian parameter estimation method, based on the first training sample set and with the species quantity information in the first training sample set as the response variable, to obtain the species quantity evaluation model.
5. The method according to claim 1, wherein the training method of the biological individual quantity evaluation model comprises at least one of the following:
training the linear mixed model by restricted maximum likelihood (REML) estimation, based on the second training sample set and with the biological individual quantity information in the second training sample set as the response variable, to obtain the biological individual quantity evaluation model;
training the linear mixed model by minimum norm quadratic unbiased estimation (MINQUE), based on the second training sample set and with the biological individual quantity information in the second training sample set as the response variable, to obtain the biological individual quantity evaluation model.
6. A method of biological population assessment, comprising:
acquiring raster data and field statistical data of a target area;
inputting the source information and the coordinate information of the sampling data in the field statistical data and the dynamic characteristic information of the farmland in the raster data into a species quantity evaluation model to obtain an evaluation result of the species quantity in the target area;
inputting the source information and the coordinate information of the sampling data in the field statistical data and the dynamic characteristic information of the farmland in the raster data into a biological individual number evaluation model to obtain an evaluation result of the biological individual number in the target area;
wherein the species number evaluation model is trained according to the method of any one of claims 1-5, and the biological individual number evaluation model is trained according to the method of any one of claims 1-5.
7. A training apparatus for a biological population evaluation model, comprising:
the sampling area data acquisition module is used for acquiring grid data and field statistical data of a sampling area;
a field statistical data partitioning module, configured to divide the field statistical data into first field statistical data comprising species quantity information and second field statistical data comprising biological individual quantity information;
the first training sample set acquisition module is used for merging the grid data and the first field statistical data according to the coordinate information of the sampling points in the sampling region to obtain a first set, wherein the grid data reflects the dynamic characteristics of the farmland where the organisms are located, and the farmland dynamic characteristics comprise the land use type of a target area, the category of the proportion of farmland area surrounding the sampling points, the planting intensity, the yield per unit area and the fertilizer application rate;
the second training sample set acquisition module is used for combining the grid data and the second field statistical data according to the coordinate information of the sampling points in the sampling area to acquire a second set;
the characteristic information dividing module is used for dividing the characteristic information of the biological populations in the first set and the second set into a fixed effect type and a random effect type respectively to obtain a first training sample set and a second training sample set of the evaluation model; the first training sample set comprises species number information of a sampling region, first characteristic information of a random effect type and second characteristic information of a fixed effect type; the second training sample set comprises biological individual quantity information of a sampling region, first characteristic information of a random effect type and second characteristic information of a fixed effect type;
the species quantity evaluation model training module is used for training a species quantity evaluation model based on the first training sample set by taking the species quantity information in the first training sample set as a response variable according to the following formula, and acquiring the species quantity evaluation model based on the generalized linear mixed model:
$$ g(Y) = X\beta + Z\mu + \varepsilon $$
wherein g(·) is the link function relating the dependent variable Y to the linear part Xβ + Zμ + ε; Y is the dependent variable, the number of species; X is the design matrix of the independent variables (farmland dynamic characteristics) and β is the corresponding parameter matrix; Z is the design matrix of the independent variables describing sampling data source information, and μ represents the relationship between the sampling data source information in the first training sample set and the number of species; ε is the random error matrix;
the biological individual number evaluation model training module is used for training a biological individual number evaluation model based on the second training sample set by taking the biological individual number information in the second training sample set as a response variable according to the following formula, and acquiring a biological individual number evaluation model based on a linear mixed model:
$$ A = Bm + Cn + q $$
wherein A is the dependent variable, the number of biological individuals; B is the design matrix of the independent variables (farmland dynamic characteristics) and m is the corresponding parameter matrix; C is the design matrix of the independent variables describing sampling data source information, and n represents the relationship between the sampling data source information in the second training sample set and the number of biological individuals; q is the random error matrix.
8. A biological population evaluation device, comprising:
the target area data acquisition module is used for acquiring raster data and field statistical data of a target area;
the species quantity evaluation module is used for inputting the source information and the coordinate information of the sampling data in the field statistical data and the dynamic characteristic information of the farmland in the grid data into a species quantity evaluation model to obtain an evaluation result of the species quantity in the target area;
the biological individual quantity evaluation module is used for inputting the source information and the coordinate information of the sampling data in the field statistical data and the dynamic characteristic information of the farmland in the grid data into a biological individual quantity evaluation model to obtain an evaluation result of the biological individual quantity in the target area;
wherein the species number evaluation model is trained according to the method of any one of claims 1-5, and the biological individual number evaluation model is trained according to the method of any one of claims 1-5.
9. An electronic device, comprising: a processor and a memory, the processor executing a computer program stored in the memory implementing the method of training a biological population evaluation model of any one of claims 1-5.
10. A readable storage medium, wherein instructions in the storage medium, when executed by a processor of an apparatus, enable the apparatus to perform a method of training a biological population assessment model according to any of method claims 1 to 5.
CN202211140439.5A 2022-09-20 2022-09-20 Training method and device of biological population evaluation model and electronic equipment Active CN115223660B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211140439.5A CN115223660B (en) 2022-09-20 2022-09-20 Training method and device of biological population evaluation model and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211140439.5A CN115223660B (en) 2022-09-20 2022-09-20 Training method and device of biological population evaluation model and electronic equipment

Publications (2)

Publication Number Publication Date
CN115223660A (en) 2022-10-21
CN115223660B (en) 2023-03-10

Family

ID=83617468

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211140439.5A Active CN115223660B (en) 2022-09-20 2022-09-20 Training method and device of biological population evaluation model and electronic equipment

Country Status (1)

Country Link
CN (1) CN115223660B (en)

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002036812A2 (en) * 2000-11-03 2002-05-10 Michael Korenberg Nonlinear system identification for class prediction in bioinformatics and related applications
CN108876917A (en) * 2018-06-25 2018-11-23 西南林业大学 A kind of forest ground biomass remote sensing estimation universal model construction method
CN113536519B (en) * 2020-04-21 2023-06-16 生态环境部南京环境科学研究所 Biodiversity evaluation method and computer equipment
CN111765974B (en) * 2020-07-07 2021-04-13 中国环境科学研究院 Wild animal observation system and method based on miniature refrigeration thermal infrared imager
CN112132432A (en) * 2020-09-15 2020-12-25 中国水产科学研究院黄海水产研究所 Comprehensive evaluation method for potential risks of ecological vulnerability of coastal wetland
CN113011086B (en) * 2021-03-02 2022-08-16 西南林业大学 Estimation method of forest biomass based on GA-SVR algorithm
CN113095467B (en) * 2021-04-29 2023-04-18 清华大学 Quantum biological population quantity estimation method
CN114022008A (en) * 2021-11-11 2022-02-08 东莞理工学院 Estuary suitable ecological flow assessment method based on water ecological zoning theory

Also Published As

Publication number Publication date
CN115223660A (en) 2022-10-21

Similar Documents

Publication Publication Date Title
Ngango et al. Assessment of technical efficiency and its potential determinants among small-scale coffee farmers in Rwanda
Yuan et al. Anthropogenic disturbances are key to maintaining the biodiversity of grasslands
Gong et al. Multi-objective parameter optimization of common land model using adaptive surrogate modeling
Chase et al. A framework for disentangling ecological mechanisms underlying the island species–area relationship
CN108241905A (en) For predicting the method for soil and/or plant situation
Chun et al. Partitioning the regional and local drivers of phylogenetic and functional diversity along temperate elevational gradients on an East Asian peninsula
Van Oijen et al. Incorporating biodiversity into biogeochemistry models to improve prediction of ecosystem services in temperate grasslands: Review and roadmap
Du et al. Estimating leaf area index of maize using UAV-based digital imagery and machine learning methods
Schaak et al. Long-term trends in functional crop diversity across Swedish farms
Cozzoli et al. Sensitivity of phytoplankton metrics to sample-size: A case study on a large transitional water dataset (WISER)
Pottier et al. On the relationship between clonal traits and small-scale spatial patterns of three dominant grasses and its consequences on community diversity
Fukano et al. GIS-based analysis for UAV-supported field experiments reveals soybean traits associated with rotational benefit
Yamamura Dispersal distance of corn pollen under fluctuating diffusion coefficient
Castex et al. Assembling and testing a generic phenological model to predict Lobesia botrana voltinism for impact studies
Wu et al. Bayesian binomial mixture models for estimating abundance in ecological monitoring studies
Adewopo et al. Can a combination of UAV-derived vegetation indices with biophysical variables improve yield variability assessment in smallholder farms?
Grossman Evidence of constrained divergence and conservatism in climatic niches of the temperate maples (Acer L.)
Manu et al. Soil mite (Acari: Mesostigmata) communities and their relationships with some environmental variables in experimental grasslands from Bucegi Mountains in Romania
Hou et al. Acoustic Sensor-Based Soundscape Analysis and Acoustic Assessment of Bird Species Richness in Shennongjia National Park, China
Da Mata et al. Stacked species distribution and macroecological models provide incongruent predictions of species richness for Drosophilidae in the Brazilian savanna
Kumar et al. Performance of APSIM to simulate the dynamics of winter wheat growth, phenology, and nitrogen uptake from early growth stages to maturity in Northern Europe
CN115223660B (en) Training method and device of biological population evaluation model and electronic equipment
Yin et al. Examining the patterns and dynamics of species abundance distributions in succession of forest communities by model selection
Agoglitta et al. Cumulative annual dung beetle diversity in Mediterranean seasonal environments
Dainese et al. Plant and animal diversity in a region of the Southern Alps: the role of environmental and spatial processes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant