CN114627979A - Method and system for determining biomass material characteristic probability distribution information - Google Patents


Publication number
CN114627979A
Authority
CN
China
Prior art keywords
probability distribution
material characteristic
probability
data
biomass
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210305030.8A
Other languages
Chinese (zh)
Other versions
CN114627979B (en)
Inventor
王鑫磊
韩鲁佳
杨增玲
郭轩宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Agricultural University
Original Assignee
China Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Agricultural University
Priority to CN202210305030.8A
Publication of CN114627979A
Application granted
Publication of CN114627979B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/30 Prediction of properties of chemical compounds, compositions or mixtures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Chemical & Material Sciences (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Algebra (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Operations Research (AREA)
  • Evolutionary Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a method and a system for determining biomass material characteristic probability distribution information, which belong to the field of biomass. The method adopts a frequency histogram and a classical probability density function fitting method to realize the acquisition and visualization of qualitative and quantitative information of a probability distribution model of material characteristic data.

Description

Method and system for determining probability distribution information of characteristics of biomass material
Technical Field
The invention relates to the field of biomass, in particular to a method and a system for determining biomass material characteristic probability distribution information.
Background
Biomass such as crop straw and livestock and poultry manure is produced in large quantities and is widely distributed, making it one of the green renewable resources with the greatest development potential. However, owing to differences in variety, geographical environment, and other factors, the characteristics of biomass materials show large variability and specificity, which is one of the main obstacles to high-value, high-efficiency, industrial-scale utilization of biomass. Because the properties, quality, and application prospects of products are determined by this heterogeneous raw material, it is necessary to understand the general rules governing the variability and specificity of biomass material characteristics and to explore their overall distribution trends and features.
A probability distribution model describes the distribution trend of the values of a random variable, quantitatively reveals the overall characteristics of the data, and describes the uncertainty of the data. It is one of the important methods of big data mining and analysis, and is applied in fields such as power system simulation, combustion reaction process modeling, hydrological observation, and meteorological data analysis. The choice of probability distribution model also affects the selection of statistical analysis methods and the accuracy of statistical results; an unsuitable model causes the statistical result to deviate from the actual result, so that it is of little use in guiding actual production. Conventional statistical methods assume that the data are normally distributed overall, but research shows that the characteristics of biomass materials such as crop straw and livestock and poultry manure are influenced by various external factors and individual differences and are non-normally distributed. Therefore, obtaining a probability distribution model of the characteristics of biomass materials is of great significance for industrial conversion.
In order to effectively obtain a probability distribution model of the characteristics of biomass materials, methods and means suited to mining the probability distribution information of biomass material characteristics are needed. Existing methods for mining probability distribution information still have the following problems: first, they mainly fit a few common probability distribution models, so the range of model types involved is narrow and it is difficult to explore and select the best probability distribution model for the characteristics of biomass materials; second, they provide little statistical information about the probability distribution model, so it is difficult to obtain model information as required; and third, there is currently no method specifically intended for exploring the probability distribution information of the characteristics of biomass materials.
Disclosure of Invention
The invention aims to provide a method and a system for determining biomass material characteristic probability distribution information, so as to realize the acquisition and visualization of qualitative and quantitative information of a probability distribution model of characteristics, components and contents of biomass materials.
In order to achieve the purpose, the invention provides the following scheme:
a method for determining probability distribution information of characteristics of biomass materials comprises the following steps:
acquiring a material characteristic value sequence of biomass;
arranging the material characteristic values in the material characteristic value sequence from small to large, and determining a frequency distribution histogram of the ordered material characteristic value sequence;
selecting a plurality of probability density functions that conform to the frequency distribution shown by the frequency distribution histogram;
solving parameters of each probability density function by adopting maximum likelihood estimation according to the material characteristic value sequence, and fitting to obtain a plurality of probability distribution models;
performing a goodness-of-fit test on each probability distribution model, and screening out the probability distribution models meeting the goodness-of-fit test condition;
calculating the statistical analysis result of each screened probability distribution model; the statistical analysis result comprises expectation, variance, skewness, kurtosis, median and standard deviation;
and standardizing each screened probability distribution model to obtain a probability distribution model curve that coincides with the frequency distribution histogram.
Optionally, the obtaining a material characteristic value sequence of the biomass further includes:
calculating the median and the interquartile range of all material characteristic values in the material characteristic value sequence;
removing, from the material characteristic value sequence, material characteristic values outside the range $[x_{50\%} - 3\,\mathrm{IQR},\ x_{50\%} + 3\,\mathrm{IQR}]$, and removing material characteristic values whose numerical values are empty;
wherein $x_{50\%}$ represents the median, IQR represents the interquartile range, $\mathrm{IQR} = x_{75\%} - x_{25\%}$,
$$x_p = \begin{cases} x_c, & c \in \mathbb{N} \\ x_{\lfloor c \rfloor} + \{c\}\left(x_{\lceil c \rceil} - x_{\lfloor c \rfloor}\right), & c \notin \mathbb{N} \end{cases}$$
$c = (n-1) \times p + 1$, $x_p$ denotes the value of the quantile $p$, $p$ being 75%, 50% or 25%, $x_c$ represents the material characteristic value at position $c$, $c$ represents the position, $n$ represents the number of material characteristic values in the material characteristic value sequence, $\mathbb{N}$ represents the set of positive integers, $\{c\}$ represents the fractional part of $c$, and $x_{\lfloor c \rfloor}$ and $x_{\lceil c \rceil}$ respectively represent the material characteristic values immediately before and immediately after position $c$.
Optionally, the determining a frequency distribution histogram of the sorted material characteristic value sequence specifically includes:
according to the sorted material characteristic value sequence, determining the start value, the end value and the number of divisions of the frequency distribution histogram by using the following formulas:
$$SV = \begin{cases} \lfloor x_{\min} \rfloor, & x_{\max} \ge 3 \\ x_{\min}\ \text{rounded down to 2 significant digits}, & x_{\max} < 3 \end{cases}$$
$$EV = \begin{cases} \lceil x_{\max} \rceil, & x_{\max} \ge 3 \\ x_{\max}\ \text{rounded up to 2 significant digits}, & x_{\max} < 3 \end{cases}$$
[Formula for $k$: equation image not recoverable from this text; $k$ is computed from SV, EV, IQR and $m$.]
wherein SV represents the start value, EV represents the end value, $k$ represents the number of divisions, $x_{\min}$ and $x_{\max}$ respectively represent the minimum and maximum material characteristic values in the sorted material characteristic value sequence, IQR represents the interquartile range, $m$ represents the number of material characteristic values in the sorted material characteristic value sequence, $\lfloor\cdot\rfloor$ represents rounding down, and $\lceil\cdot\rceil$ represents rounding up;
dividing the interval from the starting value to the ending value into k sub-intervals according to the number k of the divisions;
counting the number of the material characteristic values falling into each subinterval in the sequenced material characteristic value sequence, and determining the ratio of the number of the material characteristic values falling into each subinterval to the number m of the material characteristic values in the sequenced material characteristic value sequence as the frequency of each subinterval;
and drawing a frequency distribution histogram by taking the material characteristic value as an abscissa and the frequency as an ordinate.
Optionally, the performing a goodness-of-fit test on each probability distribution model and screening out the probability distribution models meeting the goodness-of-fit test condition specifically includes:
performing a goodness-of-fit test on each probability distribution model using the K-S test, and screening out the probability distribution models whose goodness-of-fit test values are greater than or equal to a preset threshold value.
Optionally, the performing a goodness-of-fit test on each probability distribution model and screening out the probability distribution models meeting the goodness-of-fit test condition specifically includes:
according to each probability distribution model, using the formula
$$P_i = \int_{m_i}^{m_{i+1}} f(x;\theta)\,dx = F(m_{i+1};\theta) - F(m_i;\theta)$$
to calculate the cumulative probability value in each subinterval of the frequency histogram; wherein $m_i$ and $m_{i+1}$ respectively represent the start value and the end value of the $i$-th subinterval, $P_i$ represents the cumulative probability value of the interval $[m_i, m_{i+1}]$, $f$ represents the probability density function, $F$ represents the cumulative distribution function, $\theta$ represents the parameters of the probability distribution model, $f(x;\theta)$ represents the probability density function with parameters $\theta$, and $F(m_i;\theta)$ and $F(m_{i+1};\theta)$ respectively represent the cumulative probability values of the probability distribution model at $m_i$ and $m_{i+1}$;
and performing a chi-square goodness-of-fit test between the cumulative probability value in each subinterval of the frequency histogram and the frequency in each subinterval of the frequency histogram, and screening out the probability distribution models whose goodness-of-fit test values are greater than or equal to a preset threshold value.
Optionally, the normalizing each screened probability distribution model to obtain a probability distribution model curve coinciding with the frequency distribution histogram specifically includes:
using the formula
$$f' = f_{GF} \times \frac{EV - SV}{k}$$
to standardize each screened probability distribution model and obtain a probability distribution model curve that coincides with the frequency distribution histogram;
wherein $f_{GF}$ is the screened probability distribution model, $f'$ is the standardized probability distribution model curve, SV and EV are respectively the start value and the end value of the frequency distribution histogram, and $k$ is the number of divisions of the frequency distribution histogram.
A system for determining probability distribution information of a biomass material characteristic, the system comprising: a data processing module and a visualization module;
the data processing module adopts the determination method of the biomass material characteristic probability distribution information;
the data processing module is connected with the visualization module; the visualization module is used for importing the material characteristic value sequence of the biomass and displaying a statistical analysis result obtained by the data processing module according to the material characteristic value sequence of the imported biomass and a probability distribution model curve superposed with the frequency distribution histogram.
Optionally, the visualization module includes: a data import unit, a data display unit, a data selection unit and a model result output unit;
the data import unit is connected with the data display unit; the data import unit is used for importing material characteristic data of biomass and displaying the material characteristic data on the data display unit;
the data selection unit is connected with the data display unit; the data selection unit is used for receiving a data selection condition; the data display unit is used for displaying material characteristic data of the biomass meeting the data selection condition;
the data display unit and the model result output unit are connected with the data processing module; the data processing module is used for obtaining a statistical analysis result and a probability distribution model curve superposed with the frequency distribution histogram according to the material characteristic data of the biomass meeting the data selection condition, and outputting the statistical analysis result and the probability distribution model curve superposed with the frequency distribution histogram to the model result output unit for visualization.
Optionally, the visualization module further comprises: a function setting module;
the function setting module is connected with the data processing module; the function setting module is used for receiving the selection of the probability density function and transmitting the selected probability density function to the data processing module;
the data processing module is used for solving parameters of the selected probability density function by adopting maximum likelihood estimation according to material characteristic data of the biomass meeting the data selection condition, fitting to obtain a selected probability distribution model and obtaining a fitting goodness test result of the selected probability distribution model; when the goodness-of-fit test result meets the goodness-of-fit test condition, outputting a statistical analysis result and a probability distribution model curve superposed with the frequency distribution histogram; and when the goodness-of-fit test result does not meet the goodness-of-fit test condition, outputting a fitting error prompt.
Optionally, the data processing module includes:
the data acquisition unit is used for acquiring a material characteristic value sequence of the biomass;
the frequency distribution histogram acquisition unit is used for arranging the material characteristic values in the material characteristic value sequence from small to large and determining a frequency distribution histogram of the ordered material characteristic value sequence;
a probability density function selecting unit for selecting a plurality of probability density functions according with the frequency distribution shown by the frequency distribution histogram;
the model parameter fitting unit is used for solving the parameters of each probability density function by adopting maximum likelihood estimation according to the material characteristic value sequence, and fitting to obtain a plurality of probability distribution models;
the fitting result checking unit is used for carrying out goodness-of-fit checking on each probability distribution model and screening out the probability distribution models meeting goodness-of-fit checking conditions;
a statistical result calculation unit for calculating a statistical analysis result of each screened probability distribution model; the statistical analysis result comprises expectation, variance, skewness, kurtosis, median and standard deviation;
and the curve standardization unit is used for standardizing each screened probability distribution model to obtain a probability distribution model curve superposed with the frequency distribution histogram.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention discloses a method for determining biomass material characteristic probability distribution information, which comprises the steps of firstly determining a frequency distribution histogram of a material characteristic value sequence, then selecting a plurality of probability density functions which accord with frequency distribution displayed by the frequency distribution histogram, fitting a plurality of probability distribution models, screening the probability distribution models which pass through goodness of fit test, and finally calculating a statistical analysis result of each screened probability distribution model and obtaining a probability distribution model curve. The method adopts a frequency histogram and a classical probability density function fitting method to realize the acquisition of qualitative and quantitative information of a probability distribution model of material characteristic data.
According to the system for determining the biomass material characteristic probability distribution information, disclosed by the invention, the visualization module can be used for introducing the material characteristic value sequence of biomass, displaying the statistical analysis result and the probability distribution model curve superposed with the frequency distribution histogram, and realizing the visualization of the qualitative and quantitative information of the probability distribution model of the biomass material characteristics, components and contents.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flowchart of a method for determining probability distribution information of characteristics of biomass materials provided by the present invention;
FIG. 2 is a relational diagram of a method for determining probability distribution information of characteristics of a biomass material according to the present invention;
FIG. 3 is a schematic diagram showing the result of probability distribution information of ash content in wheat straw provided by the embodiment of the present invention;
FIG. 4 is a schematic interface diagram of a visualization module provided by the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a method and a system for determining biomass material characteristic probability distribution information, so as to realize the acquisition and visualization of qualitative and quantitative information of a probability distribution model of characteristics, components and contents of biomass materials.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
The invention provides a method for determining biomass material characteristic probability distribution information, which comprises the following steps of:
Step 1, acquiring a material characteristic value sequence of the biomass.
Characteristic data (such as chemical composition, industrial composition, element composition, mineral elements, heat value and the like) of biomass materials such as crop straw and livestock and poultry manure are obtained using standard test methods; a sample size of more than 30 is recommended.
Illustratively, step 1 is followed by data preprocessing: in the course of collecting, sorting and analyzing material characteristic data, some characteristic values are missing or abnormal, and these values need to be removed.
The median and the interquartile range of all material characteristic values in the material characteristic value sequence are calculated; material characteristic values outside the range $[x_{50\%} - 3\,\mathrm{IQR},\ x_{50\%} + 3\,\mathrm{IQR}]$ are removed, and material characteristic values whose numerical values are empty are removed.
Median and interquartile range calculation: the data are arranged from small to large, the positions of the 25%, 50% and 75% quantiles in the data are calculated, and the data at the corresponding positions are found. Let x be the data arranged from small to large; the median (50% quantile) and the interquartile range can be calculated by the following formulas:
$$c = (n-1) \times p + 1$$
$$x_p = \begin{cases} x_c, & c \in \mathbb{N} \\ x_{\lfloor c \rfloor} + \{c\}\left(x_{\lceil c \rceil} - x_{\lfloor c \rfloor}\right), & c \notin \mathbb{N} \end{cases}$$
$$\mathrm{IQR} = x_{75\%} - x_{25\%}$$
wherein $x_{50\%}$ represents the median, IQR represents the interquartile range, $x_p$ denotes the value of the quantile $p$, $p$ being 75%, 50% or 25%, $x_c$ represents the material characteristic value at position $c$, $c$ represents the position, $n$ represents the number of material characteristic values in the material characteristic value sequence, $\mathbb{N}$ represents the set of positive integers, $\{c\}$ represents the fractional part of $c$, and $x_{\lfloor c \rfloor}$ and $x_{\lceil c \rceil}$ respectively represent the material characteristic values immediately before and immediately after position $c$.
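The preprocessing described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function names `quantile` and `remove_outliers` and the sample values are hypothetical, and the quantile is taken by linear interpolation at position c = (n - 1) × p + 1, which is one reading of the definitions given in the text.

```python
import math

def quantile(sorted_values, p):
    """Quantile by linear interpolation at the 1-based position c = (n - 1) * p + 1."""
    n = len(sorted_values)
    c = (n - 1) * p + 1
    lower = math.floor(c)
    frac = c - lower                      # fractional part {c}
    if frac == 0:                         # c is an integer: take the value at position c
        return sorted_values[lower - 1]
    x_lo = sorted_values[lower - 1]       # value just before position c
    x_hi = sorted_values[lower]           # value just after position c
    return x_lo + frac * (x_hi - x_lo)

def remove_outliers(values):
    """Drop empty values, then values outside [median - 3*IQR, median + 3*IQR]."""
    cleaned = sorted(v for v in values if v is not None and not math.isnan(v))
    median = quantile(cleaned, 0.50)
    iqr = quantile(cleaned, 0.75) - quantile(cleaned, 0.25)
    return [v for v in cleaned if median - 3 * iqr <= v <= median + 3 * iqr]

# Hypothetical ash contents (%), with one missing value and one abnormal value.
sample = [4.1, 5.0, None, 5.2, 5.5, 6.0, 6.3, 7.1, 25.0]
kept = remove_outliers(sample)
```

With this sample the median is 5.75 and the IQR is 1.35, so the value 25.0 falls outside the retained range and is dropped along with the missing entry.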
Step 2, arranging the material characteristic values in the material characteristic value sequence from small to large, and determining a frequency distribution histogram of the sorted material characteristic value sequence.
The cleaned data are arranged again from small to large, a frequency distribution histogram of the data is obtained, and the parameters of the frequency histogram are determined, namely the start value, the end value and the number of divisions. The start value and the end value are calculated in one of two cases: (1) when the maximum value of the data is greater than or equal to 3, the minimum value is rounded down to give the start value and the maximum value is rounded up to give the end value; (2) when the maximum value of the data is less than 3, the minimum value is rounded down to 2 significant digits to give the start value and the maximum value is rounded up to 2 significant digits to give the end value. The number of divisions is calculated from the start value, the end value, the interquartile range and the number of data points, and must be greater than or equal to 5.
Illustratively, the determining step of the frequency distribution histogram is:
2-1, determining the start value, the end value and the number of divisions of the frequency distribution histogram by using the following formulas according to the sorted material characteristic value sequence:
$$SV = \begin{cases} \lfloor x_{\min} \rfloor, & x_{\max} \ge 3 \\ x_{\min}\ \text{rounded down to 2 significant digits}, & x_{\max} < 3 \end{cases}$$
$$EV = \begin{cases} \lceil x_{\max} \rceil, & x_{\max} \ge 3 \\ x_{\max}\ \text{rounded up to 2 significant digits}, & x_{\max} < 3 \end{cases}$$
[Formula for $k$: equation image not recoverable from this text; $k$ is computed from SV, EV, IQR and $m$, with $k \ge 5$.]
wherein SV represents the start value, EV represents the end value, $k$ represents the number of divisions, $x_{\min}$ and $x_{\max}$ respectively represent the minimum and maximum material characteristic values in the sorted material characteristic value sequence, IQR represents the interquartile range, $m$ represents the number of material characteristic values in the sorted material characteristic value sequence, $\lfloor\cdot\rfloor$ represents rounding down, and $\lceil\cdot\rceil$ represents rounding up;
2-2, dividing the interval from the starting value to the ending value into k sub-intervals according to the number k of the divisions;
2-3, counting the number of the material characteristic values falling into each subinterval in the sequenced material characteristic value sequence, and determining the ratio of the number of the material characteristic values falling into each subinterval to the number m of the material characteristic values in the sequenced material characteristic value sequence as the frequency of each subinterval;
and 2-4, drawing a frequency distribution histogram by taking the material characteristic value as an abscissa and the frequency as an ordinate.
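Steps 2-2 and 2-3 can be sketched as follows. The function name `frequency_histogram` and the data are hypothetical. Two assumptions are made that the text does not state: each subinterval is treated as closed on the left and open on the right (with the end value counted in the last subinterval), and the number of divisions is simply passed in as k = 5, the minimum the description allows, rather than computed from the formula for k.

```python
import math

def frequency_histogram(sorted_values, sv, ev, k):
    """Return (edges, frequencies) for k equal-width subintervals of [sv, ev]."""
    m = len(sorted_values)
    width = (ev - sv) / k
    edges = [sv + i * width for i in range(k + 1)]
    counts = [0] * k
    for x in sorted_values:
        # Index of the subinterval containing x; the end value goes in the last one.
        i = min(int((x - sv) / width), k - 1)
        counts[i] += 1
    # Frequency = count in the subinterval divided by the total number of values m.
    frequencies = [c / m for c in counts]
    return edges, frequencies

values = sorted([4.1, 5.0, 5.2, 5.5, 6.0, 6.3, 7.1])    # hypothetical data, maximum >= 3
sv, ev = math.floor(values[0]), math.ceil(values[-1])   # start value 4, end value 8
edges, freqs = frequency_histogram(values, sv, ev, k=5)
```

Plotting `freqs` as bars over `edges` gives the frequency distribution histogram of step 2-4, with the material characteristic value on the abscissa and the frequency on the ordinate.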
The obtained frequency histogram is used for qualitatively analyzing and displaying the measured data distribution, qualitatively obtaining the distribution characteristics and the distribution types of the data, and counting and analyzing the data. Furthermore, the probability distribution model can be examined by means of frequency histograms.
Step 3, selecting a plurality of probability density functions that conform to the frequency distribution shown by the frequency distribution histogram.
Different probability density function forms are selected according to the frequency histogram result (the approximate outline formed by the rectangles of the frequency histogram), such that the shape of each selected probability density function approximates the overall shape of the frequency histogram.
Step 4, solving the parameters of each probability density function by adopting maximum likelihood estimation according to the material characteristic value sequence, and fitting to obtain a plurality of probability distribution models.
Taking normal distribution as an example, the material characteristic value sequence is taken as an input value, and normal distribution is selected for fitting.
The normal distribution includes a location parameter $\mu$ and a scale parameter $\sigma$. The values of $\mu$ and $\sigma^2$ are obtained by the MLE (Maximum Likelihood Estimation) method, with the likelihood function:
$$L(\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)$$
Taking the logarithm of the likelihood function, differentiating it, and setting the derivatives to 0 gives:
$$\frac{\partial \ln L}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) = 0$$
$$\frac{\partial \ln L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^2 = 0$$
Solving these equations gives:
$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
$$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2$$
where $x_i$ are the material characteristic values and $n$ is their number.
When the number of selected probability distribution models is large, parallel computation can be used to improve efficiency.
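For the normal distribution, the maximum likelihood estimates have the standard closed form (sample mean, and mean squared deviation divided by n rather than n - 1). The sketch below computes them and the log-likelihood they maximize; the function names and data are hypothetical, and other distributions would generally need a numerical optimizer instead of a closed form.

```python
import math

def fit_normal_mle(values):
    """Closed-form maximum likelihood estimates (mu, sigma^2) for a normal model."""
    n = len(values)
    mu = sum(values) / n
    sigma2 = sum((x - mu) ** 2 for x in values) / n   # divides by n, not n - 1
    return mu, sigma2

def log_likelihood(values, mu, sigma2):
    """Log of the normal likelihood function evaluated at (mu, sigma^2)."""
    n = len(values)
    squared = sum((x - mu) ** 2 for x in values)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - squared / (2 * sigma2)

data = [4.1, 5.0, 5.2, 5.5, 6.0, 6.3, 7.1]            # hypothetical characteristic values
mu_hat, sigma2_hat = fit_normal_mle(data)
best = log_likelihood(data, mu_hat, sigma2_hat)
```

Evaluating `log_likelihood` at any other parameter pair gives a value no larger than `best`, which is a quick way to sanity-check a fit.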
Step 5, performing a goodness-of-fit test on each probability distribution model, and screening out the probability distribution models meeting the goodness-of-fit test condition.
After the parameters of the probability distribution models have been fitted, a goodness-of-fit test is performed on each fitting result in turn. Common test methods include the Kolmogorov-Smirnov test (K-S test), the chi-square test, and the like.
When the K-S test is adopted, a goodness-of-fit test is performed on each probability distribution model, and the probability distribution models whose goodness-of-fit test values are greater than or equal to a preset threshold value are screened out.
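A K-S screening step might look like the sketch below. Several things here are assumptions rather than statements from the text: the "goodness-of-fit test value" is interpreted as the p-value, the threshold 0.05 is illustrative, and the p-value uses a common asymptotic series approximation of the Kolmogorov distribution with a small-sample correction factor. Because the model parameters are estimated from the same data, the p-value from this plain K-S test tends to be conservative.

```python
import math
from statistics import NormalDist, fmean, pstdev

def ks_statistic(values, cdf):
    """Largest gap between the empirical CDF and the model CDF."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def ks_pvalue(d, n):
    """Approximate p-value from the asymptotic Kolmogorov series (an approximation)."""
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:                      # the series is effectively 1 for small lam
        return 1.0
    total = sum((-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam)
                for j in range(1, 101))
    return min(1.0, max(0.0, 2 * total))

data = [4.1, 5.0, 5.2, 5.5, 6.0, 6.3, 7.1]          # hypothetical characteristic values
model = NormalDist(fmean(data), pstdev(data))        # normal model fitted by MLE
d = ks_statistic(data, model.cdf)
p = ks_pvalue(d, len(data))
passes = p >= 0.05                                   # assumed preset threshold
```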
When the chi-square test is adopted, the cumulative probability value within each subinterval of the frequency histogram is calculated for each probability distribution model using the formula
Figure BDA0003564530050000101
A chi-square goodness-of-fit test is then performed between the cumulative probability value in each subinterval of the frequency histogram and the frequency in that subinterval, and the probability distribution models whose goodness-of-fit test value is greater than or equal to a preset threshold are selected. Here m_i and m_{i+1} respectively denote the start value and the end value of the i-th subinterval,
Figure BDA0003564530050000103
denotes the cumulative probability value of the interval [m_i, m_{i+1}], f denotes the probability density function, F denotes the cumulative distribution function, θ denotes the parameters of the probability distribution model, f(x; θ) denotes the probability density function with parameters θ, and F(m_i; θ) and F(m_{i+1}; θ) respectively denote the cumulative probability values of the probability distribution model at m_i and m_{i+1}.
Step 6: calculate the statistical analysis results of each selected probability distribution model. The statistical analysis results include the expectation, variance, skewness, kurtosis, median, and standard deviation.
The formulas for the expectation, variance, skewness, kurtosis, and standard deviation depend on the particular probability distribution model. The median is the value of the independent variable at which the cumulative distribution function equals 50%, and it can be calculated with the percent point (quantile) function of the probability distribution model. The cumulative distribution function completely describes the probability distribution of a real random variable X and is the integral of the probability density function.
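With SciPy, these six quantities can be read from a fitted distribution object, as in the sketch below. The helper name `summarize` and the example parameters are illustrative assumptions. SciPy reports excess kurtosis (0 for a normal distribution), which may differ from the convention used in the original program.

```python
from scipy import stats


def summarize(dist, params):
    """Return the six summary statistics of a fitted SciPy distribution."""
    frozen = dist(*params)
    mean, var, skew, kurt = frozen.stats(moments="mvsk")
    return {
        "expectation": float(mean),
        "variance": float(var),
        "skewness": float(skew),
        "kurtosis": float(kurt),              # excess kurtosis in SciPy
        "median": float(frozen.ppf(0.5)),     # percent point function at 50%
        "standard deviation": float(frozen.std()),
    }


# Illustrative parameters: a normal model with location 10 and scale 2.
summary = summarize(stats.norm, (10.0, 2.0))
```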
Step 7: standardize each selected probability distribution model to obtain a probability distribution model curve that coincides with the frequency distribution histogram.
Because the curve of a selected probability distribution model and the frequency distribution histogram do not coincide when drawn in the same coordinate system, the selected probability distribution model needs to be scaled, i.e., standardized. Using the formula
Figure BDA0003564530050000102
each selected probability distribution model is standardized to obtain a probability distribution model curve that coincides with the frequency distribution histogram; wherein f_G is the standardized probability distribution model curve, f' is the selected probability distribution model curve, SV and EV are respectively the start value and the end value of the frequency distribution histogram, and k is the number of divisions of the frequency distribution histogram.
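The exact standardization formula is only available as a figure, so the sketch below rests on an explicit assumption: the density is multiplied by the histogram bin width (EV - SV) / k, which is the usual way to put a probability density on the same scale as per-bin relative frequencies. The function name `scale_pdf_to_histogram` and the example parameters are hypothetical.

```python
import numpy as np
from scipy import stats


def scale_pdf_to_histogram(dist, params, start, end, k, points=200):
    """Scale a fitted PDF so that it overlays a relative-frequency histogram.

    ASSUMPTION: the scaling factor is the bin width (end - start) / k.
    """
    x = np.linspace(start, end, points)
    bin_width = (end - start) / k
    return x, dist.pdf(x, *params) * bin_width


# Illustrative call: normal model on a histogram spanning 3 to 17 with 17 bins.
x, y = scale_pdf_to_histogram(stats.norm, (9.0, 2.0), start=3, end=17, k=17)
```

Under this assumption the scaled curve, evaluated at the bin centers, sums to approximately the probability mass that the model places between SV and EV.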
In the invention, first, material characteristic information of the biomass, such as physical indexes, chemical components or contents, and biological indexes, is obtained by standard methods from a large and representative sample. Second, null values and outliers contained in the data are removed. Third, the data are arranged in ascending order, characteristic parameters are calculated from the data, the data are divided into several groups according to these parameters, and the frequency of each group is obtained to produce a frequency histogram. Fourth, classical probability density functions are fitted using the frequency information of each group to obtain probability distribution models of the material characteristic. Fifth, the frequency histogram and the probability distribution model results are tested by methods such as the Kolmogorov-Smirnov test (K-S test) and the chi-square test. Finally, model information such as the statistical results and visualization of the probability distribution models is selected and obtained.
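The data-cleaning step can be sketched as follows, using the rule stated in claim 2: values outside [median - 3·IQR, median + 3·IQR] and null values are discarded. Representing nulls as NaN and the function name `remove_nulls_and_outliers` are assumptions. NumPy's default linear interpolation for percentiles corresponds to the position c = (n - 1) × p + 1 in one-based indexing.

```python
import numpy as np


def remove_nulls_and_outliers(values):
    """Drop NaN entries and values outside [median - 3*IQR, median + 3*IQR]."""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]                                # remove null values
    q25, q50, q75 = np.percentile(x, [25, 50, 75])     # linear interpolation
    iqr = q75 - q25
    keep = (x >= q50 - 3 * iqr) & (x <= q50 + 3 * iqr)
    return x[keep]


# Made-up readings with one missing value and one obvious outlier (45.0).
raw = [8.1, 9.4, np.nan, 10.2, 7.9, 9.0, 45.0, 8.8, 9.7]
clean = remove_nulls_and_outliers(raw)
```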
Next, wheat straw is taken as the crop biomass under study and its ash content as the biomass material characteristic, and the statistical results, visualization, and other model information for the wheat straw ash content are analyzed.
Taking the ash content of wheat straw as an example, 778 wheat straw samples of different varieties and growth periods were obtained from different regions, and the ash content of the wheat straw was measured according to ASTM E1755-01(2007).
After data preprocessing, 758 wheat straw ash content values remained.
The characteristic parameters of the wheat straw ash data are shown in Table 1, and the frequency histogram is shown in Fig. 3.
Table 1. Characteristic parameters of the wheat straw ash data
Minimum value | Maximum value | Number of samples | Interquartile range | Start value | End value | Number of divisions
3.32 | 16.30 | 758 | 2.92 | 3 | 17 | 17
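Using the start value, end value, and number of divisions from Table 1, the frequency histogram could be built roughly as shown below. The frequency of a subinterval is taken as its count divided by the total number of values, as stated in claim 3. The measured ash values are not reproduced in the text, so synthetic data are used, and the function name `frequency_histogram` is hypothetical.

```python
import numpy as np


def frequency_histogram(values, start, end, k):
    """Return bin edges and relative frequencies for k equal-width bins."""
    x = np.sort(np.asarray(values, dtype=float))       # ascending order
    counts, edges = np.histogram(x, bins=k, range=(start, end))
    freqs = counts / x.size                            # proportion per subinterval
    return edges, freqs


# Synthetic stand-in for the 758 ash values, kept within the reported min/max.
rng = np.random.default_rng(4)
ash = np.clip(rng.normal(9.0, 2.0, size=758), 3.32, 16.30)

edges, freqs = frequency_histogram(ash, start=3, end=17, k=17)
```

The start and end values in Table 1 are consistent with rounding the minimum down and the maximum up to whole numbers.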
The wheat straw ash content data are used as the input, the probability distribution models are solved, and the resulting wheat straw ash probability distribution information is shown in Fig. 3. Three probability density functions are shown in Fig. 3: the Logistic function (9.10, 1.21), the Normal function (9.17, 2.10), and the Right-skewed Gumbel function (8.14, 1.96). In each pair, the value before the comma is the location parameter and the value after the comma is the shape parameter; p denotes the goodness-of-fit test value, and Ash denotes the ash content.
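For illustration, the three reported models can be rebuilt with SciPy under two assumptions: that the three parameter pairs correspond, in order, to the Logistic, Normal, and Right-skewed Gumbel functions, and that each pair maps onto SciPy's `loc` and `scale` arguments. The sketch then computes how much probability each model places inside the histogram range 3 to 17 from Table 1.

```python
from scipy import stats

# ASSUMPTION: (first value, second value) maps to SciPy's (loc, scale).
models = {
    "Logistic": stats.logistic(loc=9.10, scale=1.21),
    "Normal": stats.norm(loc=9.17, scale=2.10),
    "Right-skewed Gumbel": stats.gumbel_r(loc=8.14, scale=1.96),
}

START, END = 3, 17  # start and end values of the frequency histogram

# Probability mass each model assigns to the histogram range.
coverage = {name: m.cdf(END) - m.cdf(START) for name, m in models.items()}

# Median of each model, from the percent point function at 50%.
medians = {name: m.ppf(0.5) for name, m in models.items()}
```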
The invention uses a frequency histogram together with classical probability density function fitting to obtain qualitative and quantitative information about the probability distribution model of material characteristic data, and implements this in a program. The program simplifies probability distribution model calculation and the related data management for large sample sizes, includes more than 100 probability distribution models for fitting and selection, and can obtain and visually display the probability distribution model information of biomass material characteristics efficiently, quickly, and accurately.
The invention also provides a system for determining biomass material characteristic probability distribution information, comprising a data processing module and a visualization module.
The data processing module adopts the method for determining the biomass material characteristic probability distribution information of any one of claims 1 to 6. The data processing module is connected with the visualization module; the visualization module is used for importing the material characteristic value sequence of the biomass and displaying a statistical analysis result obtained by the data processing module according to the material characteristic value sequence of the imported biomass and a probability distribution model curve superposed with the frequency distribution histogram.
Referring to Fig. 4, the visualization module includes a data import unit, a data display unit, a data selection unit, and a model result output unit.
The data import unit is connected with the data display unit; the data import unit is used for importing material characteristic data of the biomass and displaying the material characteristic data on the data display unit. The data selection unit is connected with the data display unit; the data selection unit is used for receiving a data selection condition; the data display unit is used for displaying the material characteristic data of the biomass meeting the data selection condition. The data display unit and the model result output unit are connected with the data processing module; the data processing module is used for obtaining a statistical analysis result and a probability distribution model curve superposed with the frequency distribution histogram according to the material characteristic data of the biomass meeting the data selection condition, and outputting the statistical analysis result and the probability distribution model curve superposed with the frequency distribution histogram to the model result output unit for visualization.
The data presentation unit can display the imported data and the fitting result data. The data selection unit may select a table, a column in a table, and a start row and an end row.
Illustratively, the visualization module further comprises a function setting module. The function setting module is connected with the data processing module; the function setting module is used for receiving the selection of a probability density function and transmitting the selected probability density function to the data processing module. The data processing module is used for solving the parameters of the selected probability density function by maximum likelihood estimation according to the material characteristic data of the biomass meeting the data selection condition, fitting to obtain the selected probability distribution model, and obtaining the goodness-of-fit test result of the selected probability distribution model. When the goodness-of-fit test result meets the goodness-of-fit test condition, the statistical analysis result and the probability distribution model curve coinciding with the frequency distribution histogram are output; when the goodness-of-fit test result does not meet the goodness-of-fit test condition, a fitting error prompt is output.
The function setup module enumerates all probability density functions for selection by the user.
Illustratively, the visualization module further comprises a parameter setting unit. The start value, end value, and number of divisions of the frequency distribution histogram, and the goodness-of-fit test value p produced during operation of the data processing module, can be displayed on the parameter setting unit.
Illustratively, the data processing module includes a data acquisition unit, a frequency distribution histogram acquisition unit, a probability density function selection unit, a model parameter fitting unit, a fitting result inspection unit, a statistical result calculation unit, and a curve standardization unit.
The data acquisition unit acquires a material characteristic value sequence of the biomass. The frequency distribution histogram acquisition unit arranges the material characteristic values in the material characteristic value sequence from small to large, and determines the frequency distribution histogram of the ordered material characteristic value sequence. The probability density function selecting unit selects a plurality of probability density functions which accord with the frequency distribution shown by the frequency distribution histogram. And the model parameter fitting unit adopts maximum likelihood estimation to solve the parameters of each probability density function according to the material characteristic value sequence, and a plurality of probability distribution models are obtained through fitting. And the fitting result inspection unit performs goodness-of-fit inspection on each probability distribution model and screens out the probability distribution models meeting goodness-of-fit inspection conditions. The statistical result calculating unit calculates the statistical analysis result of each screened probability distribution model; statistical analysis results include expectations, variance, skewness, kurtosis, median, and standard deviation. And the curve standardization unit standardizes each screened probability distribution model to obtain a probability distribution model curve superposed with the frequency distribution histogram.
The program of the invention is written in Python 3.8, and its dependencies include numpy (>= 1.19.5), scipy (>= 1.5.4), wxPython (>= 4.1.1), xlwings (>= 0.25.2), and the like. The main interface consists of five parts: menu bar, data display, parameter selection and operation, model result output, and status bar. The main interface is shown in Fig. 4.
Compared with existing calculation programs or software that contain probability density models, such as the mathematical analysis software Origin, the invention has the following advantages:
1. The invention covers 100 probability density function models, a wide range, whereas other software currently covers only about 10 common probability density function models; model information can therefore be obtained more quickly, accurately, and comprehensively.
2. The invention provides various goodness-of-fit inspection methods, which can ensure the authenticity and accuracy of probability distribution model information from multiple angles.
3. The method can directly obtain the statistical results (such as expectation, variance, skewness, kurtosis, median, standard deviation and the like), model parameters, probability distribution model curves (such as probability density function curves and cumulative probability distribution curves) and other information of the material characteristic probability distribution model.
4. The invention develops a program for obtaining probability distribution model information, with a good user interface, a refined model analysis process, flexible parameter and model selection, and a mechanism to prevent misoperation, which improves the efficiency of obtaining models.
In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principle and the embodiment of the present invention are explained by applying specific examples, and the above description of the embodiments is only used to help understanding the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (10)

1. A method for determining probability distribution information of characteristics of biomass materials is characterized by comprising the following steps:
acquiring a material characteristic value sequence of biomass;
arranging the material characteristic values in the material characteristic value sequence from small to large, and determining a frequency distribution histogram of the ordered material characteristic value sequence;
selecting a plurality of probability density functions that conform to the frequency distribution shown by the frequency distribution histogram;
solving parameters of each probability density function by adopting maximum likelihood estimation according to the material characteristic value sequence, and fitting to obtain a plurality of probability distribution models;
performing goodness-of-fit inspection on each probability distribution model, and screening out probability distribution models meeting goodness-of-fit inspection conditions;
calculating the statistical analysis result of each screened probability distribution model; the statistical analysis result comprises expectation, variance, skewness, kurtosis, median and standard deviation;
and standardizing each screened probability distribution model to obtain a probability distribution model curve superposed with the frequency distribution histogram.
2. The method for determining the probability distribution information of the biomass material characteristics according to claim 1, wherein the step of obtaining the material characteristic value sequence of the biomass further comprises the following steps:
calculating the median and the quartile distance of all material characteristic values in the material characteristic value sequence;
eliminating material characteristic values in the material characteristic value sequence that fall outside the range [x_{50%} - 3IQR, x_{50%} + 3IQR], and eliminating material characteristic values that are null;
wherein x_{50%} represents the median, IQR represents the interquartile range, IQR = x_{75%} - x_{25%},
Figure FDA0003564530040000011
c = (n - 1) × p + 1, x_p denotes the value of the quantile p, p being 75%, 50% or 25%, x_c represents the material characteristic value at position c, c represents the position, n represents the number of material characteristic values in the material characteristic value sequence, N represents the positive integers, {c} represents the fractional part of c, and
Figure FDA0003564530040000012
and
Figure FDA0003564530040000013
respectively represent the material characteristic values immediately before and after position c.
3. The method for determining the probability distribution information of the biomass material characteristics according to claim 1, wherein the determining the frequency distribution histogram of the sorted material characteristic value sequence specifically includes:
according to the sorted material characteristic value sequence, determining the initial value, the final value and the number of divisions of the frequency distribution histogram by using the following formulas:
Figure FDA0003564530040000021
Figure FDA0003564530040000022
Figure FDA0003564530040000023
wherein SV represents the start value, EV represents the end value, k represents the number of divisions, x_min and x_max respectively represent the minimum and maximum material characteristic values in the sorted material characteristic value sequence, IQR represents the interquartile range, m represents the number of material characteristic values in the sorted material characteristic value sequence,
Figure FDA0003564530040000025
represents rounding down, and
Figure FDA0003564530040000026
represents rounding up;
dividing the interval from the starting value to the ending value into k sub-intervals according to the number k of the divisions;
counting the number of the material characteristic values falling into each subinterval in the sorted material characteristic value sequence, and determining the proportion of the number of the material characteristic values falling into each subinterval to the number m of the material characteristic values in the sorted material characteristic value sequence as the frequency of each subinterval;
and drawing a frequency distribution histogram by taking the material characteristic value as an abscissa and the frequency as an ordinate.
4. The method for determining the biomass material characteristic probability distribution information according to claim 1, wherein the goodness-of-fit test is performed on each probability distribution model, and the probability distribution model satisfying the goodness-of-fit test condition is screened out, specifically comprising:
and performing goodness-of-fit inspection on each probability distribution model by adopting K-S inspection, and screening out the probability distribution models with goodness-of-fit inspection values larger than or equal to a preset threshold value.
5. The method for determining the biomass material characteristic probability distribution information according to claim 1, wherein the goodness-of-fit test is performed on each probability distribution model, and the probability distribution model satisfying the goodness-of-fit test condition is screened out, specifically comprising:
using a formula based on each probability distribution model
Figure FDA0003564530040000024
calculating the cumulative probability value in each subinterval of the frequency histogram; wherein m_i and m_{i+1} respectively represent the start value and the end value of the i-th subinterval,
Figure FDA0003564530040000031
represents the cumulative probability value of the interval [m_i, m_{i+1}], f represents the probability density function, F represents the cumulative distribution function, θ represents the parameters of the probability distribution model, f(x; θ) represents the probability density function with parameters θ, and F(m_i; θ) and F(m_{i+1}; θ) respectively represent the cumulative probability values of the probability distribution model at m_i and m_{i+1};
and performing goodness-of-fit test on the cumulative probability value in each subinterval in the frequency histogram and the frequency in each subinterval in the frequency histogram by adopting a chi-square test, and screening out a probability distribution model with the goodness-of-fit test value being greater than or equal to a preset threshold value.
6. The method for determining the probability distribution information of the characteristics of the biomass material according to claim 1, wherein the step of standardizing each screened probability distribution model to obtain a probability distribution model curve coinciding with a frequency distribution histogram specifically comprises the steps of:
using formulas
Figure FDA0003564530040000032
Standardizing each screened probability distribution model to obtain a probability distribution model curve superposed with the frequency distribution histogram;
wherein f_G is the standardized probability distribution model curve, f' is the selected probability distribution model curve, SV and EV are respectively the start value and the end value of the frequency distribution histogram, and k is the number of divisions of the frequency distribution histogram.
7. A system for determining probability distribution information of characteristics of a biomass material, the system comprising: the system comprises a data processing module and a visualization module;
the data processing module adopts a determination method of biomass material characteristic probability distribution information as set forth in any one of claims 1-6;
the data processing module is connected with the visualization module; the visualization module is used for importing the material characteristic value sequence of the biomass and displaying a statistical analysis result obtained by the data processing module according to the material characteristic value sequence of the imported biomass and a probability distribution model curve superposed with the frequency distribution histogram.
8. The system for determining probability distribution information of biomass material characteristics according to claim 7, wherein the visualization module comprises: the system comprises a data import unit, a data display unit, a data selection unit and a model result output unit;
the data import unit is connected with the data display unit; the data import unit is used for importing material characteristic data of biomass and displaying the material characteristic data on the data display unit;
the data selection unit is connected with the data display unit; the data selection unit is used for receiving a data selection condition; the data display unit is used for displaying material characteristic data of the biomass meeting the data selection condition;
the data display unit and the model result output unit are connected with the data processing module; the data processing module is used for obtaining a statistical analysis result and a probability distribution model curve superposed with the frequency distribution histogram according to the material characteristic data of the biomass meeting the data selection condition, and outputting the statistical analysis result and the probability distribution model curve superposed with the frequency distribution histogram to the model result output unit for visualization.
9. The system for determining probability distribution information of biomass material characteristics according to claim 8, wherein the visualization module further comprises: a function setting module;
the function setting module is connected with the data processing module; the function setting module is used for receiving the selection of the probability density function and transmitting the selected probability density function to the data processing module;
the data processing module is used for solving parameters of the selected probability density function by adopting maximum likelihood estimation according to material characteristic data of the biomass meeting the data selection condition, fitting to obtain a selected probability distribution model and obtaining a fitting goodness test result of the selected probability distribution model; when the goodness-of-fit test result meets the goodness-of-fit test condition, outputting a statistical analysis result and a probability distribution model curve superposed with the frequency distribution histogram; and when the goodness-of-fit test result does not meet the goodness-of-fit test condition, outputting a fitting error prompt.
10. The system for determining probability distribution information of biomass material characteristics according to claim 7, wherein the data processing module comprises:
the data acquisition unit is used for acquiring a material characteristic value sequence of the biomass;
a frequency distribution histogram obtaining unit, configured to arrange the material characteristic values in the material characteristic value sequence from small to large, and determine a frequency distribution histogram of the ordered material characteristic value sequence;
a probability density function selecting unit for selecting a plurality of probability density functions according with the frequency distribution shown by the frequency distribution histogram;
the model parameter fitting unit is used for solving the parameters of each probability density function by adopting maximum likelihood estimation according to the material characteristic value sequence, and fitting to obtain a plurality of probability distribution models;
the fitting result checking unit is used for carrying out goodness-of-fit checking on each probability distribution model and screening out the probability distribution models meeting goodness-of-fit checking conditions;
the statistical result calculating unit is used for calculating the statistical analysis result of each screened probability distribution model; the statistical analysis result comprises expectation, variance, skewness, kurtosis, median and standard deviation;
and the curve standardization unit is used for standardizing each screened probability distribution model to obtain a probability distribution model curve superposed with the frequency distribution histogram.
CN202210305030.8A 2022-03-25 2022-03-25 Method and system for determining probability distribution information of biomass material characteristics Active CN114627979B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210305030.8A CN114627979B (en) 2022-03-25 2022-03-25 Method and system for determining probability distribution information of biomass material characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210305030.8A CN114627979B (en) 2022-03-25 2022-03-25 Method and system for determining probability distribution information of biomass material characteristics

Publications (2)

Publication Number Publication Date
CN114627979A true CN114627979A (en) 2022-06-14
CN114627979B CN114627979B (en) 2024-07-16

Family

ID=81903580

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210305030.8A Active CN114627979B (en) 2022-03-25 2022-03-25 Method and system for determining probability distribution information of biomass material characteristics

Country Status (1)

Country Link
CN (1) CN114627979B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115797734A (en) * 2023-02-07 2023-03-14 慧铁科技有限公司 Method for representing and processing discrete data of railway train fault form

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106251238A (en) * 2016-07-27 2016-12-21 华北电力大学(保定) Choosing and Model Error Analysis method of wind energy turbine set modeling series of discrete step-length
CN108982409A (en) * 2018-08-08 2018-12-11 浙江工业大学 A method of quickly detecting three constituent content of kelp lignocellulosic based near infrared spectrum
CN110287601A (en) * 2019-06-27 2019-09-27 浙江农林大学 A kind of Bamboo Diameter Breast-high age binary combination distribution Accurate Estimation Method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ALBERTO GALLIFUOCO ET AL.,: "Modeling biomass hydrothermal carbonization by the maximum information entropy criterion", REACTION CHEMISTRY & ENGINEERING, 11 March 2021 (2021-03-11), pages 920 - 928 *
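毕煜; 任晓卫; 姜庆五; 赵耐青: "Study on a probability model of schistosome infection rate and its estimation method", Chinese Journal of Health Statistics (中国卫生统计), no. 05, 25 October 2008 (2008-10-25), pages 457 - 460 *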

Also Published As

Publication number Publication date
CN114627979B (en) 2024-07-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant